Re: [HACKERS] recovery_target_time and standby_mode
On 11/12/2014 10:06 AM, Robert Haas wrote:
>> What *appears* to be happening is that the pause_at_recovery_target,
>> followed by the restart, on the replica causes it to advance one commit
>> on timeline 1. But *not all the time*; this doesn't happen in my
>> pgbench-based tests.
>>
>> There's a workaround for the user (they just restore the replica to 5
>> minutes earlier), but I'm thinking this is a minor bug somewhere.
>
> I'm not sure what's going on here, but keep in mind that when you
> restart the replica, it's going to back up to the most recent
> restartpoint and begin replication from there, not from the point it
> was at when you shut down.

Except that in the problem case, it appears to be going *forwards*. What would cause that?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On Mon, Oct 27, 2014 at 5:26 PM, Amit Kapila wrote:
> Going further with verification of this patch, I found below issue:
> Run the testcase.sql file at below link:
> http://www.postgresql.org/message-id/4205e661176a124faf891e0a6ba9135266347...@szxeml509-mbs.china.huawei.com
>
> ./vacuumdb --analyze-in-stages -j 8 -d postgres
> Generating minimal optimizer statistics (1 target)
> Segmentation fault
>
> Server Log:
> ERROR:  syntax error at or near "minimal" at character 12
> STATEMENT:  ANALYZE ng minimal optimizer statistics (1 target)
> LOG:  could not receive data from client: Connection reset by peer

As you mentioned offlist that you are not able to reproduce this issue, I have tried again, and what I observe is that I can reproduce it only on a *release* build; some cases also work without hitting the issue, for example:

./vacuumdb --analyze-in-stages -t t1 -t t2 -t t3 -t t4 -t t5 -t t6 -t t7 -t t8 -j 8 -d postgres
Generating minimal optimizer statistics (1 target)
Generating medium optimizer statistics (10 targets)
Generating default (full) optimizer statistics

So to me this looks like a timing issue. Please notice how the statement in the error appears as "ANALYZE ng minimal optimizer statistics (1 target)", which is not a valid statement. Let me know if you still cannot reproduce it.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] On partitioning
> ow...@postgresql.org] On Behalf Of Amit Langote
> Sent: Thursday, November 13, 2014 3:50 PM
>
> Greenplum uses a single table for this purpose with separate columns for
> range and list cases, for example. They store allowed values per partition
> though. They have 6 partitioning-related catalog/system views, by the way.
> Perhaps interesting as a reference.
>
> http://gpdb.docs.pivotal.io/4330/index.html#ref_guide/system_catalogs/pg_partitions.html

Oops, wrong link. Use this one instead.

http://gpdb.docs.pivotal.io/4330/index.html#ref_guide/system_catalogs/pg_partition_rule.html

> Thanks,
> Amit
Re: [HACKERS] group locking: incomplete patch, just for discussion
On Wed, 2014-11-12 at 14:16 -0500, Robert Haas wrote:
> Detected deadlocks are fine. Improving the deadlock detector is the
> heart of what needs to be done here.

OK, great.

> As you say, the lock requests we're talking about will rarely wait,
> so deadlocks won't be frequent. The issue is making sure that, if
> they do happen, we get a better behavior than "your parallel query
> hangs forever; good luck figuring out why".

Right. We can still use this patch's notion of a lock group in the deadlock detector, but we don't need it to actually affect the way a lock is granted. That should eliminate concerns about subtle bugs. Later, after we understand how this is actually used, and if we see deadlock problems, we can look for ways to solve or mitigate them.

This seems to be what Andres was saying here:

http://www.postgresql.org/message-id/20141031130727.gf13...@awork2.anarazel.de

So I'll follow up in that thread, because it's an interesting discussion.

> More generally, I think there's some misunderstanding about the
> overall goal of the parallelism infrastructure that I'm trying to
> create. ... But my goal is in some ways the opposite: I'm trying to
> make it possible to run as much existing PostgreSQL backend code as
> possible inside a parallel worker without any modification.

Thank you for clarifying; I think this is a good approach.

Back to the patch: if I understand correctly, the _primary_ goal of this patch is to make it safe to take out heavyweight locks in worker processes, even if the deadlock involves LWLocks/latches synchronizing among the processes within a lock group.

For example, say processes A1 and A2 are in the same lock group, and B is in a different lock group. A1 is holding heavyweight lock H1 and waiting on LWLock L1; A2 is holding L1 and waiting on heavyweight lock H2; and B is holding H2 and waiting on H1.

The current deadlock detector would see a dependency graph like:

    A2 -> B -> A1

But with lock groups, it would see:

    (A1 A2) -> B -> (A1 A2)

which is a cycle, and can be detected regardless of the synchronization method used between A1 and A2. There are some details to work out to avoid false positives, of course.

Is that about right?

Regards,
Jeff Davis
Re: [HACKERS] On the warpath again about ill-considered inclusion nests
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> src/include/rewrite/rowsecurity.h, which one would
> reasonably think to be a rewriter header (nevermind its header comment
> to the contrary), nonetheless includes execnodes.h (executor stuff)

I'll fix the header comment. The include of execnodes.h was an unused leftover from prior versions.

> and relation.h (planner stuff), neither of which a rewriter header

relation.h was included only for the hook definition, and only for Relation's definition at that, which should have been coming from utils/relcache.h instead; I will fix that.

> has any business including. And if that weren't bad enough, it's
> been included into utils/rel.h (relcache),

This is for the definition of RowSecurityDesc. I'm happy to move that to a utils/rowsecurity.h instead, following how TriggerDesc is handled.

> This needs to be cleaned up. If you don't want me doing it with
> an axe, better put some suggestions forward.

If the above is agreeable, I'll get it done tomorrow (err, later today, at this point).

Thanks!

Stephen
Re: [HACKERS] On partitioning
> From: Stephen Frost [mailto:sfr...@snowman.net]
> Sent: Thursday, November 13, 2014 3:40 PM
>
> > The point for me is just that range and list partitioning probably
> > need different structure, and hash partitioning, if we want to support
> > that, needs something else again. Range partitioning needs an array
> > of partition boundaries and an array of child OIDs. List partitioning
> > needs an array of specific values and a child table OID for each.
> > Hash partitioning needs something probably quite different. We might
> > be able to do it as a pair of arrays - one of type anyarray and one of
> > type OID - and meet all needs that way.
>
> I agree that these will require different structures in the catalog.
> While reviewing the superuser checks, I expected to have a similar need
> and discussed various options: having multiple catalog tables, having a
> single table with multiple columns, having a single table with a 'type'
> column and then a bytea blob. In the end, it wasn't really necessary, as
> the only thing which I expected to need more than 'yes/no' were the
> directory permissions (which it looks like might end up killed anyway,
> much to my sadness..), but while considering the options, I continued to
> feel like anything but independent tables was hacking around to try and
> reduce the number of inodes used for folks who don't actually use these
> features, and that's a terrible reason to complicate the catalog and
> code, in my view.

Greenplum uses a single table for this purpose with separate columns for range and list cases, for example. They store allowed values per partition though. They have 6 partitioning-related catalog/system views, by the way. Perhaps interesting as a reference.

http://gpdb.docs.pivotal.io/4330/index.html#ref_guide/system_catalogs/pg_partitions.html

Thanks,
Amit
Re: [HACKERS] inherit support for foreign tables
On Thu, Nov 13, 2014 at 12:20 PM, Etsuro Fujita wrote:
> Hi Ashutosh,
>
> Thanks for the review!
>
> (2014/11/13 15:23), Ashutosh Bapat wrote:
>> I tried to apply fdw-inh-3.patch on the latest head from master branch.
>> It failed to apply using both patch and git apply.
>>
>> "patch" failed to apply because of rejections in
>> contrib/file_fdw/output/file_fdw.source and
>> doc/src/sgml/ref/create_foreign_table.sgml
>
> As I said upthread, fdw-inh-3.patch has been created on top of [1] and
> fdw-chk-3.patch. Did you apply these patches first?

Oh, sorry, I didn't pay attention to that. I will apply both the patches and review the inheritance patch. Thanks for pointing that out.

> [1] https://commitfest.postgresql.org/action/patch_view?id=1599
>
> Best regards,
> Etsuro Fujita

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Re: [HACKERS] On the warpath again about ill-considered inclusion nests
* Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
> Tom Lane wrote:
> > I noticed that the recent custom-path commit completely ignored my
> > advice about not including executor headers into planner headers or
> > vice versa. On the way to fixing that, I was dismayed to discover
> > that the RLS patch has utterly bollixed all semblance of modularization
> > of the headers. src/include/rewrite/rowsecurity.h, which one would
> > reasonably think to be a rewriter header (nevermind its header comment
> > to the contrary), nonetheless includes execnodes.h (executor stuff)
> > and relation.h (planner stuff), neither of which a rewriter header
> > has any business including. And if that weren't bad enough, it's
> > been included into utils/rel.h (relcache), which is close enough
> > to guaranteeing that all planner and executor symbols are visible
> > in every darn module we've got. Might as well just put everything
> > we have in postgres.h and abandon all pretense of modularity.
>
> I noticed the RLS side of things a week ago as well, and wasn't very
> pleased about it. I don't know about an axe, but we do need some
> serious cleanup.

Alright, I'll be looking into this. I've been in the weeds with the renaming previously suggested but may just address this first.

Thanks,

Stephen
Re: [HACKERS] inherit support for foreign tables
Hi Ashutosh,

Thanks for the review!

(2014/11/13 15:23), Ashutosh Bapat wrote:
> I tried to apply fdw-inh-3.patch on the latest head from master branch.
> It failed to apply using both patch and git apply.
>
> "patch" failed to apply because of rejections in
> contrib/file_fdw/output/file_fdw.source and
> doc/src/sgml/ref/create_foreign_table.sgml

As I said upthread, fdw-inh-3.patch has been created on top of [1] and fdw-chk-3.patch. Did you apply these patches first?

[1] https://commitfest.postgresql.org/action/patch_view?id=1599

Best regards,
Etsuro Fujita
Re: [HACKERS] On partitioning
* Robert Haas (robertmh...@gmail.com) wrote:
> On Wed, Nov 12, 2014 at 5:06 PM, Tom Lane wrote:
> > Robert Haas writes:
> >> Maybe as anyarray, but I think pg_node_tree might even be better.
> >> That can also represent data of some arbitrary type, but it doesn't
> >> enforce that everything is uniform.
> >
> > Of course, the more general you make it, the more likely that it'll be
> > impossible to optimize well.

Agreed, a node tree seems a bit too far to make this really work well. But I'm curious what you were thinking of, specifically? A node tree which accepts an "argument" of the constant used in the original query and then spits back a table might work reasonably well for that case. With declarative partitioning, though, I expect us to eventually be able to eliminate complete partitions from consideration on both sides of a partition-table join, and to optimize cases where two partitioned tables are joined on a compatible join key by only actually joining the partitions which overlap each other. I don't see those happening if we're allowing a node tree (only). If a node tree is just one option among other partitioning options, then we can give users the ability to choose what suits their particular needs.

> The point for me is just that range and list partitioning probably
> need different structure, and hash partitioning, if we want to support
> that, needs something else again. Range partitioning needs an array
> of partition boundaries and an array of child OIDs. List partitioning
> needs an array of specific values and a child table OID for each.
> Hash partitioning needs something probably quite different. We might
> be able to do it as a pair of arrays - one of type anyarray and one of
> type OID - and meet all needs that way.

I agree that these will require different structures in the catalog. While reviewing the superuser checks, I expected to have a similar need and discussed various options: having multiple catalog tables, having a single table with multiple columns, having a single table with a 'type' column and then a bytea blob. In the end, it wasn't really necessary, as the only thing which I expected to need more than 'yes/no' were the directory permissions (which it looks like might end up killed anyway, much to my sadness..), but while considering the options, I continued to feel like anything but independent tables was hacking around to try and reduce the number of inodes used for folks who don't actually use these features, and that's a terrible reason to complicate the catalog and code, in my view.

It occurs to me that we might be able to come up with a better way to address the inode concern and therefore ignore it. There are other considerations to having more catalog tables, but declarative partitioning is an important enough feature, in my view, that I wouldn't care if it required 10 catalog tables to implement. Misrepresenting it with a catalog that's got a bunch of columns, all but one of which are NULL, or by using a type column with some binary blob (essentially removing the knowledge of the data type from the system), isn't doing ourselves or our users any favors.

That's not to say that I'm against a solution which only needs one catalog table, but let's not completely throw away proper structure because of inode or other resource considerations. We have quite a few other catalog tables which are rarely used, and it'd be good to address the issue of those consuming resources independently.

I'm not a fan of using pg_class: there are a number of columns in there which I would *not* wish to be allowed to differ per partition (starting with relowner and relacl...). Making those NULL would be just as bad (probably worse, really, since we'd also need to add new columns to pg_class to indicate the partitioning...) as having a sparsely populated new catalog table.

Thanks!

Stephen
Re: [HACKERS] inherit support for foreign tables
Hi Fujita-san,

I tried to apply fdw-inh-3.patch on the latest head from master branch. It failed to apply using both patch and git apply.

"patch" failed to apply because of rejections in contrib/file_fdw/output/file_fdw.source and doc/src/sgml/ref/create_foreign_table.sgml

On Fri, Nov 7, 2014 at 5:31 PM, Etsuro Fujita wrote:
> (2014/11/07 14:57), Kyotaro HORIGUCHI wrote:
>>>> Here are separated patches.
>>>>
>>>> fdw-chk.patch - CHECK constraints on foreign tables
>>>> fdw-inh.patch - table inheritance with foreign tables
>>>>
>>>> The latter has been created on top of [1].
>>>> [1] http://www.postgresql.org/message-id/540da168.3040...@lab.ntt.co.jp
>>>
>>> To be exact, it has been created on top of [1] and fdw-chk.patch.
>>
>> I tried both patches on the current head; the newly added parameter to
>> analyze_rel() hampered them from applying, but it seems easy to fix,
>> and almost all the other parts applied cleanly.
>
> Thanks for the review!
>
>> By the way, are these the result of simply splitting your last patch,
>> foreign_inherit-v15.patch?
>>
>> http://www.postgresql.org/message-id/53feef94.6040...@lab.ntt.co.jp
>
> The answer is "no".
>
>> The results of applying the whole-in-one version and this splitted
>> version seem to have many differences. Did you add other changes, or
>> do I understand this patch wrongly?
>
> As I said before, I splitted the whole-in-one version into three: 1) CHECK
> constraint patch (ie fdw-chk.patch), 2) table inheritance patch (ie
> fdw-inh.patch) and 3) path reparameterization patch (not posted). In
> addition to that, I slightly modified #1 and #2.
>
> IIUC, #3 would be useful not only for the inheritance cases but for union
> all cases. So, I plan to propose it independently in the next CF.
>
>>>> I noticed that the latter disallows TRUNCATE on inheritance trees that
>>>> contain at least one child foreign table. But I think it would be
>>>> better to allow it, with the semantics that we quietly ignore the
>>>> child foreign tables and apply the operation to the child plain
>>>> tables, which is the same semantics as ALTER COLUMN SET STORAGE on
>>>> such inheritance trees. Comments welcome.
>>>
>>> Done. And I've also a bit revised regression tests for both patches.
>>> Patches attached.
>
> I rebased the patches to the latest head. Here are updated patches.
>
> Other changes:
>
> * fdw-chk-3.patch: the updated patch revises some ereport messages a
> little bit.
>
> * fdw-inh-3.patch: I noticed that there is a doc bug in the previous
> patch. The updated patch fixes that, adds a bit more docs, and revises
> regression tests in foreign_data.sql a bit further.
>
> Thanks,
>
> Best regards,
> Etsuro Fujita

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Re: [HACKERS] tracking commit timestamps
On Wed, Nov 12, 2014 at 10:06 PM, Petr Jelinek wrote:
> Brief list of changes:
> - the commit timestamp record now stores timestamp, lsn and nodeid

Now that not only the commit timestamp is stored, calling that "commit timestamp", "committs" or "commit_timestamp" is strange, no? If this patch is moving toward being a more complex information provider, calling it commit information or commit data is more adapted, no? Documentation would need a fresh brush as well in this case.

> - added code to disallow turning track_commit_timestamp on with too small
> pagesize
> - the get interfaces error out when track_commit_timestamp is off

OK, that's sane.

> - if the xid passed to get interface is out of range -infinity timestamp is
> returned (I think it's bad idea to throw errors here as the valid range is
> not static and same ID can start throwing errors between calls
> theoretically)

Already mentioned by Jim in a previous mail: this would be better as NULL.

> - renamed the sql interfaces to pg_xact_commit_timestamp,
> pg_xact_commit_timestamp_data and pg_last_committed_xact, they don't expose
> the nodeid atm, I personally am not big fan of the "xact" but it seems more
> consistent with existing naming

pg_xact_commit_timestamp and pg_xact_commit_timestamp_data are overlapping. What's wrong with a single function able to return the whole set (node ID, commit timestamp, commit LSN)? Let's say pg_xact_commit_information or pg_xact_commit_data. Already mentioned, but I also find an SRF able to return all the available information from a given XID value quite useful. And this does not conflict with what is proposed currently; you would just need to call the function with XID + number of entries wanted to get a single one. Comments from other folks about that?

> - documented pg_resetxlog changes and made all the pg_resetxlog options
> alphabetically ordered
> - added WAL logging of the track_commit_timestamp GUC
> - added alternative expected output of the regression test so that it works
> with make installcheck when track_commit_timestamp is on
> - added C interface to set default nodeid for current backend
> - several minor comment and naming adjustments mostly suggested by Michael

Thanks for those adjustments. More input about the latest patch:

1) This block is not needed; option -e is listed twice:

        The -o, -x, -e,
-       -m, -O,
-       and -l
+       -m, -O, -l
+       and -e

2) Very small thing: a couple of files have no newlines at the end, among them committs.conf and test_committs/Makefile.

3) pg_last_committed_xact and not pg_last_xact_commit_information or similar?

4) storage.sgml needs to be updated with the new folder pg_committs.

5) Er.. node ID is missing in pg_last_committed_xact, no?

6) This XXX notice can be removed:

+       /*
+        * Return empty if the requested value is older than what we have or
+        * newer than newest we have.
+        *
+        * XXX: should this be error instead?
+        */

We are moving toward returning invalid information in the SQL interface when the information is not in history, instead of an error, no? (Note that I am still a partisan of an error message to let the caller know that commit info history does not have the information requested.)

7) Note that TransactionTreeSetCommitTsData still never sets do_xlog to true and that WriteSetTimestampXlogRec never gets called. So no information is WAL-logged with this patch. Wouldn't that be useful for standbys as well? Perhaps I am missing why this is disabled? This code should be activated IMO, or it will remain untested.

8) As a more general point, the node ID stuff makes me uncomfortable and looks just added on top of the existing patch without much thinking, so I am really skeptical about it. The need here is to pass on demand an int8 node ID that can only be set through a C interface, so only extensions could play with it. The data passed to a WAL record is always built and determined by the system and entirely transparent to the user; inserting user-defined data like that is inconsistent with what we've been doing until now, no? Also, a question particularly for BDR and Slony folks: do you sometimes add a new node using the base backup of an existing node? :) See what I come up with: a duplication of this new node ID system with the already-present system ID, no? Similarly, the LSN is added to the WAL record containing the commit timestamp, but couldn't the LSN of the WAL record containing the commit timestamp itself be used as a point of reference for better ordering? That's not exactly the same as the LSN of the transaction commit, but it still provides a WAL-based reference.

Regards,
-- 
Michael
Re: [HACKERS] Race in "tablespace" test on Windows
On Thu, Nov 13, 2014 at 8:46 AM, Noah Misch wrote:
> On Tue, Nov 11, 2014 at 10:21:26AM +0530, Amit Kapila wrote:
>> On Sat, Nov 8, 2014 at 10:34 AM, Noah Misch wrote:
>>> Here is a briefer command sequence exhibiting the same problem:
>>>
>>> To make this work as well on Windows as it does elsewhere, DROP
>>> TABLESPACE would need to wait for other backends to close relevant
>>> unlinked files. Perhaps implement "wait_unlinked_files(const char
>>> *dirname)" to poll unlinked, open files until they disappear. (An
>>> attempt to open an unlinked file reports ERROR_ACCESS_DENIED. It might
>>> be tricky to reliably distinguish this cause from other causes of that
>>> error, but it should be possible.)
>>
>> I think the proposed mechanism can work, but the wait can be very long
>> (until the backend holding the descriptor executes another command).
>
> The DROP TABLESPACE could send a catchup interrupt.

Yeah, that can work.

>> Can we think of some other solution, like having DROP TABLESPACE check
>> that no object belongs to the database/cluster instead of checking that
>> the directory is empty, and then allow forcibly deleting that directory
>> someway?
>
> I'm not aware of a way to forcibly delete the directory. One could rename
> files to the tablespace top-level directory just before unlinking them. Since
> DROP TABLESPACE never removes that directory, their continued presence there
> would not pose a problem. (Compare use of the rename-before-unlink trick in
> RemoveOldXlogFiles().) That adds the overhead of an additional system call to
> every unlink, which might be acceptable. It may be possible to rename after
> unlink, as-needed in DROP TABLESPACE.

Right. I think we can discuss further which approach is better once someone decides to work on this issue.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] tracking commit timestamps
On Thu, Nov 13, 2014 at 7:56 AM, Jim Nasby wrote:
> On 11/12/14, 7:06 AM, Petr Jelinek wrote:
>> - if the xid passed to get interface is out of range -infinity timestamp
>> is returned (I think it's bad idea to throw errors here as the valid range
>> is not static and same ID can start throwing errors between calls
>> theoretically)
>
> Wouldn't NULL be more appropriate?

Definitely. Defining a given value for information that is not valid is awkward.
-- 
Michael
Re: [HACKERS] Doing better at HINTing an appropriate column within errorMissingColumn()
On Tue, Nov 11, 2014 at 3:52 PM, Peter Geoghegan wrote:
> I'm pretty puzzled by this. Other than our "agree to disagree and
> defer to committer" position on the question of whether or not more
> than one suggestion can come from a single RTE, which you were fine
> with before [1], I have only restored the core/contrib separation to a
> state recently suggested by Robert as the best and simplest all around
> [2].
>
> Did I miss something else?

My point is: I am not sure I can be defined as a reviewer of this patch or take any credit in this patch review, knowing that the latest version submitted is a simple rebase of the version I did my first review on. Hence, code-wise, this patch is in the same state as when it was first submitted.

Thanks,
-- 
Michael
Re: [HACKERS] pg_receivexlog --status-interval add fsync feedback
Thanks for reviewing the patch!

On Thu, Nov 13, 2014 at 4:05 AM, Alvaro Herrera wrote:
> Fujii Masao wrote:
>
>> --- 127,152 ----
>>        When this option is used, pg_receivexlog will report
>>        a flush position to the server, indicating when each segment has been
>>        synchronized to disk so that the server can remove that segment if it
>> !      is not otherwise needed. --synchronous option must
>> !      be specified when making pg_receivexlog run as
>> !      synchronous standby by using replication slot. Otherwise WAL data
>> !      cannot be flushed frequently enough for this to work correctly.
>
> Whitespace damage here.

Fixed.

>> +     printf(_("      --synchronous          flush transaction log in real time\n"));
>
> "in real time" sounds odd. How about "flush transaction log
> immediately after writing", or maybe "have transaction log writes be
> synchronous".

The former sounds better to me, so I chose it.

>> --- 781,791 ----
>>            now = feGetCurrentTimestamp();
>>
>>            /*
>> !           * Issue sync command as soon as there are WAL data which
>> !           * has not been flushed yet if synchronous option is true.
>>            */
>>            if (lastFlushPosition < blockpos &&
>> !               walfile != -1 && synchronous)
>
> I'd put the "synchronous" condition first in the if(), and start the
> comment with it rather than putting it at the end.

Both seem weird. Fixed, i.e., moved the "synchronous" condition first in the if()'s test and also moved the "if the synchronous option is true" part to the beginning of the comment.

>>                    progname, current_walfile_name, strerror(errno));
>>                goto error;
>>            }
>>            lastFlushPosition = blockpos;
>>
>> !           /*
>> !            * Send feedback so that the server sees the latest WAL locations
>> !            * immediately if synchronous option is true.
>> !            */
>> !           if (!sendFeedback(conn, blockpos, now, false))
>> !               goto error;
>> !           last_status = now;
>
> I'm not clear about this comment .. why does it say "if synchronous
> option is true" when it's not checking the condition?

I added that comment because the code exists within the if() block checking the "synchronous" condition, but it does seem confusing, so I removed that part from the comment.

Attached is the updated version of the patch.

Regards,

-- 
Fujii Masao

*** a/doc/src/sgml/ref/pg_receivexlog.sgml
--- b/doc/src/sgml/ref/pg_receivexlog.sgml
***************
*** 49,54 **** PostgreSQL documentation
--- 49,61 ----
+     Unlike the standby's WAL receiver, pg_receivexlog
+     flushes WAL data only when WAL file is closed, by default.
+     --synchronous option must be specified to flush WAL data
+     in real time and ensure it's safely flushed to disk.
+
+
      The transaction log is streamed over a regular
      PostgreSQL connection, and uses the replication
      protocol. The connection must be made with a superuser or a user
***************
*** 86,106 **** PostgreSQL documentation
-     -F interval
-     --fsync-interval=interval
-
-
-       Specifies the maximum time to issue sync commands to ensure the
-       received WAL file is safely flushed to disk, in seconds. The default
-       value is zero, which disables issuing fsyncs except when WAL file is
-       closed. If -1 is specified, WAL file is flushed as
-       soon as possible, that is, as soon as there are WAL data which has
-       not been flushed yet.
-
-
      -n
      --no-loop
--- 93,98 ----
***************
*** 135,150 **** PostgreSQL documentation
      When this option is used, pg_receivexlog will report
      a flush position to the server, indicating when each segment has been
      synchronized to disk so that the server can remove that segment if it
!     is not otherwise needed. When using this parameter, it is important
!     to make sure that pg_receivexlog cannot become the
!     synchronous standby through an incautious setting of
!     ; it does not flush
!     data frequently enough for this to work correctly.
      -v
      --verbose
--- 127,152 ----
      When this option is used, pg_receivexlog will report
      a flush position to the server, indicating when each segment has been
      synchronized to disk so that the server can remove that segment if it
!     is not otherwise needed. --synchronous option must
!     be specified when making pg_receivexlog run as
!     synchronous standby by using r
Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index
On Wed, Nov 12, 2014 at 9:30 PM, Fujii Masao wrote:
> On Wed, Nov 12, 2014 at 12:40 AM, Tom Lane wrote:
>> Fujii Masao writes:
>>> On Tue, Nov 11, 2014 at 10:14 PM, Robert Haas wrote:
>>>> Not to kibitz too much after-the-fact, but wouldn't it be better to
>>>> give this a name that has "GIN" in it somewhere?
>>>
>>> Maybe. gin_pending_list_cleanup_size? gin_pending_list_limit? Better name?
>>
>> gin_pending_list_limit sounds good to me.
>
> OK, barring any objection, I will rename the reloption and GUC to
> gin_pending_list_limit.

Done.

Regards,

-- 
Fujii Masao
Re: [HACKERS] Race in "tablespace" test on Windows
On Tue, Nov 11, 2014 at 10:21:26AM +0530, Amit Kapila wrote: > On Sat, Nov 8, 2014 at 10:34 AM, Noah Misch wrote: > > Here is a briefer command sequence exhibiting the same problem: > > > > CREATE TABLESPACE testspace LOCATION '...somewhere...'; > > CREATE TABLE atable (c int) tablespace testspace; > > SELECT COUNT(*) FROM atable; -- open heap > > \c - > > ALTER TABLE atable SET TABLESPACE pg_default; > > DROP TABLESPACE testspace; -- bug: fails sometimes > > DROP TABLESPACE testspace; -- second one ~always works > > DROP TABLE atable; > > > > For me, it doesn't succeed even the second time; I keep getting > the same error until I execute some command on the first session, > which means until the first session has processed the invalidation > messages. > > postgres=# Drop tablespace tbs; > ERROR: tablespace "tbs" is not empty > postgres=# Drop tablespace tbs; > ERROR: tablespace "tbs" is not empty > > I have tested this on Windows 7. The behavior you see makes sense if you have a third, idle backend. I had only the initial backend and the "\c"-created second one. > > To make this work as well on Windows as it does elsewhere, DROP TABLESPACE > > would need to wait for other backends to close relevant unlinked files. > > Perhaps implement "wait_unlinked_files(const char *dirname)" to poll > unlinked, > > open files until they disappear. (An attempt to open an unlinked file > reports > > ERROR_ACCESS_DENIED. It might be tricky to reliably distinguish this > cause > > from other causes of that error, but it should be possible.) > > I think the proposed mechanism can work but the wait can be very long > (until the backend holding the descriptor executes another command). The DROP TABLESPACE could send a catchup interrupt. > Can we think of some other solution: in DROP TABLESPACE, instead of > checking whether the directory is empty, check that no object in the > database/cluster belongs to it, then allow that directory to be forcibly > deleted somehow.
I'm not aware of a way to forcibly delete the directory. One could rename files to the tablespace top-level directory just before unlinking them. Since DROP TABLESPACE never removes that directory, their continued presence there would not pose a problem. (Compare use of the rename-before-unlink trick in RemoveOldXlogFiles().) That adds the overhead of an additional system call to every unlink, which might be acceptable. It may be possible to rename after unlink, as-needed in DROP TABLESPACE. > > I propose to add > > this as a TODO, then bandage the test case with s/^\\c -$/RESET ROLE;/. > > Yeah, this makes sense. Done. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
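A rough sketch of the proposed polling loop, with Python standing in for the eventual backend C code. The function name wait_unlinked_files comes from the message; the timeout handling is invented, and the hard part on Windows — distinguishing ERROR_ACCESS_DENIED caused by a lingering unlinked file from other causes — is not modeled here:

```python
import os
import time

def wait_unlinked_files(dirname, timeout=5.0, interval=0.05):
    """Poll dirname until every entry has disappeared, or give up.

    Models the proposed DROP TABLESPACE behavior: on Windows an
    unlinked file lingers until the last open handle is closed, so
    instead of failing immediately we poll for the directory to
    actually become empty.
    """
    deadline = time.monotonic() + timeout
    while True:
        if not os.listdir(dirname):
            return True   # empty: safe to remove the directory
        if time.monotonic() >= deadline:
            return False  # caller reports "tablespace is not empty"
        time.sleep(interval)
```

On POSIX systems the unlink is visible immediately, so the loop exits on its first check; the polling only earns its keep under Windows handle semantics.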
Re: [HACKERS] pg_multixact not getting truncated
Jim Nasby wrote: > On 11/10/14, 12:16 AM, Josh Berkus wrote: > >On 11/09/2014 08:00 PM, Josh Berkus wrote: > >On 11/08/2014 01:46 PM, Andres Freund wrote: > >>>I'm these days suggesting that people should add manual vacuuming for > "older" relations during off peak hours on busy databases. There are too > many sites whose service degrades noticeably during a full table vacuum. > >>Me too: https://github.com/pgexperts/flexible-freeze > > > >It turns out that not even a program of preventative scheduled vacuuming > >helps. This is because the template0 database anchors the minmxid and > >prevents it from being advanced until autovacuum gets around to that > >database, at whatever the minmxid threshold is. > > How did template0 even get a MultiXact? That sounds like they're really > abusing the template databases. :( (Do keep in mind that MXID 1 is a special > value.) No, it's normal -- template0 does not have a multixact in any tuple's xmax, but datminmxid is set to the value that is current when it is frozen. > BTW, the only reason I know of not to set both min_age parameters to > zero is to prevent loss of forensic information. If that's not a > concern you can always just set them to zero. Even if it is a concern, > I suspect that the forensic info you could gather from a MultiXact is > a lot more limited than for an XID, so it's probably pretty safe > setting that to zero. Freezing tuples too early could cause useless dirtying of pages; if the tuple is deleted, updated or locked again after being frozen, you end up with more writes. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Using 128-bit integers for sum, avg and statistics aggregates
Andreas Karlsson wrote: > On 11/13/2014 02:03 AM, Andreas Karlsson wrote: > >Here is version 2 of the patch which detects the presence of gcc/clang > >style 128-bit integers and has been cleaned up to a reviewable state. I > >have not added support for any other compilers since I found no > >documentation of 128-bit support with icc or MSVC. I do not have access to > >any Windows machines either. > > > >A couple of things I was not sure about were the naming of the new > >functions and if I should ifdef the size of the aggregate state in the > >catalog or not. configure is a generated file. If your patch touches it but not configure.in, there is a problem. > diff --git a/configure b/configure -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] On the warpath again about ill-considered inclusion nests
Tom Lane wrote: > I noticed that the recent custom-path commit completely ignored my > advice about not including executor headers into planner headers or > vice versa. On the way to fixing that, I was dismayed to discover > that the RLS patch has utterly bollixed all semblance of modularization > of the headers. src/include/rewrite/rowsecurity.h, which one would > reasonably think to be a rewriter header (nevermind its header comment > to the contrary), nonetheless includes execnodes.h (executor stuff) > and relation.h (planner stuff), neither of which a rewriter header > has any business including. And if that weren't bad enough, it's > been included into utils/rel.h (relcache), which is close enough > to guaranteeing that all planner and executor symbols are visible > in every darn module we've got. Might as well just put everything > we have in postgres.h and abandon all pretense of modularity. I noticed the RLS side of things a week ago as well, and wasn't very pleased about it. I don't know about an axe, but we do need some serious cleanup. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Using 128-bit integers for sum, avg and statistics aggregates
On 11/13/2014 02:03 AM, Andreas Karlsson wrote: Here is version 2 of the patch which detects the presence of gcc/clang style 128-bit integers and has been cleaned up to a reviewable state. I have not added support for any other compilers since I found no documentation 128-bit support with icc or MSVC. I do not have access to any Windows machines either. A couple of things I was not sure about was the naming of the new functions and if I should ifdef the size of the aggregate state in the catalog or not. The correct file is attached in the message. -- Andreas Karlsson diff --git a/configure b/configure new file mode 100755 index c4f70e8..f11f1c8 *** a/configure --- b/configure *** _ACEOF *** 13734,13739 --- 13734,13751 fi + ac_fn_c_check_type "$LINENO" "__int128_t" "ac_cv_type___int128_t" "#include + " + ac_fn_c_check_type "$LINENO" "__uint128_t" "ac_cv_type___uint128_t" "#include + " + if test "x$ac_cv_type___int128_t" = xyes && test "x$ac_cv_type___uint128_t" = xyes; then : + + cat >>confdefs.h <<_ACEOF + #define HAVE_GCC_INT128_T 1 + _ACEOF + + fi + # We also check for sig_atomic_t, which *should* be defined per ANSI # C, but is missing on some old platforms. diff --git a/src/backend/utils/adt/numeric.c b/src/backend/utils/adt/numeric.c new file mode 100644 index d61af92..d53905c *** a/src/backend/utils/adt/numeric.c --- b/src/backend/utils/adt/numeric.c *** static void apply_typmod(NumericVar *var *** 402,407 --- 402,410 static int32 numericvar_to_int4(NumericVar *var); static bool numericvar_to_int8(NumericVar *var, int64 *result); static void int8_to_numericvar(int64 val, NumericVar *var); + #ifdef HAVE_INT128 + static void int16_to_numericvar(int128 val, NumericVar *var); + #endif static double numeric_to_double_no_overflow(Numeric num); static double numericvar_to_double_no_overflow(NumericVar *var); *** numeric_float4(PG_FUNCTION_ARGS) *** 2639,2644 --- 2642,2650 * Actually, it's a pointer to a NumericAggState allocated in the aggregate * context. 
The digit buffers for the NumericVars will be there too. * + * On platforms which support 128-bit integers some aggregates instead use a + * 128-bit integer based transition datatype to speed up calculations. + * * -- */ *** numeric_accum_inv(PG_FUNCTION_ARGS) *** 2897,2902 --- 2903,2967 PG_RETURN_POINTER(state); } + #ifdef HAVE_INT128 + typedef struct Int16AggState + { + bool calcSumX2; /* if true, calculate sumX2 */ + int64 N; /* count of processed numbers */ + int128 sumX; /* sum of processed numbers */ + int128 sumX2; /* sum of squares of processed numbers */ + } Int16AggState; + + /* + * Prepare state data for a 128-bit aggregate function that needs to compute + * sum, count and optionally sum of squares of the input. + */ + static Int16AggState * + makeInt16AggState(FunctionCallInfo fcinfo, bool calcSumX2) + { + Int16AggState *state; + MemoryContext agg_context; + MemoryContext old_context; + + if (!AggCheckCallContext(fcinfo, &agg_context)) + elog(ERROR, "aggregate function called in non-aggregate context"); + + old_context = MemoryContextSwitchTo(agg_context); + + state = (Int16AggState *) palloc0(sizeof(Int16AggState)); + state->calcSumX2 = calcSumX2; + + MemoryContextSwitchTo(old_context); + + return state; + } + + /* + * Accumulate a new input value for 128-bit aggregate functions. + */ + static void + do_int16_accum(Int16AggState *state, int128 newval) + { + if (state->calcSumX2) + state->sumX2 += newval * newval; + + state->sumX += newval; + state->N++; + } + + /* + * Remove an input value from the aggregated state. + */ + static void + do_int16_discard(Int16AggState *state, int128 newval) + { + if (state->calcSumX2) + state->sumX2 -= newval * newval; + + state->sumX -= newval; + state->N--; + } + #endif /* * Integer data types all use Numeric accumulators to share code and *** numeric_accum_inv(PG_FUNCTION_ARGS) *** 2905,2915 --- 2970,2996 * for the sum(X*X) value.
Hence, we use int2_accum and int4_accum only * for stddev/variance --- there are faster special-purpose accumulator * routines for SUM and AVG of these datatypes. + * + * Similarly we can, where available, use 128-bit integer accumulators + * for sum(X) for int8 and sum(X*X) for int2 and int4, but not sum(X*X) + * for int8. */ Datum int2_accum(PG_FUNCTION_ARGS) { + #ifdef HAVE_INT128 + Int16AggState *state; + + state = PG_ARGISNULL(0) ? NULL : (Int16AggState *) PG_GETARG_POINTER(0); + + /* Create the state data on the first call */ + if (state == NULL) + state = makeInt16AggState(fcinfo, true); + + if (!PG_ARGISNULL(1)) + do_int16_accum(state, (int128) PG_GETARG_INT16(1)); + #else NumericAggState *state; state = PG_ARGISNULL(0) ? NULL : (N
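The transition state the patch adds ({N, sumX, sumX2}) is the textbook one-pass moment accumulator. A rough Python model of what do_int16_accum and do_int16_discard maintain — Python's arbitrary-precision ints stand in for int128, and the sample-variance finalizer shown is the standard formula, not necessarily the exact finalizer the patch uses:

```python
class Int128AggState:
    """Models the patch's Int16AggState: count, sum, and sum of squares."""

    def __init__(self, calc_sum_x2=True):
        self.calc_sum_x2 = calc_sum_x2
        self.N = 0
        self.sumX = 0
        self.sumX2 = 0

    def accum(self, v):
        # mirrors do_int16_accum
        if self.calc_sum_x2:
            self.sumX2 += v * v
        self.sumX += v
        self.N += 1

    def discard(self, v):
        # mirrors do_int16_discard (inverse transition for moving aggregates)
        if self.calc_sum_x2:
            self.sumX2 -= v * v
        self.sumX -= v
        self.N -= 1

    def var_samp(self):
        # sample variance from the moments: (N*sumX2 - sumX^2) / (N*(N-1))
        return (self.N * self.sumX2 - self.sumX ** 2) / (self.N * (self.N - 1))
```

The point of the 128-bit representation is that sumX and sumX2 never overflow for int2/int4 inputs, so the whole transition function is a couple of integer additions instead of numeric arithmetic.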
Re: [HACKERS] REINDEX CONCURRENTLY 2.0
On Thu, Nov 13, 2014 at 10:26 AM, Michael Paquier wrote: > On Thu, Nov 13, 2014 at 9:31 AM, Andres Freund wrote: >> I don't recall what the problem with just swapping the names was - but >> I'm pretty sure there was one... Hm. The index relation oids are >> referred to by constraints and dependencies. That's somewhat >> solvable. But I think there was something else as well... > The reason given 2 years ago for not using relname was the fast that > the oid of the index changes, and to it be refered by some pg_depend > entries: Feel free to correct: "and that it could be referred". -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] REINDEX CONCURRENTLY 2.0
On Thu, Nov 13, 2014 at 9:31 AM, Andres Freund wrote: > I don't recall what the problem with just swapping the names was - but > I'm pretty sure there was one... Hm. The index relation oids are > referred to by constraints and dependencies. That's somewhat > solvable. But I think there was something else as well... The reason given 2 years ago for not using relname was the fast that the oid of the index changes, and to it be refered by some pg_depend entries: http://www.postgresql.org/message-id/20121208133730.ga6...@awork2.anarazel.de http://www.postgresql.org/message-id/12742.1354977...@sss.pgh.pa.us Regards, -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] REINDEX CONCURRENTLY 2.0
Andres Freund wrote: > On 2014-11-12 18:23:38 -0500, Robert Haas wrote: > > > The problem is that it's very hard to avoid the wrong index's > > > relfilenode being used when swapping the relfilenodes between two > > > indexes. > > > > How about storing both the old and new relfilenodes in the same pg_class > > entry? > > That's quite a cool idea > > [think a bit] > > But I think it won't work realistically. We have a *lot* of > infrastructure that refers to indexes using its primary key. Hmm, can we make the relmapper do this job instead of having another pg_class column? Essentially the same sketch Robert proposed, except that we would initially set relfilenode=0 and have all onlookers use the relmapper to obtain the correct relfilenode; switching to the new relfilenode can be done atomically, and we would un-relmap the index once the process is complete. The difference from what Robert proposes is that the transient state is known to cause failures for anyone not prepared to deal with it, so it should be easy to spot what places need adjustment. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Using 128-bit integers for sum, avg and statistics aggregates
Hi, Here is version 2 of the patch which detects the presence of gcc/clang style 128-bit integers and has been cleaned up to a reviewable state. I have not added support for any other compilers since I found no documentation of 128-bit support with icc or MSVC. I do not have access to any Windows machines either. A couple of things I was not sure about were the naming of the new functions and whether I should ifdef the size of the aggregate state in the catalog or not. -- Andreas Karlsson -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Doing better at HINTing an appropriate column within errorMissingColumn()
On Wed, Nov 12, 2014 at 4:54 PM, Peter Geoghegan wrote: > Attached patch moves the Levenshtein distance implementation into core. Oops. Somehow managed to send a *.patch.swp file. :-) Here is the actual patch. -- Peter Geoghegan From b7df918f1a52107637600f3b22d1cff18bd07ae1 Mon Sep 17 00:00:00 2001 From: Peter Geoghegan Date: Sat, 30 Nov 2013 23:15:00 -0800 Subject: [PATCH 1/2] Move Levenshtein distance implementation into core The fuzzystrmatch Levenshtein distance implementation is moved from /contrib to core. However, the related SQL-callable functions may still only be used with the fuzzystrmatch extension installed -- the fuzzystrmatch definitions become simple forwarding stubs. It is not anticipated that the user-facing SQL functions will be moved into core in the future, due to the MAX_LEVENSHTEIN_STRLEN restriction. An in-core Levenshtein distance implementation is only anticipated to be helpful in building diagnostic messages. --- contrib/fuzzystrmatch/Makefile | 3 - contrib/fuzzystrmatch/fuzzystrmatch.c | 81 --- contrib/fuzzystrmatch/levenshtein.c | 403 -- src/backend/utils/adt/Makefile | 2 + src/backend/utils/adt/levenshtein.c | 394 + src/backend/utils/adt/varlena.c | 24 ++ src/include/utils/builtins.h | 5 + 7 files changed, 481 insertions(+), 431 deletions(-) delete mode 100644 contrib/fuzzystrmatch/levenshtein.c create mode 100644 src/backend/utils/adt/levenshtein.c diff --git a/contrib/fuzzystrmatch/Makefile b/contrib/fuzzystrmatch/Makefile index 024265d..0327d95 100644 --- a/contrib/fuzzystrmatch/Makefile +++ b/contrib/fuzzystrmatch/Makefile @@ -17,6 +17,3 @@ top_builddir = ../..
include $(top_builddir)/src/Makefile.global include $(top_srcdir)/contrib/contrib-global.mk endif - -# levenshtein.c is #included by fuzzystrmatch.c -fuzzystrmatch.o: fuzzystrmatch.c levenshtein.c diff --git a/contrib/fuzzystrmatch/fuzzystrmatch.c b/contrib/fuzzystrmatch/fuzzystrmatch.c index 7a53d8a..62e650f 100644 --- a/contrib/fuzzystrmatch/fuzzystrmatch.c +++ b/contrib/fuzzystrmatch/fuzzystrmatch.c @@ -154,23 +154,6 @@ getcode(char c) /* These prevent GH from becoming F */ #define NOGHTOF(c) (getcode(c) & 16) /* BDH */ -/* Faster than memcmp(), for this use case. */ -static inline bool -rest_of_char_same(const char *s1, const char *s2, int len) -{ - while (len > 0) - { - len--; - if (s1[len] != s2[len]) - return false; - } - return true; -} - -#include "levenshtein.c" -#define LEVENSHTEIN_LESS_EQUAL -#include "levenshtein.c" - PG_FUNCTION_INFO_V1(levenshtein_with_costs); Datum levenshtein_with_costs(PG_FUNCTION_ARGS) @@ -180,8 +163,20 @@ levenshtein_with_costs(PG_FUNCTION_ARGS) int ins_c = PG_GETARG_INT32(2); int del_c = PG_GETARG_INT32(3); int sub_c = PG_GETARG_INT32(4); - - PG_RETURN_INT32(levenshtein_internal(src, dst, ins_c, del_c, sub_c)); + const char *s_data; + const char *t_data; + int s_bytes, +t_bytes; + + /* Extract a pointer to the actual character data */ + s_data = VARDATA_ANY(src); + t_data = VARDATA_ANY(dst); + /* Determine length of each string in bytes and characters */ + s_bytes = VARSIZE_ANY_EXHDR(src); + t_bytes = VARSIZE_ANY_EXHDR(dst); + + PG_RETURN_INT32(varstr_leven(s_data, s_bytes, t_data, t_bytes, ins_c, + del_c, sub_c)); } @@ -191,8 +186,20 @@ levenshtein(PG_FUNCTION_ARGS) { text *src = PG_GETARG_TEXT_PP(0); text *dst = PG_GETARG_TEXT_PP(1); - - PG_RETURN_INT32(levenshtein_internal(src, dst, 1, 1, 1)); + const char *s_data; + const char *t_data; + int s_bytes, +t_bytes; + + /* Extract a pointer to the actual character data */ + s_data = VARDATA_ANY(src); + t_data = VARDATA_ANY(dst); + /* Determine length of each string in bytes and 
characters */ + s_bytes = VARSIZE_ANY_EXHDR(src); + t_bytes = VARSIZE_ANY_EXHDR(dst); + + PG_RETURN_INT32(varstr_leven(s_data, s_bytes, t_data, t_bytes, 1, 1, + 1)); } @@ -206,8 +213,20 @@ levenshtein_less_equal_with_costs(PG_FUNCTION_ARGS) int del_c = PG_GETARG_INT32(3); int sub_c = PG_GETARG_INT32(4); int max_d = PG_GETARG_INT32(5); - - PG_RETURN_INT32(levenshtein_less_equal_internal(src, dst, ins_c, del_c, sub_c, max_d)); + const char *s_data; + const char *t_data; + int s_bytes, +t_bytes; + + /* Extract a pointer to the actual character data */ + s_data = VARDATA_ANY(src); + t_data = VARDATA_ANY(dst); + /* Determine length of each string in bytes and characters */ + s_bytes = VARSIZE_ANY_EXHDR(src); + t_bytes = VARSIZE_ANY_EXHDR(dst); + + PG_RETURN_INT32(varstr_leven_less_equal(s_data, s_bytes, t_data, t_bytes, + ins_c, del_c, sub_c, max_d)); } @@ -218,8 +237,20 @@ levenshtein_less_equal(PG_FUNCTION_ARGS) text *src = PG_GETARG_TEXT_PP(0); text *dst = PG_GETARG_TEXT_PP(1); int max_d = PG_GETARG_INT32(2); - - PG_RETURN_INT32(levenshtein_less_equal_internal(src, dst, 1, 1, 1, max_d)); + const char *s_data; +
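For readers who want the shape of the algorithm without wading through the diff: varstr_leven computes the classic weighted edit-distance dynamic program. A compact Python sketch of the same computation — not the actual C code, which additionally handles multibyte characters, cell-size optimizations, and a less-equal bounded variant:

```python
def levenshtein(s, t, ins_c=1, del_c=1, sub_c=1):
    """Weighted Levenshtein distance, mirroring levenshtein_with_costs:
    the cheapest sequence of insertions, deletions, and substitutions
    transforming s into t."""
    # Row for the empty prefix of s: j insertions to build t[:j].
    prev = [j * ins_c for j in range(len(t) + 1)]
    for i, sc in enumerate(s, start=1):
        cur = [i * del_c]  # transform s[:i] into "" by i deletions
        for j, tc in enumerate(t, start=1):
            cur.append(min(
                prev[j] + del_c,                           # delete sc
                cur[j - 1] + ins_c,                        # insert tc
                prev[j - 1] + (0 if sc == tc else sub_c),  # (mis)match
            ))
        prev = cur
    return prev[-1]
```

Only two rows of the matrix are kept at a time, which is also how the C implementation bounds its memory use.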
Re: [HACKERS] Doing better at HINTing an appropriate column within errorMissingColumn()
On Wed, Nov 12, 2014 at 1:13 PM, Peter Geoghegan wrote: > On Wed, Nov 12, 2014 at 12:59 PM, Robert Haas wrote: >> I agree with your proposed approach to moving Levenshtein into core. >> However, I think this should be separated into two patches, one of >> them moving the Levenshtein functionality into core, and the other >> adding the new treatment for missing column errors. If you can do >> that relatively soon, I'll make an effort to get the refactoring patch >> committed in the near future. Once that's done, we can focus in on >> the interesting part of the patch, which is the actual machinery for >> suggesting alternatives. > > Okay, thanks. I think I can do that fairly soon. Attached patch moves the Levenshtein distance implementation into core. You're missing patch 2 of 2 here, because I have yet to incorporate your feedback on the HINT itself -- when I've done that, I'll post a newly rebased patch 2/2, with those items taken care of. As you pointed out, there is no reason to wait for that. -- Peter Geoghegan .0001-Move-Levenshtein-distance-implementation-into-core.patch.swp Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] REINDEX CONCURRENTLY 2.0
On 2014-11-12 18:23:38 -0500, Robert Haas wrote: > On Wed, Nov 12, 2014 at 4:39 PM, Andres Freund wrote: > > On 2014-11-12 16:11:58 -0500, Robert Haas wrote: > >> On Wed, Nov 12, 2014 at 4:10 PM, Robert Haas wrote: > >> > On Thu, Nov 6, 2014 at 9:50 AM, Peter Eisentraut wrote: > >> >> If REINDEX cannot work without an exclusive lock, we should invent some > >> >> other qualifier, like WITH FEWER LOCKS. > >> > > >> > What he said. > > > > I'm unconvinced. A *short* exclusive lock (just to update two pg_class > > rows), still gives most of the benefits of CONCURRENTLY. > > I am pretty doubtful about that. It's still going to require you to > wait for all transactions to drain out of the table while new ones are > blocked from entering. Which sucks. Unless all of your transactions > are very short, but that's not necessarily typical. Yes, it sucks. But it beats not being able to reindex a relation with a primary key (referenced by a fkey) without waiting several hours by a couple magnitudes. And that's the current situation. > > The problem is that it's very hard to avoid the wrong index's > > relfilenode being used when swapping the relfilenodes between two > > indexes. > > How about storing both the old and new relfilenodes in the same pg_class > entry? That's quite a cool idea [think a bit] But I think it won't work realistically. We have a *lot* of infrastructure that refers to indexes using its primary key. I don't think we want to touch all those places to also disambiguate on some other factor. All the relevant APIs are either just passing around oids or relcache entries. There's also the problem that we'd really need two different pg_index rows to make things work. Alternatively we can duplicate the three relevant columns (indisvalid, indisready, indislive) in there for the different filenodes. But that's not entirely pretty. > 1. Take a snapshot. > 2. Index all the tuples in that snapshot. > 3.
Publish the new relfilenode to an additional pg_class column, > relnewfilenode or similar. > 4. Wait until everyone can see step #3. Here all backends need to update both indexes, right? And all the indexing infrastructure can't deal with that without having separate oids & relcache entries. > 5. Rescan the table and add any missing tuples to the index. > 6. Set some flag in pg_class to mark the relnewfilenode as active and > relfilenode as not to be used for queries. > 7. Wait until everyone can see step #6. > 8. Set some flag in pg_class to mark relfilenode as not even to be opened. > 9. Wait until everyone can see step #8. > 10. Drop old relfilenode. > 11. Clean up by setting relfilenode = relnewfilenode, relnewfilenode = 0. Even that one isn't trivial - how do you deal with the fact that somebody looking at updating newrelfilenode might, in the midst of processing, see newrelfilenode = 0? I've earlier come up with a couple possible solutions, but I unfortunately found holes in all of them. And if I can find holes in them, there surely are more :(. I don't recall what the problem with just swapping the names was - but I'm pretty sure there was one... Hm. The index relation oids are referred to by constraints and dependencies. That's somewhat solvable. But I think there was something else as well... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
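To make the state transitions in the numbered steps concrete, here is a toy Python model of the pg_class row during the swap — field names invented, and purely illustrative; the waits in steps 4, 7 and 9 are exactly what would guarantee every backend agrees on the answers this model gives:

```python
from dataclasses import dataclass

@dataclass
class IndexSwapState:
    """Toy model of one pg_class row during the proposed relfilenode swap."""
    relfilenode: int
    relnewfilenode: int = 0      # step 3 publishes this
    new_is_active: bool = False  # step 6: queries switch to the new file
    old_is_closed: bool = False  # step 8: old file may no longer be opened

def nodes_to_write(ix):
    """While both filenodes exist, writers must maintain both of them."""
    nodes = []
    if not ix.old_is_closed:
        nodes.append(ix.relfilenode)
    if ix.relnewfilenode:
        nodes.append(ix.relnewfilenode)
    return nodes

def node_to_read(ix):
    """Queries read exactly one filenode, flipped atomically at step 6."""
    return ix.relnewfilenode if ix.new_is_active else ix.relfilenode
```

Andres's objection lives in nodes_to_write(): between steps 3 and 8 every backend has to maintain two filenodes for what the rest of the system treats as a single index OID.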
[HACKERS] On the warpath again about ill-considered inclusion nests
I noticed that the recent custom-path commit completely ignored my advice about not including executor headers into planner headers or vice versa. On the way to fixing that, I was dismayed to discover that the RLS patch has utterly bollixed all semblance of modularization of the headers. src/include/rewrite/rowsecurity.h, which one would reasonably think to be a rewriter header (nevermind its header comment to the contrary), nonetheless includes execnodes.h (executor stuff) and relation.h (planner stuff), neither of which a rewriter header has any business including. And if that weren't bad enough, it's been included into utils/rel.h (relcache), which is close enough to guaranteeing that all planner and executor symbols are visible in every darn module we've got. Might as well just put everything we have in postgres.h and abandon all pretense of modularity. This needs to be cleaned up. If you don't want me doing it with an axe, better put some suggestions forward. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] using custom scan nodes to prototype parallel sequential scan
On Wed, Nov 12, 2014 at 9:49 PM, Robert Haas wrote: > On Tue, Nov 11, 2014 at 7:48 PM, Kouhei Kaigai wrote: >> Isn't provolatile = PROVOLATILE_IMMUTABLE sufficient? > > There are certainly things that are parallel-safe that are not > immutable. It might be the case that everything immutable is > parallel-safe. FWIW, when working on the concept of expression and clause shippability for Postgres-XC (aka the possibility to pass it safely to another backend, but in another PG node in this case), we discussed similar things and if I recall correctly we even discussed adding a flag to pg_proc to define whether a function was shippable or not. What we finally settled on was not adding a new flag and using the rule that all immutable functions can be safely shipped and the others not, even though some stable functions *could* be safe. Maybe Ashutosh has more comments on that; my memory may be failing. In the end, I think you would end up with something similar. My 2c. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Teaching pg_dump to use NOT VALID constraints
On 11/10/14, 12:00 PM, Simon Riggs wrote: On 10 November 2014 17:33, Alvaro Herrera wrote: pg_dump --no-revalidation will add "NOT VALID" onto the recreation SQL for any FKs, but only for ones that were already known to be valid. Well. Constraints that haven't been validated already have a NOT VALID emitted by ruleutils.c, yes? So what this patch does is add such a clause for all *other* constraints. Right? In other words what it aims to do is speed up loading of data by skipping the validation step on restore. Is that right? Correct. CHECK constraints are added onto the main table so they validate at load. ISTM we could have the default pg_dump behavior emit NOT VALID constraints, and add VALIDATE CONSTRAINT commands at the end; that way the database is usable sooner but the constraints end up marked as validated by the time the dump is finished. Yes, may be an even better idea. We'd still want the --no-revalidation option, AFAICS. FKs are already "at the end". Perhaps we should add another "validation" section? I like the idea, just not sure how long it would take. Isn't the real use-case here that if constraints were valid when you dumped then we shouldn't have to do *any* re-validation when we load? (Though, we'd have to be careful of that with CHECK because that can call user code...) -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_multixact not getting truncated
On 11/10/14, 12:16 AM, Josh Berkus wrote: On 11/09/2014 08:00 PM, Josh Berkus wrote: On 11/08/2014 01:46 PM, Andres Freund wrote: I'm these days suggesting that people should add manual vacuuming for "older" relations during off peak hours on busy databases. There's too many sites which service degrades noticeably during a full table vacuum. Me too: https://github.com/pgexperts/flexible-freeze It turns out that not even a program of preventative scheduled vacuuming helps. This is because the template0 database anchors the minmxid and prevents it from being advanced until autovacuum gets around to that database, at whatever the minmxid threshold is. How did template0 even get a MultiXact? That sounds like they're really abusing the template databases. :( (Do keep in mind that MXID 1 is a special value.) Regarding linking the two settings, I agree with others that XIDs and MXIDs are basically completely independent (as your customer apparently has discovered). If you set both of the min_age parameters fairly small then it doesn't matter which max limit (the table_age parameters) you hit; you'll get a full scan and the low min_age limits will mean you'll get good freezing of both. The only other thing I can think of would be having yet another set of minimum age limits that come into play when you're doing a full scan as opposed to a partial one, but that seems like overkill to me. I guess another option would be to get more aggressive depending on the size of pg_multixact/... BTW, the only reason I know of not to set both min_age parameters to zero is to prevent loss of forensic information. If that's not a concern you can always just set them to zero. Even if it is a concern, I suspect that the forensic info you could gather from a MultiXact is a lot more limited than for an XID, so it's probably pretty safe setting that to zero. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! 
http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] On partitioning
On 11/12/14, 5:27 PM, Robert Haas wrote: >>> Maybe as anyarray, but I think pg_node_tree might even be better. That can also represent data of some arbitrary type, but it doesn't enforce that everything is uniform. >> Of course, the more general you make it, the more likely that it'll be impossible to optimize well. > The point for me is just that range and list partitioning probably need different structure, and hash partitioning, if we want to support that, needs something else again. Range partitioning needs an array of partition boundaries and an array of child OIDs. List partitioning needs an array of specific values and a child table OID for each. Hash partitioning needs something probably quite different. We might be able to do it as a pair of arrays - one of type anyarray and one of type OID - and meet all needs that way. Another issue is that I don't know whether we could support multi-key partitions with something like an anyarray. Perhaps that's OK as a first pass, but I expect it'll be one of the next things folks ask for. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unintended restart after recovery error
On Wed, Nov 12, 2014 at 4:52 PM, Antonin Houska wrote: > Fujii Masao wrote: > >> On Wed, Nov 12, 2014 at 6:52 PM, Antonin Houska wrote: >> > While looking at postmaster.c:reaper(), one problematic case occurred to >> > me. >> > >> > >> > 1. Startup process signals PMSIGNAL_RECOVERY_STARTED. >> > >> > 2. Checkpointer process is forked and immediately dies. >> > >> > 3. reaper() catches this failure, calls HandleChildCrash() and thus sets >> > FatalError to true. >> > >> > 4. Startup process exits with non-zero status code too - either due to >> > SIGQUIT >> > received from HandleChildCrash or due to some other failure of the startup >> > process itself. However, FatalError is already set, because of the previous >> > crash of the checkpointer. Thus reaper() does not set RecoveryError. >> > >> > 5. As RecoveryError failed to be set to true, postmaster will try to restart >> > the cluster, although it apparently should not. >> >> Why shouldn't postmaster restart the cluster in that case? >> > > At least for the behavior to be consistent with simpler cases of failed > recovery (e.g. any FATAL error in StartupXLOG), which end up not restarting > the cluster. It's true that if the startup process dies we don't try to restart, but it's also true that if the checkpointer dies we do try to restart. I'm not sure why this specific situation should be an exception to that general rule. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] On partitioning
On Wed, Nov 12, 2014 at 5:06 PM, Tom Lane wrote: > Robert Haas writes: >> I thought putting the partition boundaries into pg_inherits was a >> strange choice. I'd put it in pg_class, or in pg_partition if we >> decide to create that. > > Yeah. I rather doubt that we want this mechanism to be very closely > tied to the existing inheritance features. If we do that, we are > going to need a boatload of error checks to prevent people from breaking > partitioned tables by applying the sort of twiddling that inheritance > allows. Well, as I said upthread, I think it would be a pretty poor idea to imagine that the first version of this feature is going to obsolete everything we've done with inheritance. Are we going to reinvent the machinery to make inheritance children get scanned when the parent does? Reinvent Merge Append? >> Maybe as anyarray, but I think pg_node_tree >> might even be better. That can also represent data of some arbitrary >> type, but it doesn't enforce that everything is uniform. > > Of course, the more general you make it, the more likely that it'll be > impossible to optimize well. The point for me is just that range and list partitioning probably need different structure, and hash partitioning, if we want to support that, needs something else again. Range partitioning needs an array of partition boundaries and an array of child OIDs. List partitioning needs an array of specific values and a child table OID for each. Hash partitioning needs something probably quite different. We might be able to do it as a pair of arrays - one of type anyarray and one of type OID - and meet all needs that way. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] REINDEX CONCURRENTLY 2.0
On Wed, Nov 12, 2014 at 4:39 PM, Andres Freund wrote: > On 2014-11-12 16:11:58 -0500, Robert Haas wrote: >> On Wed, Nov 12, 2014 at 4:10 PM, Robert Haas wrote: >> > On Thu, Nov 6, 2014 at 9:50 AM, Peter Eisentraut wrote: >> >> If REINDEX cannot work without an exclusive lock, we should invent some >> >> other qualifier, like WITH FEWER LOCKS. >> > >> > What he said. > > I'm unconvinced. A *short* exclusive lock (just to update two pg_class > rows) still gives most of the benefits of CONCURRENTLY. I am pretty doubtful about that. It's still going to require you to wait for all transactions to drain out of the table while new ones are blocked from entering. Which sucks. Unless all of your transactions are very short, but that's not necessarily typical. > The problem is that it's very hard to avoid the wrong index's > relfilenode being used when swapping the relfilenodes between two > indexes. How about storing both the old and new relfilenodes in the same pg_class entry? 1. Take a snapshot. 2. Index all the tuples in that snapshot. 3. Publish the new relfilenode to an additional pg_class column, relnewfilenode or similar. 4. Wait until everyone can see step #3. 5. Rescan the table and add any missing tuples to the index. 6. Set some flag in pg_class to mark the relnewfilenode as active and relfilenode as not to be used for queries. 7. Wait until everyone can see step #6. 8. Set some flag in pg_class to mark relfilenode as not even to be opened. 9. Wait until everyone can see step #8. 10. Drop old relfilenode. 11. Clean up by setting relfilenode = relnewfilenode, relnewfilenode = 0. This is basically CREATE INDEX CONCURRENTLY (without the first step where we out-wait people who might create now-invalid HOT chains, because that can't arise with a REINDEX of an existing index) plus DROP INDEX CONCURRENTLY. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
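[Editorial note: the eleven-step relfilenode dance above may be easier to follow as a table of catalog states. The relnewfilenode column and the "used for reads" behavior are hypothetical, exactly as sketched in the mail; this is not existing catalog structure.]

```text
phase                            relfilenode   relnewfilenode   used for reads
-------------------------------  -----------   --------------   --------------
before                           A             0                A
after step 3  (publish new)      A             B                A (B gets writes too)
after step 6  (switch reads)     A             B                B
after step 8  (A unopenable)     A             B                B
after step 11 (clean up)         B             0                B
```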
Re: [HACKERS] Race condition between hot standby and restoring a FPW
On 11/12/14, 9:47 AM, Tom Lane wrote: Heikki Linnakangas writes: On 11/12/2014 05:20 PM, Tom Lane wrote: On reconsideration I think the "RBM_ZERO returns page already locked" alternative may be the less ugly. That has the advantage that any code that doesn't get updated will fail clearly and reliably. Yeah, I'm leaning to that approach as well. It's made more ugly by the fact that you sometimes need a cleanup lock on the buffer, so the caller will somehow need to specify whether to get a cleanup lock or a normal exclusive lock. Maybe add yet another mode, RBM_ZERO_WITH_CLEANUP_LOCK. Could also rename RBM_ZERO to e.g. RBM_ZERO_AND_LOCK, to make any code that's not updated to break even more obviously, at compile-time. Yeah, I was considering suggesting changing the name of the mode too. +1 for solving the lock-type problem with 2 modes. (You could also consider leaving RBM_ZERO in place with its current semantics, but I think what you've shown here is that there is no safe way to use it, so probably that's not what we should do.) If we're tweaking modes, can we avoid zeroing the buffer if we're about to dump a full page image into it? Presumably that means we'd have to keep the page locked until the image is written... -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] tracking commit timestamps
On 11/12/14, 7:06 AM, Petr Jelinek wrote: - if the xid passed to get interface is out of range -infinity timestamp is returned (I think it's bad idea to throw errors here as the valid range is not static and same ID can start throwing errors between calls theoretically) Wouldn't NULL be more appropriate? -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] On partitioning
Robert Haas writes: > I thought putting the partition boundaries into pg_inherits was a > strange choice. I'd put it in pg_class, or in pg_partition if we > decide to create that. Yeah. I rather doubt that we want this mechanism to be very closely tied to the existing inheritance features. If we do that, we are going to need a boatload of error checks to prevent people from breaking partitioned tables by applying the sort of twiddling that inheritance allows. > Maybe as anyarray, but I think pg_node_tree > might even be better. That can also represent data of some arbitrary > type, but it doesn't enforce that everything is uniform. Of course, the more general you make it, the more likely that it'll be impossible to optimize well. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add CREATE support to event triggers
On 2014-11-12 16:36:30 -0500, Robert Haas wrote: > On Mon, Nov 10, 2014 at 9:02 PM, Jim Nasby wrote: > > +1. Adding columns is a PITA, you have to manually ensure you do it on all > > slaves first. > > > > Drop is somewhat worse, because you have to do it on the master first, > > opposite of the (more usual) case of adding a column. > > > > RENAME is a complete disaster. > > > > Handing scripts to your replication system to execute isn't a very good > > alternative either; it assumes that you actually have a script (bad > > assumption with ORMs), and that you have a reasonable way to get that script > > to wherever you run your replication system. > > I don't disagree with any of that, but running the command on the > master and then propagating it to the slaves where it may succeed or > fail - and if it fails, you won't know unless you're watching the logs > on those machines, and, oh by the way, replication will also be broken > - is not good either. That's already the situation today with all the logical replication solutions. They *constantly* break in the field. Most commonly because of DDL differences. I don't understand why you think it's likely for logical replication to break due to this? You mean because deparse yielded an invalid statement? In a normal single master setup there really shouldn't be scenarios where that happens? Except bugs - but as you know we had more than one in HS/SR as well? Or are you worried about stuff like ALTER TABLE ... USING()? I think that's the replication solution's job to take care of/prevent. For multimaster the situation is more complex, I agree, but I don't think in-core stuff needs to solve that for now? We are thinking about extending 2PC to be usable across logical decoding. That's a relatively simple patch. Then it's possible to do the DDL on the primary, ship it to the standby, apply it there, and only afterwards commit the prepared xact if that was successful. 
That's quite cool - but somewhat in the remit of the replication solution. > I think the approach to DDL > replication that Alvaro, Andres, et al. are proposing here is > absolutely fine - even praiseworthy - as an out-of-core solution that > users can adopt if they are willing to accept the associated risks, as > many users probably will be. But you wouldn't convince me to run it > on any production system for which I was responsible. The solution here doesn't force you to do that, does it? It's something that can be used by more than one replication solution? I just don't see the alternative you're proposing? I've so far not even seen a credible *sketch* of an alternative design that also can handle ALTER. The only current alternatives are 1) the user inserts some events into the queue manually. If they depend on any local state you're screwed. If they have syntax errors they're often screwed. 2) The user does all actions on the standby first. Then on the primary. That's hard for ALTER ADD COLUMN and similar, and just about impossible for renaming things. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unintended restart after recovery error
Fujii Masao wrote: > On Wed, Nov 12, 2014 at 6:52 PM, Antonin Houska wrote: > > While looking at postmaster.c:reaper(), one problematic case occurred to me. > > > > > > 1. Startup process signals PMSIGNAL_RECOVERY_STARTED. > > > > 2. Checkpointer process is forked and immediately dies. > > > > 3. reaper() catches this failure, calls HandleChildCrash() and thus sets > > FatalError to true. > > > > 4. Startup process exits with non-zero status code too - either due to > > SIGQUIT > > received from HandleChildCrash or due to some other failure of the startup > > process itself. However, FatalError is already set, because of the previous > > crash of the checkpointer. Thus reaper() does not set RecoveryError. > > > > 5. As RecoveryError failed to be set to true, postmaster will try to restart > > the cluster, although it apparently should not. > > Why shouldn't postmaster restart the cluster in that case? > At least for the behavior to be consistent with simpler cases of failed recovery (e.g. any FATAL error in StartupXLOG), which end up not restarting the cluster. -- Antonin Houska Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: http://www.postgresql-support.de, http://www.cybertec.at -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] On partitioning
On Mon, Nov 10, 2014 at 8:53 PM, Amit Langote wrote: > Above commands are merely transformed into ALTER TABLE subcommands that > arrange > partitioned table and partitions into inheritance hierarchy, but with extra > information, that is, allowed values for the partition in a new anyarray > column > called 'pg_inherits.values'. A special case of ATExecAddInherit() namely > ATExecAttachPartitionI(), as part of its processing, also adds partition > constraints in the form of appropriate CHECK constraints. So, a few of the > manual steps are automated and additional (IMHO non-opaque) metadata (namely > partition boundaries/list values) is added. I thought putting the partition boundaries into pg_inherits was a strange choice. I'd put it in pg_class, or in pg_partition if we decide to create that. Maybe as anyarray, but I think pg_node_tree might even be better. That can also represent data of some arbitrary type, but it doesn't enforce that everything is uniform. So you could have a list of objects of the form {RANGEPARTITION :lessthan {CONST ...} :partition 16982} or similar. The relcache could load that up and convert the list to a C array, which would then be easy to binary-search. As you say, you also need to store the relevant operator somewhere, and the fact that it's a range partition rather than list or hash, say. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
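[Editorial note: assuming the relcache did flatten such a pg_node_tree list into a sorted C array of bounds, the binary search Robert describes is straightforward. A self-contained sketch follows; all names are invented, and it assumes a single integer partition key with exclusive upper bounds.]

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t Oid;            /* stand-in for the real Oid typedef */

/* One entry per range partition, sorted by upper bound:
 * tuples with key < upper route to child. */
typedef struct RangeBound
{
    int32_t upper;               /* exclusive upper bound */
    Oid     child;               /* OID of the child table */
} RangeBound;

/* Return the child OID for key, or 0 (InvalidOid) if key is beyond
 * the last bound.  Classic lower-bound binary search. */
static Oid
route_to_partition(const RangeBound *b, int n, int32_t key)
{
    int lo = 0, hi = n;          /* find first bound with key < upper */

    while (lo < hi)
    {
        int mid = (lo + hi) / 2;

        if (key < b[mid].upper)
            hi = mid;
        else
            lo = mid + 1;
    }
    return lo < n ? b[lo].child : 0;
}
```

With bounds {10, 20, 30} this routes 5 to the first child, 10 to the second, and rejects 30, matching the "array of partition boundaries plus array of child OIDs" layout discussed above.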
Re: [HACKERS] Reverse Engineering - search constraints are not explicitly stated in the tables from the VIEW
nill writes: > I am analyzing query plans generated by the view in the database PostgreSQL > 8.3, looking for missing information "constraints not explicitly registrants > in the tables." You realize of course that 8.3 is nearly 7 years old and has been out of support for awhile. > In the analysis of the query plan and its subplane, I can not understand > what the parameter $0 represents, without looking the SQL query. My question > is: looking only at the query plan product, you can understand what is the > parameter $0? It's a variable passed down from the outer query level. It's true that you can't tell which variable, in 8.3. Less obsolete versions produce more readable output though. (I won't claim it's perfect; we still don't try very hard to decompile ANY/ALL subplans.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
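[Editorial note: a minimal way to see where such a parameter comes from, using table names borrowed from the plan in question; this is a hypothetical session, not output from the original database.]

```sql
-- Any correlated subquery produces such a parameter.  In a plan for:
EXPLAIN
SELECT 1
FROM c_order o
WHERE EXISTS (SELECT 1 FROM c_orderline zz1
              WHERE zz1.c_order_id = o.c_order_id
                AND zz1.qtyordered <> zz1.qtydelivered);
-- the subplan shows "Filter: (... (c_order_id = $0))": $0 is the current
-- value of o.c_order_id, passed down from the outer query for each row.
```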
Re: [HACKERS] REINDEX CONCURRENTLY 2.0
On 2014-11-12 16:11:58 -0500, Robert Haas wrote: > On Wed, Nov 12, 2014 at 4:10 PM, Robert Haas wrote: > > On Thu, Nov 6, 2014 at 9:50 AM, Peter Eisentraut wrote: > >> If REINDEX cannot work without an exclusive lock, we should invent some > >> other qualifier, like WITH FEWER LOCKS. > > > > What he said. I'm unconvinced. A *short* exclusive lock (just to update two pg_class rows) still gives most of the benefits of CONCURRENTLY. Also, I do think we can get rid of that period in the not too far away future. > But more to the point, why, precisely, can't this work without an > AccessExclusiveLock? And can't we fix that instead of settling for > something clearly inferior? It's nontrivial to fix, but I think we can fix it at some point. I just think we should get the *major* part of the feature before investing lots of time making it even better. There are *very* frequent questions about having this. And people do really dangerous stuff (like manually updating pg_class.relfilenode and such) to cope. The problem is that it's very hard to avoid the wrong index's relfilenode being used when swapping the relfilenodes between two indexes. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add CREATE support to event triggers
On Mon, Nov 10, 2014 at 9:02 PM, Jim Nasby wrote: > +1. Adding columns is a PITA, you have to manually ensure you do it on all > slaves first. > > Drop is somewhat worse, because you have to do it on the master first, > opposite of the (more usual) case of adding a column. > > RENAME is a complete disaster. > > Handing scripts to your replication system to execute isn't a very good > alternative either; it assumes that you actually have a script (bad > assumption with ORMs), and that you have a reasonable way to get that script > to wherever you run your replication system. I don't disagree with any of that, but running the command on the master and then propagating it to the slaves where it may succeed or fail - and if it fails, you won't know unless you're watching the logs on those machines, and, oh by the way, replication will also be broken - is not good either. We would never have shipped a physical replication solution with that kind of limitation. What has made streaming replication so popular and successful with PostgreSQL users over the last five years is that, while it's a bit of a pain to get set up, once you have it set up, it is rock-solid. If there were a series of legal SQL commands that you could execute without error on a cluster of servers connected by streaming replication such that, when you got done, replication was broken, our users would scream bloody murder, or just stop using PostgreSQL. I think the approach to DDL replication that Alvaro, Andres, et al. are proposing here is absolutely fine - even praiseworthy - as an out-of-core solution that users can adopt if they are willing to accept the associated risks, as many users probably will be. But you wouldn't convince me to run it on any production system for which I was responsible. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Doing better at HINTing an appropriate column within errorMissingColumn()
On Wed, Nov 12, 2014 at 12:59 PM, Robert Haas wrote: > I agree with your proposed approach to moving Levenshtein into core. > However, I think this should be separated into two patches, one of > them moving the Levenshtein functionality into core, and the other > adding the new treatment for missing column errors. If you can do > that relatively soon, I'll make an effort to get the refactoring patch > committed in the near future. Once that's done, we can focus in on > the interesting part of the patch, which is the actual machinery for > suggesting alternatives. Okay, thanks. I think I can do that fairly soon. > On that topic, I think there's unanimous consensus against the design > where equally-distant matches are treated differently based on whether > they are in the same RTE or different RTEs. I think you need to > change that if you want to get anywhere with this. Alright. It wasn't as if I felt very strongly about it either way. > On a related note, > the use of the additional parameter AttrNumber closest[2] to > searchRangeTableForCol() and of the additional parameters AttrNumber > *matchedatt and int *distance to scanRTEForColumn() is less than > self-documenting. I suggest creating a structure called something > like FuzzyAttrMatchState and passing a pointer to it down to both > functions. Sure. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] REINDEX CONCURRENTLY 2.0
On Wed, Nov 12, 2014 at 4:10 PM, Robert Haas wrote: > On Thu, Nov 6, 2014 at 9:50 AM, Peter Eisentraut wrote: >> If REINDEX cannot work without an exclusive lock, we should invent some >> other qualifier, like WITH FEWER LOCKS. > > What he said. But more to the point, why, precisely, can't this work without an AccessExclusiveLock? And can't we fix that instead of settling for something clearly inferior? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] REINDEX CONCURRENTLY 2.0
On Thu, Nov 6, 2014 at 9:50 AM, Peter Eisentraut wrote: > If REINDEX cannot work without an exclusive lock, we should invent some > other qualifier, like WITH FEWER LOCKS. What he said. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Error building the EnterpriseDB mysql_fdw on OSX
On Wed, Nov 12, 2014 at 3:48 PM, Kirk Roybal wrote: > I'm running 10.9.5 of OSX. > > I got the MySQL and PostgreSQL dependencies installed (I think). > > Checked out the git repo for mysql_fdw from > git://github.com/EnterpriseDB/mysql_fdw > > USE_PGXS=1 make > > and got the error: > > mysql_fdw.c > mysql_fdw.c:153:56: error: use of undeclared identifier 'RTLD_DEEPBIND' > mysql_dll_handle = dlopen(_MYSQL_LIBNAME, RTLD_LAZY | > RTLD_DEEPBIND); > > > Is RTLD_DEEPBIND supported on OSX? > > Did I do something wrong getting the dependencies together? I don't know whether the folks who work on that FDW read this list regularly, but they do keep an eye on issues opened in GitHub. So you might want to open one there. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Doing better at HINTing an appropriate column within errorMissingColumn()
On Sun, Nov 9, 2014 at 11:48 PM, Peter Geoghegan wrote: > On Fri, Nov 7, 2014 at 12:57 PM, Robert Haas wrote: >> Based on this review from a month ago, I'm going to mark this Waiting >> on Author. If nobody updates the patch in a few days, I'll mark it >> Returned with Feedback. Thanks. > > Attached revision fixes the compiler warning that Heikki complained > about. I maintain SQL-callable stub functions from within contrib, > rather than follow Michael's approach. In other words, very little has > changed from my revision from July last [1]. I agree with your proposed approach to moving Levenshtein into core. However, I think this should be separated into two patches, one of them moving the Levenshtein functionality into core, and the other adding the new treatment for missing column errors. If you can do that relatively soon, I'll make an effort to get the refactoring patch committed in the near future. Once that's done, we can focus in on the interesting part of the patch, which is the actual machinery for suggesting alternatives. On that topic, I think there's unanimous consensus against the design where equally-distant matches are treated differently based on whether they are in the same RTE or different RTEs. I think you need to change that if you want to get anywhere with this. On a related note, the use of the additional parameter AttrNumber closest[2] to searchRangeTableForCol() and of the additional parameters AttrNumber *matchedatt and int *distance to scanRTEForColumn() is less than self-documenting. I suggest creating a structure called something like FuzzyAttrMatchState and passing a pointer to it down to both functions. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
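[Editorial note: for readers following the split between the two patches, the Levenshtein machinery being moved is the classic dynamic program. This is a generic two-row sketch with single-byte characters and unit costs, not the contrib/fuzzystrmatch code, which additionally handles multibyte encodings and configurable per-operation costs.]

```c
#include <assert.h>
#include <string.h>

/* Classic dynamic-programming edit distance, kept to two rows.
 * Sketch only: single-byte characters, unit costs, target <= 64 chars. */
static int
levenshtein(const char *s, const char *t)
{
    int m = (int) strlen(s);
    int n = (int) strlen(t);
    int prev[64 + 1], curr[64 + 1];

    if (n > 64)
        return -1;               /* out of scope for this sketch */

    for (int j = 0; j <= n; j++)
        prev[j] = j;             /* distance from empty prefix of s */

    for (int i = 1; i <= m; i++)
    {
        curr[0] = i;             /* distance to empty prefix of t */
        for (int j = 1; j <= n; j++)
        {
            int sub = prev[j - 1] + (s[i - 1] != t[j - 1]);
            int del = prev[j] + 1;
            int ins = curr[j - 1] + 1;

            curr[j] = sub < del ? sub : del;
            if (ins < curr[j])
                curr[j] = ins;
        }
        memcpy(prev, curr, sizeof(int) * (n + 1));
    }
    return prev[n];
}
```

The hint machinery would then simply pick the column name with the smallest distance from the misspelled identifier.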
[HACKERS] Error building the EnterpriseDB mysql_fdw on OSX
I'm running 10.9.5 of OSX. I got the MySQL and PostgreSQL dependencies installed (I think). Checked out the git repo for mysql_fdw from git://github.com/EnterpriseDB/mysql_fdw USE_PGXS=1 make and got the error: mysql_fdw.c mysql_fdw.c:153:56: error: use of undeclared identifier 'RTLD_DEEPBIND' mysql_dll_handle = dlopen(_MYSQL_LIBNAME, RTLD_LAZY | RTLD_DEEPBIND); Is RTLD_DEEPBIND supported on OSX? Did I do something wrong getting the dependencies together? /Kirk -- -- The best virus protection for Windows is fdisk, format, insert linux disk.
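[Editorial note on the compile error: RTLD_DEEPBIND is a glibc extension (Linux), and Apple's <dlfcn.h> does not define it, which is exactly why the build fails on OS X. A common portability workaround, shown here as an assumption about how the source could be patched rather than the FDW's actual code, is to make the flag a no-op where it is missing.]

```c
#include <assert.h>
#include <dlfcn.h>

/* RTLD_DEEPBIND exists only on glibc; OS X and musl lack it.
 * Defining it to 0 makes the extra flag a harmless no-op there,
 * so the dlopen() call still compiles everywhere. */
#ifndef RTLD_DEEPBIND
#define RTLD_DEEPBIND 0
#endif

/* Hypothetical helper: the flags a patched mysql_fdw.c would pass
 * to dlopen() when loading libmysqlclient. */
static int
mysql_fdw_dlopen_flags(void)
{
    return RTLD_LAZY | RTLD_DEEPBIND;
}
```

On Linux this keeps the deep-binding behavior; on OS X it degrades to plain RTLD_LAZY, which loses the symbol-isolation the flag provides but at least builds.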
Re: [HACKERS] pg_prewarm really needs some CHECK_FOR_INTERRUPTS
On 2014-11-11 12:17:11 +0100, Andres Freund wrote: > pg_prewarm() currently cannot be interrupted - which seems odd > given that it's intended to read large amounts of data from disk. A > rather slow process. > > Unless somebody protests I'm going to add a check to the top of each of > the three loops. Pushed to master and 9.4. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Reverse Engineering - search constraints are not explicitly stated in the tables from the VIEW
I am analyzing query plans generated by a view in PostgreSQL 8.3, looking for missing information "constraints not explicitly registered in the tables." In nested queries (e.g. an IN clause), the query plan consists of the evaluation of the subplan derived from the inner (SELECT ...) clause and of the outer query. In the present case:

HashAggregate  (cost=15.14..15.16 rows=1 width=247)
  ->  Nested Loop IN Join  (cost=3.46..15.12 rows=1 width=247)
        Join Filter: (o.c_doctype_id = c_doctype.c_doctype_id)
        ->  Hash Left Join  (cost=3.46..12.38 rows=1 width=247)
              Hash Cond: (bp.c_invoiceschedule_id = si.c_invoiceschedule_id)
              Filter: ((o.invoicerule = 'I'::bpchar) OR ((o.invoicerule = 'O'::bpchar) AND (NOT (subplan))) OR ((o.invoicerule = 'D'::bpchar) AND (l.qtyinvoiced <> l.qtydelivered)) OR ((o.invoicerule = 'S'::bpchar) AND (bp.c_invoiceschedule_id IS NULL)) OR ((o.invoicerule = 'S'::bpchar) AND (bp.c_invoiceschedule_id IS NOT NULL) AND ((si.invoicefrequency IS NULL) OR (si.invoicefrequency = 'D'::bpchar) OR (si.invoicefrequency = 'W'::bpchar) OR ((si.invoicefrequency = 'T'::bpchar) AND (((adempiere.trunc((o.dateordered)::timestamp with time zone) <= (((adempiere.firstof(adempiere.getdate(), 'MM'::character varying))::timestamp with time zone OPERATOR(adempiere.+) si.invoicedaycutoff) - 1)) AND (adempiere.trunc(adempiere.getdate()) >= (((adempiere.firstof((o.dateordered)::timestamp with time zone, 'MM'::character varying))::timestamp with time zone OPERATOR(adempiere.+) si.invoiceday) - 1))) OR ((adempiere.trunc((o.dateordered)::timestamp with time zone) <= (((adempiere.firstof(adempiere.getdate(), 'MM'::character varying))::timestamp with time zone OPERATOR(adempiere.+) si.invoicedaycutoff) + 14)) AND (adempiere.trunc(adempiere.getdate()) >= (((adempiere.firstof((o.dateordered)::timestamp with time zone, 'MM'::character varying))::timestamp with time zone OPERATOR(adempiere.+) si.invoiceday) + 14) OR ((si.invoicefrequency = 'M'::bpchar) AND (adempiere.trunc((o.dateordered)::timestamp with time zone) <= (((adempiere.firstof(adempiere.getdate(), 'MM'::character varying))::timestamp with time zone OPERATOR(adempiere.+) si.invoicedaycutoff) - 1)) AND (adempiere.trunc(adempiere.getdate()) >= (((adempiere.firstof((o.dateordered)::timestamp with time zone, 'MM'::character varying))::timestamp with time zone OPERATOR(adempiere.+) si.invoiceday) - 1))
              ->  Hash Join  (cost=2.44..3.87 rows=3 width=300)
                    Hash Cond: (l.c_order_id = o.c_order_id)
                    ->  Seq Scan on c_orderline l  (cost=0.00..1.31 rows=25 width=141)
                          Filter: (qtyordered <> qtyinvoiced)
                    ->  Hash  (cost=2.40..2.40 rows=3 width=172)
                          ->  Hash Join  (cost=1.13..2.40 rows=3 width=172)
                                Hash Cond: (bp.c_bpartner_id = o.c_bpartner_id)
                                ->  Seq Scan on c_bpartner bp  (cost=0.00..1.17 rows=17 width=26)
                                ->  Hash  (cost=1.10..1.10 rows=3 width=159)
                                      ->  Seq Scan on c_order o  (cost=0.00..1.10 rows=3 width=159)
                                            Filter: (docstatus = ANY ('{CO,CL,IP}'::bpchar[]))
              ->  Hash  (cost=1.01..1.01 rows=1 width=47)
                    ->  Seq Scan on c_invoiceschedule si  (cost=0.00..1.01 rows=1 width=47)
              SubPlan
                ->  Seq Scan on c_orderline zz1  (cost=0.00..1.38 rows=1 width=0)
                      Filter: ((qtyordered <> qtydelivered) AND (c_order_id = $0))
        ->  Seq Scan on c_doctype  (cost=0.00..2.73 rows=1 width=13)
              Filter: ((c_doctype.docbasetype = 'SOO'::bpchar) AND (c_doctype.docsubtypeso <> ALL ('{ON,OB,WR}'::bpchar[])))

In the analysis of the query plan and its subplan, I cannot understand what the parameter $0 represents without looking at the SQL query. My question is: looking only at the produced query plan, can you understand what the parameter $0 is?

SELECT o.ad_client_id, o.ad_org_id, o.c_bpartner_id, o.c_order_id, o.documentno,
       o.dateordered, o.c_doctype_id,
       sum((l.qtyordered - l.qtyinvoiced) * l.priceactual) AS totallines
FROM c_order o
JOIN c_orderline l ON o.c_order_id = l.c_order_id
JOIN c_bpartner bp ON o.c_bpartner_id = bp.c_bpartner_id
LEFT JOIN c_invoiceschedule si ON bp.c_invoiceschedule_id = si.c_invoiceschedule_id
WHERE (o.docstatus = ANY (ARRAY['CO'::bpchar, 'CL'::bpchar, 'IP'::bpchar]))
  AND (o.c_doctype_id IN ( SELECT c_doctype.c_doctype_id FROM c_doctype
       WHERE c_doctype.docbasetype = 'SOO'::bpchar
         AND (c_doctype.docsubtypeso <> ALL (ARRAY['ON'::bpchar, 'OB'::bpchar, 'WR'::bpchar]
  AND l.qtyordered <> l.qtyinvoiced
  AND (o.invoicerule = 'I'::bpchar
       OR o.invoicerule = 'O'::bpchar
          AND NOT (EXISTS ( SELECT 1 FROM c_orderline zz1
                            WHERE **zz1.c_order_id = o.c_order_id** AND zz1.qtyordered <> zz1.qtydeliv
Re: [HACKERS] group locking: incomplete patch, just for discussion
On Wed, Nov 12, 2014 at 2:57 AM, Jeff Davis wrote: > Trying to catch up on this thread, please excuse me if these questions > are already covered. Welcome to the party. The more, the merrier. :-) > You mention the possibility of undetected deadlocks, which is surely > unacceptable. But why not improve the deadlock detector? How bad are > _detected_ deadlocks? A lot of the concern was around catalog accesses, > but those lock requests would rarely wait anyway. Detected deadlocks are fine. Improving the deadlock detector is the heart of what needs to be done here. As you say, the lock requests we're talking about will rarely wait, so deadlocks won't be frequent. The issue is making sure that, if they do happen, we get a better behavior than "your parallel query hangs forever; good luck figuring out why". > I also wonder if group locking is generally the wrong approach to > parallelism. Parallel scan/sort should work by assigning workers to > chunks of data, and then merging the results. In principle the workers > don't need to touch the same data at all, so why are we trying so hard > to get them to all take the same locks? > > The reason, I assume, is that a full table is the lockable object, but > we are imagining the segment files as the chunks. But maybe there's a > way to address this more directly with an extra field in the lock tag, > and perhaps some catalog knowledge? In the common case, we're only taking AccessShareLock on some relation to prevent it from being concurrently dropped or rewritten. Locking only part of the relation wouldn't lead to any improvement in concurrency, because the full-relation lock isn't conflicting with anything that the partial-relation lock wouldn't also need to conflict with. More generally, I think there's some misunderstanding about the overall goal of the parallelism infrastructure that I'm trying to create. 
Many people have proposed interesting strategies for how we could do various things (locking, planning, execution) better. Some of those ideas are, doubtless, good ones. But my goal is in some ways the opposite: I'm trying to make it possible to run as much existing PostgreSQL backend code as possible inside a parallel worker without any modification. To do that, I'm trying to modify the subsystems that are widely used throughout the backend - such as locking - in such a way that the people using that infrastructure need not put conditional logic into their code to make it do one thing when in parallel mode and another thing when not in parallel mode. And I'm also trying to avoid fundamentally changing the way major subsystems work today as a precondition of parallelism. So I'm taking a very surgical approach to the patches I propose. I'm not proposing patches that do anything fundamentally new or different; instead, I'm proposing patches that let the stuff we already do continue to work. In a single backend executing a query, we have lots of code that assumes you can allocate memory, look data up in a syscache, throw an error, lock a buffer, examine the value of a GUC, test a tuple against the active snapshot, and so on. For parallel mode to be useful, those same things need to work in a parallel worker. Of course, not every crazy thing somebody may want to do in a parallel worker needs to work, nor will it. But anything that's done in thousands of places throughout the code had better work, or parallelism will be so restricted as to be useless. So commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, for example, is about letting "ereport" work in a parallel worker, and this patch is about making things like "SearchSysCache1" and "PG_GETARG_TEXT_PP" work *reliably* in a parallel worker. Now, the time to do new and different things is coming, and is maybe even now not so very far away. 
I'm not an expert on parallel execution, and I'm sure there are a number of people far more skilled than I at figuring out how to do a parallel scan, join, sort, or whatever quickly. What I'm trying to do is get the infrastructure to a point where those people don't have to worry about whether the basic facilities that we rely on throughout the backend are gonna work. By way of comparison, think about the periodic discussions about making the backend multi-threaded. Since the backend relies on global variables in an enormous number of places, it isn't thread-safe. If you spawn a thread inside a PostgreSQL backend, it can't safely ereport(), palloc(), LWLockAcquire(), or just about anything else, which means that, although anybody can write code that calls pthread_create(), it's not useful to do so because there's practically no existing backend code that you could safely call from the new thread. Using background worker processes eliminates a lot of those problems - palloc and lwlocks just work, for example - but not all of them. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing l
Re: [HACKERS] pg_receivexlog --status-interval add fsync feedback
Fujii Masao wrote: > --- 127,152 >When this option is used, pg_receivexlog will > report >a flush position to the server, indicating when each segment has > been >synchronized to disk so that the server can remove that segment if > it > ! is not otherwise needed. --synchronous option > must > ! be specified when making pg_receivexlog run as > ! synchronous standby by using replication slot. Otherwise WAL data > ! cannot be flushed frequently enough for this to work correctly. > > > Whitespace damage here. > + printf(_(" --synchronous flush transaction log in real > time\n")); "in real time" sounds odd. How about "flush transaction log immediately after writing", or maybe "have transaction log writes be synchronous". > --- 781,791 > now = feGetCurrentTimestamp(); > > /* > ! * Issue sync command as soon as there are WAL data which > ! * has not been flushed yet if synchronous option is true. >*/ > if (lastFlushPosition < blockpos && > ! walfile != -1 && synchronous) I'd put the "synchronous" condition first in the if(), and start the comment with it rather than putting it at the end. Both seem weird. > --- 793,807 > progname, current_walfile_name, > strerror(errno)); > goto error; > } > lastFlushPosition = blockpos; > ! > ! /* > ! * Send feedback so that the server sees the latest WAL > locations > ! * immediately if synchronous option is true. > ! */ > ! if (!sendFeedback(conn, blockpos, now, false)) > ! goto error; > ! last_status = now; I'm not clear about this comment .. why does it say "if synchronous option is true" when it's not checking the condition? -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] recovery_target_time and standby_mode
On Wed, Nov 12, 2014 at 12:12 PM, Josh Berkus wrote: > On 11/07/2014 02:03 PM, Josh Berkus wrote: >> But, like I said, there's a serviceable workaround. > > Some update on this. We've seen a problem in production with this setup > which I can't reproduce as a test case, but which may jog Heikki's > memory for something to fix. > > 1. Recover master to 2014-11-10 12:10:00 > 2. Recover replica to 2014-11-10 12:10:00, >with pause_at_recovery_target > 3. reconfigure recovery.conf for streaming replication >and restart the replica > 4. get a fatal error for replication, because >the replica is ahead of the master on timeline1 > > What *appears* to be happening is that the pause_at_recovery_target, > followed by the restart, on the replica causes it to advance one commit > on timeline 1. But *not all the time*; this doesn't happen in my > pgbench-based tests. > > There's a workaround for the user (they just restore the replica to 5 > minutes earlier), but I'm thinking this is a minor bug somewhere. I'm not sure what's going on here, but keep in mind that when you restart the replica, it's going to back up to the most recent restartpoint and begin replication from there, not from the point it was at when you shut down. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] what does this mean: "running xacts with xcnt == 0"
On Wed, Nov 12, 2014 at 12:47 PM, Andres Freund wrote:
>> >> > So maybe 'Encountered xl_running_xacts record with xcnt = 0.'?
>> >>
>> >> That's not very user-facing, is it -- I mean, why bother the user with
>> >> the names of structs and members thereof? It seems better to describe
>> >> what the condition is; something like "found point in time with no
>> >> running transaction". Maybe "point in time" should be "WAL record"
>> >> instead.
>> >
>> > Is that really a win in clarity? When analyzing a problem I'd much
>> > rather have a concrete hint than something fuzzy.
>>
>> You can't phrase error messages in terms of internal concepts that 99%
>> of users won't understand or care about. Like Peter says, user-facing
>> error messages need to be written in English, not C.
>
> That's not the actual message, but an errdetail() - and lots of those
> refer to internals? And it's not an error, but a log message? E.g. we
> add error contexts for wal replay errors that print the internals
> literally? And it's *really* helpful?

Like what? That's the only errdetail() currently in the tree that contains ==, for example.

I see there are three different detail messages associated with this error message. You don't really need these messages to be phrased in any particular way; it's enough to have the messages be different from each other.

1. found initial snapshot in snapbuild file
2. Transaction ID %u finished; no more running transactions.
3. running xacts with xcnt == 0

The second one follows style guidelines, but the other two do not. I suggest:

1. Logical decoding will begin using saved snapshot.
2. Transaction ID %u finished; no more running transactions.
3. There are no running transactions.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] alter user set local_preload_libraries.
On Wed, Nov 5, 2014 at 1:22 AM, Peter Eisentraut wrote: > On 10/9/14 1:58 PM, Fujii Masao wrote: >> Also I think that it's useful to allow ALTER ROLE/DATABASE SET to >> set PGC_BACKEND and PGC_SU_BACKEND parameters. So, what >> about applying the attached patch? This patch allows that and >> changes the context of session_preload_libraries to PGC_SU_BACKEND. > > After looking through this again, I wonder whether there is any reason > why ignore_system_indexes cannot be plain PGC_USERSET. With this > change, we'd allow setting it via ALTER ROLE, but the access to > pg_db_role_setting happens before it. Even without the patch, we can set ignore_system_indexes at the startup of the connection because it's defined with PGC_BACKEND context, but the access to system tables can happen before that. Am I missing something? Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal: Log inability to lock pages during vacuum
On Wed, Nov 12, 2014 at 12:37 PM, Andres Freund wrote:
> Stop overdesigning this.
>
> Add it to the existing message and let us be done with this. This thread
> has already wasted far too much time.

That's a little harsh, but I agree. Producing a warning here is just going to be log-spam. We've had this behavior for years and - to my knowledge - we have not got one complaint that can be attributed to this feature. What used to happen is that VACUUM would get stuck for minutes, hours, or days trying to vacuum a table because of an open cursor, and people did complain about that. It was a serious nuisance that is now gone.

The entire argument in favor of some change here is "even though a careful theoretical analysis indicates that this is safe, even though it solved a real problem that was hurting our users, and even though we have no evidence whatsoever that the change is hurting anybody in any way whatsoever, the lack of instrumentation means that it could possibly be hurting somebody and we wouldn't know." That is, of course, quite true, but you could apply the same argument to lots of patches. It would usually be a waste of time because most of the patches we think are good actually ARE good - but of course, every once in a while you would find a real problem that had been overlooked.

Maybe the right thing here is not to make any change to the source code *at all* but for Jim to patch his own copy and try to come up with a reproducible test case where this is actually a problem. Then, if there is a problem, we could actually fix it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] what does this mean: "running xacts with xcnt == 0"
On 2014-11-12 12:40:41 -0500, Robert Haas wrote:
> On Wed, Nov 12, 2014 at 10:59 AM, Andres Freund wrote:
> > On 2014-11-12 11:56:01 -0300, Alvaro Herrera wrote:
> >> Andres Freund wrote:
> >> > Hi,
> >> >
> >> > On 2014-11-12 09:03:40 -0500, Peter Eisentraut wrote:
> >> > > Could someone translate this detail message to English:
> >> > >
> >> > > ereport(LOG,
> >> > >         (errmsg("logical decoding found consistent point at %X/%X",
> >> > >                 (uint32) (lsn >> 32), (uint32) lsn),
> >> > >          errdetail("running xacts with xcnt == 0")));
> >> >
> >> > It means that an xl_running_xacts record was encountered that had
> >> > xcnt = 0 - allowing logical decoding to find a consistent start point
> >> >
> >> > > (or downgrade to debug message, if appropriate)?
> >> >
> >> > The message generally is quite relevant, as the process of finding a
> >> > consistent start point can take quite a while. We don't really have a
> >> > nice way to make errdetail() only be logged on a certain severity level
> >> > as far as I am aware of.
> >>
> >> Can we do just the errmsg() and remove the errdetail?
> >
> > No, I really don't want to do that. When trying to see whether logical
> > replication started that's imo quite an important detail. Especially when
> > first seeing
> > ereport(LOG,
> >         (errmsg("logical decoding found initial starting point at %X/%X",
> >                 (uint32) (lsn >> 32), (uint32) lsn),
> >          errdetail_plural("%u transaction needs to finish.",
> >                           "%u transactions need to finish.",
> >                           builder->running.xcnt,
> >                           (uint32) builder->running.xcnt)));
> >
> > Btw, Peter, why did you add a (uint32) to one, but not both,
> > builder->running.xcnt references?
> >
> >> > So maybe 'Encountered xl_running_xacts record with xcnt = 0.'?
> >>
> >> That's not very user-facing, is it -- I mean, why bother the user with
> >> the names of structs and members thereof? 
> >> It seems better to describe what the condition is; something like
> >> "found point in time with no running transaction". Maybe "point in
> >> time" should be "WAL record" instead.
> >
> > Is that really a win in clarity? When analyzing a problem I'd much
> > rather have a concrete hint than something fuzzy.
>
> You can't phrase error messages in terms of internal concepts that 99%
> of users won't understand or care about. Like Peter says, user-facing
> error messages need to be written in English, not C.

That's not the actual message, but an errdetail() - and lots of those refer to internals? And it's not an error, but a log message? E.g. we add error contexts for wal replay errors that print the internals literally? And it's *really* helpful?

Greetings,

Andres Freund

--
Andres Freund                      http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] what does this mean: "running xacts with xcnt == 0"
On Wed, Nov 12, 2014 at 10:59 AM, Andres Freund wrote:
> On 2014-11-12 11:56:01 -0300, Alvaro Herrera wrote:
>> Andres Freund wrote:
>> > Hi,
>> >
>> > On 2014-11-12 09:03:40 -0500, Peter Eisentraut wrote:
>> > > Could someone translate this detail message to English:
>> > >
>> > > ereport(LOG,
>> > >         (errmsg("logical decoding found consistent point at %X/%X",
>> > >                 (uint32) (lsn >> 32), (uint32) lsn),
>> > >          errdetail("running xacts with xcnt == 0")));
>> >
>> > It means that an xl_running_xacts record was encountered that had
>> > xcnt = 0 - allowing logical decoding to find a consistent start point
>> >
>> > > (or downgrade to debug message, if appropriate)?
>> >
>> > The message generally is quite relevant, as the process of finding a
>> > consistent start point can take quite a while. We don't really have a
>> > nice way to make errdetail() only be logged on a certain severity level
>> > as far as I am aware of.
>>
>> Can we do just the errmsg() and remove the errdetail?
>
> No, I really don't want to do that. When trying to see whether logical
> replication started that's imo quite an important detail. Especially when
> first seeing
> ereport(LOG,
>         (errmsg("logical decoding found initial starting point at %X/%X",
>                 (uint32) (lsn >> 32), (uint32) lsn),
>          errdetail_plural("%u transaction needs to finish.",
>                           "%u transactions need to finish.",
>                           builder->running.xcnt,
>                           (uint32) builder->running.xcnt)));
>
> Btw, Peter, why did you add a (uint32) to one, but not both,
> builder->running.xcnt references?
>
>> > So maybe 'Encountered xl_running_xacts record with xcnt = 0.'?
>>
>> That's not very user-facing, is it -- I mean, why bother the user with
>> the names of structs and members thereof? It seems better to describe
>> what the condition is; something like "found point in time with no
>> running transaction". Maybe "point in time" should be "WAL record"
>> instead.
>
> Is that really a win in clarity? 
> When analyzing a problem I'd much
> rather have a concrete hint than something fuzzy.

You can't phrase error messages in terms of internal concepts that 99% of users won't understand or care about. Like Peter says, user-facing error messages need to be written in English, not C.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] pg_background (and more parallelism infrastructure patches)
On Wed, Nov 12, 2014 at 11:36 AM, Robert Haas wrote:
> On Wed, Nov 12, 2014 at 11:19 AM, Andres Freund wrote:
>> The question is whether the library is actually loaded in that case?
>> Because that normally only happens early during startup - which is why
>> it's a PGC_BACKEND guc.
>
> It looks like that does not work.
>
> [rhaas pgsql]$ PGOPTIONS='-c local_preload_libraries=auto_explain' psql
> psql (9.5devel)
> Type "help" for help.
>
> rhaas=# select * from pg_background_result(pg_background_launch('show
> auto_explain.log_min_duration')) as (x text);
> ERROR: unrecognized configuration parameter "auto_explain.log_min_duration"
> CONTEXT: background worker, pid 31316
>
> So, there's more to be done here. Rats.

It turned out to be quite simple to fix both problems.

This particular case fails because the call that loads the libraries specified by session_preload_libraries and local_preload_libraries is in PostgresMain() and thus never gets called by pg_background. I fixed that by adding that call to pg_background in the appropriate place. While I was at it, I added the REVOKE statements we discussed earlier to pg_background's .sql file.

The other problem was due to this code in set_config_option:

    /*
     * If a PGC_BACKEND or PGC_SU_BACKEND parameter is changed in
     * the config file, we want to accept the new value in the
     * postmaster (whence it will propagate to
     * subsequently-started backends), but ignore it in existing
     * backends.  This is a tad klugy, but necessary because we
     * don't re-read the config file during backend start.
     *
     * In EXEC_BACKEND builds, this works differently: we load all
     * nondefault settings from the CONFIG_EXEC_PARAMS file during
     * backend start.  In that case we must accept PGC_SIGHUP
     * settings, so as to have the same value as if we'd forked
     * from the postmaster.  We detect this situation by checking
     * IsInitProcessingMode, which is a bit ugly, but it doesn't
     * seem worth passing down an explicit flag saying we're doing
     * read_nondefault_variables().
     */
#ifdef EXEC_BACKEND
    if (IsUnderPostmaster && !IsInitProcessingMode())
        return -1;
#else
    if (IsUnderPostmaster)
        return -1;
#endif

When restoring variables via RestoreGUCState(), we need the same kind of special handling that we do when running in EXEC_BACKEND mode and restoring variables via read_nondefault_variables(). Extending the IsInitProcessingMode() kludge doesn't look appealing, so I instead added the flag contemplated by the comment. Updated patches attached.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

commit dc8980bdc9be89acf6d61bbfa5b1d7391992e8e8
Author: Robert Haas
Date:   Thu Jul 17 07:58:32 2014 -0400

    Add infrastructure to save and restore GUC values.

    Amit Khandekar, Noah Misch, Robert Haas

    V2: Fix misuse of NULL vs. NUL/zero-byte.
    V2: Check return value of set_config_option.
    V2: Add an explicit is_reload flag to set_config_option.
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 9a0afa4..6692bb5 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -814,11 +814,11 @@ execute_extension_script(Oid extensionOid, ExtensionControlFile *control,
 	if (client_min_messages < WARNING)
 		(void) set_config_option("client_min_messages", "warning",
 								 PGC_USERSET, PGC_S_SESSION,
-								 GUC_ACTION_SAVE, true, 0);
+								 GUC_ACTION_SAVE, true, 0, false);
 	if (log_min_messages < WARNING)
 		(void) set_config_option("log_min_messages", "warning",
 								 PGC_SUSET, PGC_S_SESSION,
-								 GUC_ACTION_SAVE, true, 0);
+								 GUC_ACTION_SAVE, true, 0, false);
 
 	/*
 	 * Set up the search path to contain the target schema, then the schemas
@@ -843,7 +843,7 @@ execute_extension_script(Oid extensionOid, ExtensionControlFile *control,
 	(void) set_config_option("search_path", pathbuf.data,
 							 PGC_USERSET, PGC_S_SESSION,
-							 GUC_ACTION_SAVE, true, 0);
+							 GUC_ACTION_SAVE, true, 0, false);
 
 	/*
 	 * Set creating_extension and related variables so that
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index c0156fa..2f02303 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers
Re: [HACKERS] Proposal: Log inability to lock pages during vacuum
On 2014-11-12 11:34:04 -0600, Jim Nasby wrote:
> On 11/11/14, 2:01 AM, Andres Freund wrote:
> >On 2014-11-10 19:36:18 -0600, Jim Nasby wrote:
> >>Towards that simple end, I'm a bit torn. My preference would be to
> >>simply log, and throw a warning if it's over some threshold. I believe
> >>that would give the best odds of getting feedback from users if this
> >>isn't as uncommon as we think.
> >
> >I'm strongly against a warning. We have absolutely no sane way of tuning
> >that. We'll just create a pointless warning that people will get
> >confused about and that they'll have to live with till the next release.
>
> To clarify: I'm only suggesting we issue a warning if we have to skip some
> significant number of pages; say 5 or 0.01% of the table, whichever is
> greater. That's aimed directly at the goal of letting us know if this is
> actually a problem or not.

Meh. You have a 5 page relation and it'll trigger quite easily. And it's absolutely harmless.

> The reason I'm inclined to do the warning is because I don't think people
> will notice this otherwise. If this really isn't a problem then it won't
> matter; if it's a *big* problem then we'll at least know about it.
>
> I'm thinking of an undocumented GUC to control the threshold, but I assume
> no one else would be on board with that?

Stop overdesigning this.

Add it to the existing message and let us be done with this. This thread has already wasted far too much time.

Greetings,

Andres Freund

--
Andres Freund                      http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Proposal: Log inability to lock pages during vacuum
On 11/11/14, 2:01 AM, Andres Freund wrote: On 2014-11-10 19:36:18 -0600, Jim Nasby wrote: On 11/10/14, 12:56 PM, Andres Freund wrote: On 2014-11-10 12:37:29 -0600, Jim Nasby wrote: On 11/10/14, 12:15 PM, Andres Freund wrote: If what we want is to quantify the extent of the issue, would it be more convenient to save counters to pgstat? Vacuum already sends pgstat messages, so there's no additional traffic there. I'm pretty strongly against that one in isolation. They'd need to be stored somewhere and they'd need to be queryable somewhere with enough context to make sense. To actually make sense of the numbers we'd also need to report all the other datapoints of vacuum in some form. That's quite a worthwhile project imo - but *much* *much* more work than this. We already report statistics on vacuums (lazy_vacuum_rel()/pgstat_report_vacuum), so this would just be adding 1 or 2 counters to that. Should we add the other counters from vacuum? That would be significantly more data. At the very least it'd require:

* The number of buffers skipped due to the vm
* The number of buffers actually scanned
* The number of full table in contrast to partial vacuums

If we're going to track full scan vacuums separately, I think we'd need two separate scan counters. Well, we already have the entire number of vacuums, so we'd have that. I mean number of pages scanned, but as I said below I don't think that's really necessary. I think (for pgstats) it'd make more sense to just count initial failure to acquire the lock in a full scan in the 'skipped page' counter. In terms of answering the question "how common is it not to get the lock", it's really the same event. It's absolutely not. You need to correlate the number of skipped pages to the number of vacuumed pages. If you have 100k skipped in 10 billion total scanned pages it's something entirely different than 100k in 200k pages. 
If the goal here is to find out if this even is a problem then I think the critical question is not "did we vacuum", but "were we able to acquire the lock on the first try". Obviously users will care much more about the vacuuming and not so much about the lock; but if this really is a non-issue as most tend to believe I don't think it's worth worrying about any of this (except perhaps putting dtrace/system tap probes in). Honestly, my desire at this point is just to see if there's actually a problem. Many people are asserting that this should be a very rare occurrence, but there's no way to know. Ok. Towards that simple end, I'm a bit torn. My preference would be to simply log, and throw a warning if it's over some threshold. I believe that would give the best odds of getting feedback from users if this isn't as uncommon as we think. I'm strongly against a warning. We have absolutely no sane way of tuning that. We'll just create a pointless warning that people will get confused about and that they'll have to live with till the next release. To clarify: I'm only suggesting we issue a warning if we have to skip some significant number of pages; say 5 or 0.01% of the table, whichever is greater. That's aimed directly at the goal of letting us know if this is actually a problem or not. The reason I'm inclined to do the warning is because I don't think people will notice this otherwise. If this really isn't a problem then it won't matter; if it's a *big* problem then we'll at least know about it. I'm thinking of an undocumented GUC to control the threshold, but I assume no one else would be on board with that? -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_background (and more parallelism infrastructure patches)
On 2014-11-12 11:36:14 -0500, Robert Haas wrote:
> On Wed, Nov 12, 2014 at 11:19 AM, Andres Freund wrote:
> > The question is whether the library is actually loaded in that case?
> > Because that normally only happens early during startup - which is why
> > it's a PGC_BACKEND guc.
>
> It looks like that does not work.
>
> [rhaas pgsql]$ PGOPTIONS='-c local_preload_libraries=auto_explain' psql
> psql (9.5devel)
> Type "help" for help.
>
> rhaas=# select * from pg_background_result(pg_background_launch('show
> auto_explain.log_min_duration')) as (x text);
> ERROR: unrecognized configuration parameter "auto_explain.log_min_duration"
> CONTEXT: background worker, pid 31316
>
> So, there's more to be done here. Rats.

We could just say that PGC_BACKEND GUCs which aren't set from the config file aren't supported for anything parallel. I find that a reasonable thing - otherwise pooling of bgworkers and all that will likely be out. And it's not that there are that many important PGC_BACKEND gucs.

Greetings,

Andres Freund

--
Andres Freund                      http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] recovery_target_time and standby_mode
On 11/07/2014 02:03 PM, Josh Berkus wrote:
> But, like I said, there's a serviceable workaround.

Some update on this. We've seen a problem in production with this setup which I can't reproduce as a test case, but which may jog Heikki's memory for something to fix.

1. Recover master to 2014-11-10 12:10:00
2. Recover replica to 2014-11-10 12:10:00, with pause_at_recovery_target
3. reconfigure recovery.conf for streaming replication and restart the replica
4. get a fatal error for replication, because the replica is ahead of the master on timeline1

What *appears* to be happening is that the pause_at_recovery_target, followed by the restart, on the replica causes it to advance one commit on timeline 1. But *not all the time*; this doesn't happen in my pgbench-based tests.

There's a workaround for the user (they just restore the replica to 5 minutes earlier), but I'm thinking this is a minor bug somewhere.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Re: [HACKERS] pg_receivexlog --status-interval add fsync feedback
On Mon, Nov 10, 2014 at 7:19 PM, wrote:
>> On Fri, Oct 31, 2014 at 5:46 PM, wrote:
>> >> > We seem to be going in circles. You suggested having two options,
>> >> > --feedback, and --fsync, which is almost exactly what Furuya posted
>> >> > originally. I objected to that, because I think that user interface is
>> >> > too complicated. Instead, I suggested having just a single option
>> >> > called --synchronous, or even better, have no option at all and
>> >> > have the server tell the client if it's participating in
>> >> > synchronous replication, and have pg_receivexlog automatically
>> >> > fsync when it is, and not otherwise [1]. That way you don't need to
>> >> > expose any new options to the user. What did you think of that idea?
>> >>
>> >> I think it's pretty weird to have the fsync behavior of the client
>> >> controlled by the server.
>> >>
>> >> I also don't think that fsync() on the client side is useless in
>> >> asynchronous replication. Yeah, it's true that there are no
>> >> *guarantees* with asynchronous replication, but the bound on how long
>> >> the data can take to get out to disk is a heck of a lot shorter if
>> >> you fsync frequently than if you don't. And on the flip side, that
>> >> has a performance impact.
>> >>
>> >> So I actually think the design you proposed is not as good as what
>> >> was proposed by Furuya and Simon. But I don't feel incredibly
>> >> strongly about it.
>> >
>> > Thanks for lots of comments!!
>> >
>> > I fixed the patch.
>> > By default, it behaves like a walreceiver.
>> > Same as walreceiver, it fsyncs and sends feedback after the fsync.
>>
>> On second thought, flipping the default behavior seems not worthwhile
>> here, which might surprise the existing users and cause some trouble
>> for them. I'd like to go back to Heikki's original suggestion of just
>> adding a --synchronous option. 
>> That is, only when --synchronous is specified, WAL is flushed and
>> feedback is sent back as soon as WAL is received.
>>
>> I changed the patch heavily in that way. Please find the attached patch.
>> By default, pg_receivexlog flushes WAL data only when WAL file is closed.
>> If --synchronous is specified, like the standby's WAL receiver, sync
>> commands are issued as soon as there is WAL data which has not been
>> flushed yet. Also status packets are sent back to the server just after
>> WAL data is flushed whatever --status-interval is set to. I added the
>> description of this behavior to the doc.
>>
>> Thought?

> I think it's as you pointed out.
> Thank you for the patch.
> I did a review of the patch.
> There was no problem.

So I'm thinking to push the patch. Does anyone object to that?

Regards,

--
Fujii Masao
Re: [HACKERS] pg_background (and more parallelism infrastructure patches)
On Wed, Nov 12, 2014 at 11:19 AM, Andres Freund wrote: > The question is whether the library is actually loaded in that case? > Because that normally only happens early during startup - which is why > it's a PGC_BACKEND guc. It looks like that does not work. [rhaas pgsql]$ PGOPTIONS='-c local_preload_libraries=auto_explain' psql psql (9.5devel) Type "help" for help. rhaas=# select * from pg_background_result(pg_background_launch('show auto_explain.log_min_duration')) as (x text); ERROR: unrecognized configuration parameter "auto_explain.log_min_duration" CONTEXT: background worker, pid 31316 So, there's more to be done here. Rats. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_background (and more parallelism infrastructure patches)
On 2014-11-12 11:15:26 -0500, Robert Haas wrote: > On Mon, Nov 10, 2014 at 4:58 PM, Andres Freund wrote: > > No, that's not what I was thinking of. Consider: > > > > export pg_libdir=$(pg_config --libdir) > > mkdir $pg_libdir/plugins > > ln -s $pg_libdir/auto_explain.so $pg_libdir/plugins/ > > PG_OPTIONS='-c local_preload_libraries=auto_explain' psql > > > > and use pg_background() (or something else using this infrastructure) in > > that context. > > Without having reread your relevant code I'd expect a: > > ERROR: 55P02: parameter "local_preload_libraries" cannot be set after > > connection start > > because the code would try to import the "user session's" > > local_preload_libraries values into the background process - after that > > actually already has done its initialization. > > Thanks for these pointers. Your directions turn out to be wrong in > detail, because it's PGOPTIONS not PG_OPTIONS, and the plugins > directory needs to be under $(pg_config --libdir)/postgresql, not just > $(pg_config --libdir). But it was enough to get me on the right > track. Long story short, it seems to work: Sorry, was extracting it from a larger local script... > [rhaas ~]$ PGOPTIONS='-c local_preload_libraries=auto_explain' psql > psql (9.5devel) > Type "help" for help. > > rhaas=# show local_preload_libraries; > local_preload_libraries > - > auto_explain > (1 row) > > rhaas=# select * from pg_background_result(pg_background_launch('show > local_preload_libraries')) as (x text); > x > -- > auto_explain > (1 row) The question is whether the library is actually loaded in that case? Because that normally only happens early during startup - which is why it's a PGC_BACKEND guc. > > How do you define 'SAFE' here? Safe against concurrent SQL level > > activity? Safe against receiving arbitrary data? > > The first one. 
The code powering the background worker is just as > trustworthy as any other part of the backend, so it can be subverted > if you load malicious C code into the backend, but not otherwise. Ah, good. Because the latter sounds quite hard, if not impossible lest we recreate send/recv ;) Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_background (and more parallelism infrastructure patches)
On Mon, Nov 10, 2014 at 4:58 PM, Andres Freund wrote: > No, that's not what I was thinking of. Consider: > > export pg_libdir=$(pg_config --libdir) > mkdir $pg_libdir/plugins > ln -s $pg_libdir/auto_explain.so $pg_libdir/plugins/ > PG_OPTIONS='-c local_preload_libraries=auto_explain' psql > > and use pg_background() (or something else using this infrastructure) in > that context. > Without having reread your relevant code I'd expect a: > ERROR: 55P02: parameter "local_preload_libraries" cannot be set after > connection start > because the code would try to import the "user session's" > local_preload_libraries values into the background process - after that > actually already has done its initialization. Thanks for these pointers. Your directions turn out to be wrong in detail, because it's PGOPTIONS not PG_OPTIONS, and the plugins directory needs to be under $(pg_config --libdir)/postgresql, not just $(pg_config --libdir). But it was enough to get me on the right track. Long story short, it seems to work: [rhaas ~]$ PGOPTIONS='-c local_preload_libraries=auto_explain' psql psql (9.5devel) Type "help" for help. rhaas=# show local_preload_libraries; local_preload_libraries - auto_explain (1 row) rhaas=# select * from pg_background_result(pg_background_launch('show local_preload_libraries')) as (x text); x -- auto_explain (1 row) This is what I expected, and I think what we want. Background workers enlisted for parallelism (or pg_background) need to be able to have the same values for GUCs as the user backend that started them, and our definition of setting things "at backend start" needs to be expansive enough to include stuff we pull out of a dynamic shared memory segment shortly thereafter. >> It should end up with the same values there as the active session, not >> the current one from postgresql.conf. But we want to verify that's >> the behavior we actually get. Right? > > But that's also something worthwile to check. Mmph. There's a problem here. 
I tried changing local_preload_libraries='auto_explain' in postgresql.conf and then sending SIGHUP. A session started before that change does indeed create a worker with that value still empty, but a session started after that change also creates a worker with that value still empty. Oops. Not sure why that's happening yet, will investigate. >> What would be even better is to find some way to MAKE IT SAFE to send >> the undecoded tuple. I'm not sure what that would look like. > How do you define 'SAFE' here? Safe against concurrent SQL level > activity? Safe against receiving arbitrary data? The first one. The code powering the background worker is just as trustworthy as any other part of the backend, so it can be subverted if you load malicious C code into the backend, but not otherwise. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unintended restart after recovery error
On Wed, Nov 12, 2014 at 6:52 PM, Antonin Houska wrote: > While looking at postmaster.c:reaper(), one problematic case occurred to me. > > > 1. Startup process signals PMSIGNAL_RECOVERY_STARTED. > > 2. Checkpointer process is forked and immediately dies. > > 3. reaper() catches this failure, calls HandleChildCrash() and thus sets > FatalError to true. > > 4. Startup process exits with non-zero status code too - either due to SIGQUIT > received from HandleChildCrash or due to some other failure of the startup > process itself. However, FatalError is already set, because of the previous > crash of the checkpointer. Thus reaper() does not set RecoveryError. > > 5. As RecoveryError failed to be set to true, postmaster will try to restart > the cluster, although it apparently should not. Why shouldn't postmaster restart the cluster in that case? Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] what does this mean: "running xacts with xcnt == 0"
On 2014-11-12 11:56:01 -0300, Alvaro Herrera wrote: > Andres Freund wrote: > > Hi, > > > > On 2014-11-12 09:03:40 -0500, Peter Eisentraut wrote: > > > Could someone translate this detail message to English: > > > > > > ereport(LOG, > > > (errmsg("logical decoding found consistent point at > > > %X/%X", > > > (uint32) (lsn >> 32), (uint32) lsn), > > > errdetail("running xacts with xcnt == 0"))); > > > > It means that an xl_running_xacts record was encountered that had xcnt = > > 0, allowing logical decoding to find a consistent start point. > > > > > (or downgrade to debug message, if appropriate)? > > > > The message generally is quite relevant, as the process of finding a > > consistent start point can take quite a while. We don't really have a > > nice way to make errdetail() only be logged on a certain severity level, > > as far as I am aware. > > Can we do just the errmsg() and remove the errdetail? No, I really don't want to do that. When trying to see whether logical replication started, that's imo quite an important detail. Especially when first seeing ereport(LOG, (errmsg("logical decoding found initial starting point at %X/%X", (uint32) (lsn >> 32), (uint32) lsn), errdetail_plural("%u transaction needs to finish.", "%u transactions need to finish.", builder->running.xcnt, (uint32) builder->running.xcnt))); Btw, Peter, why did you add a (uint32) to one, but not both, builder->running.xcnt references? > > So maybe 'Encountered xl_running_xacts record with xcnt = 0.'? > > That's not very user-facing, is it -- I mean, why bother the user with > the names of structs and members thereof? It seems better to describe > what the condition is; something like "found point in time with no > running transaction". Maybe "point in time" should be "WAL record" > instead. Is that really a win in clarity? When analyzing a problem I'd much rather have a concrete hint than something fuzzy. 
Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Race condition between hot standby and restoring a FPW
Heikki Linnakangas writes: > On 11/12/2014 05:20 PM, Tom Lane wrote: >> On reconsideration I think the "RBM_ZERO returns page already locked" >> alternative may be the less ugly. That has the advantage that any code >> that doesn't get updated will fail clearly and reliably. > Yeah, I'm leaning to that approach as well. It's made more ugly by the > fact that you sometimes need a cleanup lock on the buffer, so the caller > will somehow need to specify whether to get a cleanup lock or a normal > exclusive lock. Maybe add yet another mode, RBM_ZERO_WITH_CLEANUP_LOCK. > Could also rename RBM_ZERO to e.g. RBM_ZERO_AND_LOCK, to make any code > that's not updated to break even more obviously, at compile-time. Yeah, I was considering suggesting changing the name of the mode too. +1 for solving the lock-type problem with 2 modes. (You could also consider leaving RBM_ZERO in place with its current semantics, but I think what you've shown here is that there is no safe way to use it, so probably that's not what we should do.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Race condition between hot standby and restoring a FPW
On 11/12/2014 05:20 PM, Tom Lane wrote: Heikki Linnakangas writes: On 11/12/2014 04:56 PM, Tom Lane wrote: Not great either. What about an RBM_NOERROR mode that is like RBM_ZERO in terms of handling error conditions, but does not forcibly zero the page if it's already valid? Anyway, you don't want to read the page from disk, just to check if it's already valid. Oh, good point. (Note that when the page is already in the buffer-cache, RBM_ZERO already doesn't zero the page. So this race condition only happens when the page isn't in the buffer cache yet). Right. On reconsideration I think the "RBM_ZERO returns page already locked" alternative may be the less ugly. That has the advantage that any code that doesn't get updated will fail clearly and reliably. Yeah, I'm leaning to that approach as well. It's made more ugly by the fact that you sometimes need a cleanup lock on the buffer, so the caller will somehow need to specify whether to get a cleanup lock or a normal exclusive lock. Maybe add yet another mode, RBM_ZERO_WITH_CLEANUP_LOCK. Could also rename RBM_ZERO to e.g. RBM_ZERO_AND_LOCK, to make any code that's not updated to break even more obviously, at compile-time. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Race condition between hot standby and restoring a FPW
On 11/12/2014 05:08 PM, Robert Haas wrote: On Wed, Nov 12, 2014 at 7:39 AM, Heikki Linnakangas wrote: 2. When ReadBufferExtended doesn't find the page in cache, it returns the buffer in !BM_VALID state (i.e. still in I/O in-progress state). Require the caller to call a second function, after locking the page, to finish the I/O. This seems like a reasonable approach. If you tilt your head the right way, zeroing a page and restoring a backup block are the same thing: either way, you want to "read" the block into shared buffers without actually reading it, so that you can overwrite the prior contents with something else. So, you could fix this by adding a new mode, RBM_OVERWRITE, and passing the new page contents as an additional argument to ReadBufferExtended, which would then memcpy() that data into place where RBM_ZERO calls MemSet() to zero it. Yes, that would be quite a clean API. However, there's a problem with locking, when the redo routine modifies multiple pages. Currently, you lock the page first, and replace the page with the new contents while holding the lock. With RBM_OVERWRITE, the new page contents would sneak into the buffer before RestoreBackupBlock has acquired the lock on the page, and another backend might pin and lock the page before RestoreBackupBlock does. The page contents would be valid, but they might not be consistent with other buffers yet. The redo routine might be doing an atomic operation that spans multiple pages, by holding the locks on all the pages until it's finished with all the changes, but the backend would see a partial result. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Race condition between hot standby and restoring a FPW
Heikki Linnakangas writes: > On 11/12/2014 04:56 PM, Tom Lane wrote: >> Not great either. What about an RBM_NOERROR mode that is like RBM_ZERO >> in terms of handling error conditions, but does not forcibly zero the page >> if it's already valid? > Anyway, you don't want to read the page from disk, just to check if it's > already valid. Oh, good point. > (Note that when the page is already in the buffer-cache, RBM_ZERO > already doesn't zero the page. So this race condition only happens when > the page isn't in the buffer cache yet). Right. On reconsideration I think the "RBM_ZERO returns page already locked" alternative may be the less ugly. That has the advantage that any code that doesn't get updated will fail clearly and reliably. With the other thing, you need some additional back channel for the caller to know whether it must complete the I/O, and it's not obvious what will happen if it fails to do that (except that it will be bad). regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On 2014-11-12 10:13:18 -0500, Robert Haas wrote: > On Tue, Nov 11, 2014 at 4:27 AM, Andres Freund wrote: > > The more important thing here is that I see little chance of this > > getting in before Heikki's larger rework of the wal format gets > > in. Since that'll change everything around anyay I'm unsure how much > > point there is to iterate till that's done. I know that sucks, but I > > don't see much of an alternative. > > Why not do this first? Heikki's patch seems quite far from being > ready to commit at this point - it significantly increases WAL volume > and reduces performance. Heikki may well be able to fix that, but I > don't know that it's a good idea to make everyone else wait while he > does. Because it imo builds the infrastructure to do the compression more sanely. I.e. provide proper space to store information about the compressedness of the blocks and such. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Race condition between hot standby and restoring a FPW
On 11/12/2014 04:56 PM, Tom Lane wrote: Heikki Linnakangas writes: There's a race condition between a backend running queries in hot standby mode, and restoring a full-page image from a WAL record. It's present in all supported versions. I can think of two ways to fix this: 1. Have ReadBufferExtended lock the page in RBM_ZERO mode, before returning it. That makes the API inconsistent, as the function would sometimes lock the page, and sometimes not. Ugh. 2. When ReadBufferExtended doesn't find the page in cache, it returns the buffer in !BM_VALID state (i.e. still in I/O in-progress state). Require the caller to call a second function, after locking the page, to finish the I/O. Not great either. What about an RBM_NOERROR mode that is like RBM_ZERO in terms of handling error conditions, but does not forcibly zero the page if it's already valid? Isn't that exactly what RBM_ZERO_ON_ERROR does? Anyway, you don't want to read the page from disk, just to check if it's already valid. We stopped doing that in 8.2 (commit 8c3cc86e7b688b0efe5ec6ce4f4342c2883b1db5), and it gave a big speedup to recovery. (Note that when the page is already in the buffer-cache, RBM_ZERO already doesn't zero the page. So this race condition only happens when the page isn't in the buffer cache yet). - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Tue, Nov 11, 2014 at 4:27 AM, Andres Freund wrote: > The more important thing here is that I see little chance of this > getting in before Heikki's larger rework of the wal format gets > in. Since that'll change everything around anyay I'm unsure how much > point there is to iterate till that's done. I know that sucks, but I > don't see much of an alternative. Why not do this first? Heikki's patch seems quite far from being ready to commit at this point - it significantly increases WAL volume and reduces performance. Heikki may well be able to fix that, but I don't know that it's a good idea to make everyone else wait while he does. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Race condition between hot standby and restoring a FPW
On Wed, Nov 12, 2014 at 7:39 AM, Heikki Linnakangas wrote: > 2. When ReadBufferExtended doesn't find the page in cache, it returns the > buffer in !BM_VALID state (i.e. still in I/O in-progress state). Require the > caller to call a second function, after locking the page, to finish the I/O. This seems like a reasonable approach. If you tilt your head the right way, zeroing a page and restoring a backup block are the same thing: either way, you want to "read" the block into shared buffers without actually reading it, so that you can overwrite the prior contents with something else. So, you could fix this by adding a new mode, RBM_OVERWRITE, and passing the new page contents as an additional argument to ReadBufferExtended, which would then memcpy() that data into place where RBM_ZERO calls MemSet() to zero it. I'm not sure whether we want to complicate the API that way, though. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Race condition between hot standby and restoring a FPW
Heikki Linnakangas writes: > There's a race condition between a backend running queries in hot > standby mode, and restoring a full-page image from a WAL record. It's > present in all supported versions. > I can think of two ways to fix this: > 1. Have ReadBufferExtended lock the page in RBM_ZERO mode, before > returning it. That makes the API inconsistent, as the function would > sometimes lock the page, and sometimes not. Ugh. > 2. When ReadBufferExtended doesn't find the page in cache, it returns > the buffer in !BM_VALID state (i.e. still in I/O in-progress state). > Require the caller to call a second function, after locking the page, to > finish the I/O. Not great either. What about an RBM_NOERROR mode that is like RBM_ZERO in terms of handling error conditions, but does not forcibly zero the page if it's already valid? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] what does this mean: "running xacts with xcnt == 0"
Andres Freund wrote: > Hi, > > On 2014-11-12 09:03:40 -0500, Peter Eisentraut wrote: > > Could someone translate this detail message to English: > > > > ereport(LOG, > > (errmsg("logical decoding found consistent point at %X/%X", > > (uint32) (lsn >> 32), (uint32) lsn), > > errdetail("running xacts with xcnt == 0"))); > > It means that an xl_running_xacts record was encountered that had xcnt = > 0, allowing logical decoding to find a consistent start point. > > > (or downgrade to debug message, if appropriate)? > > The message generally is quite relevant, as the process of finding a > consistent start point can take quite a while. We don't really have a > nice way to make errdetail() only be logged on a certain severity level, > as far as I am aware. Can we do just the errmsg() and remove the errdetail? > So maybe 'Encountered xl_running_xacts record with xcnt = 0.'? That's not very user-facing, is it -- I mean, why bother the user with the names of structs and members thereof? It seems better to describe what the condition is; something like "found point in time with no running transaction". Maybe "point in time" should be "WAL record" instead. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] what does this mean: "running xacts with xcnt == 0"
Hi, On 2014-11-12 09:03:40 -0500, Peter Eisentraut wrote: > Could someone translate this detail message to English: > > ereport(LOG, > (errmsg("logical decoding found consistent point at %X/%X", > (uint32) (lsn >> 32), (uint32) lsn), > errdetail("running xacts with xcnt == 0"))); It means that an xl_running_xacts record was encountered that had xcnt = 0, allowing logical decoding to find a consistent start point. > (or downgrade to debug message, if appropriate)? The message generally is quite relevant, as the process of finding a consistent start point can take quite a while. We don't really have a nice way to make errdetail() only be logged on a certain severity level, as far as I am aware. So maybe 'Encountered xl_running_xacts record with xcnt = 0.'? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] what does this mean: "running xacts with xcnt == 0"
Could someone translate this detail message to English: ereport(LOG, (errmsg("logical decoding found consistent point at %X/%X", (uint32) (lsn >> 32), (uint32) lsn), errdetail("running xacts with xcnt == 0"))); (or downgrade to debug message, if appropriate)? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Failback to old master
On Tue, Nov 11, 2014 at 11:52 PM, Maeldron T. wrote: > As far as I remember (I can’t test it right now but I am 99% sure) promoting > the slave makes it impossible to connect the old master to the new one > without making a base_backup. The reason is the timeline change. It complains. A safely shut down master (-m fast is safe) can be safely restarted as a slave to the newly promoted master. Fast shutdown shuts down all normal connections, does a shutdown checkpoint and then waits for this checkpoint to be replicated to all active streaming clients. Promoting a slave to master creates a timeline switch, which prior to version 9.3 was only possible to replicate using the archive mechanism. As of version 9.3 you don't need to configure archiving to follow timeline switches; just add a recovery.conf to the old master to start it up as a slave and it will fetch everything it needs from the new master. In case of an unsafe shutdown (crash) it is possible that you have WAL lying around that was not streamed out to the slave. In this case the old master will request recovery from a point after the timeline switch and the new master will reply with an error. So it is safe to try re-adding a crashed master as a slave, but this might fail. Success is more likely when the whole operating system went down, as then it's somewhat likely that any WAL got streamed out before it made it to disk. In general my suggestion is to avoid slave promotion by removal of recovery.conf; it's too easy to get confused and end up with hard-to-diagnose data corruption. In your example, if for example B happens to disconnect at WAL position x1 and remains disconnected while shutdown on A occurred at WAL position x2, it will be missing the WAL interval A(x1..x2). Now B is restarted as master from position x1, generates some new WAL past x2, then A is restarted as slave and starts streaming at x2, as to the best of its knowledge that was where things left off. 
At this point the slave A is corrupted, you have x1..x2 changes from A that are not on the master and are also missing some changes that are on the master. Wrong data and/or crashes ensue. Always use the promotion mechanism because then you are likely to get errors when something is screwy. Unfortunately it's still possible to end up in a corrupted state with no errors, as timeline identifiers are sequential integers, not GUID's, but at least it's significantly harder. Regards, Ants Aasma -- Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: http://www.postgresql-support.de -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] tracking commit timestamps
On 10/11/14 14:53, Robert Haas wrote: On Mon, Nov 10, 2014 at 8:39 AM, Petr Jelinek wrote: I did the calculation above wrong btw, it's actually 20 bytes not 24 bytes per record, I am inclined to just say we can live with that. If you do it as 20 bytes, you'll have to do some work to squeeze out the alignment padding. I'm inclined to think it's fine to have a few extra padding bytes here; someone might want to use those for something in the future, and they probably don't cost much. I did get around the alignment via memcpy, so it is still 20 bytes. Since we agreed that the (B) case is not really feasible and we are doing the (C), I also wonder if extradata should be renamed to nodeid (even if it's not used at this point as nodeid). And then there is a question about the size of it, since the nodeid itself can live with 2 bytes probably ("64k of nodes ought to be enough for everybody" ;) ). Or leave the extradata as is but use it as reserved space for future use and not expose it at this time on the SQL level at all? I vote for calling it node-ID, and for allowing at least 4 bytes for it. Penny wise, pound foolish. Ok, I went this way. Anyway here is the v8 version of the patch, I think I addressed all the concerns mentioned, it's also rebased against current master (BRIN commit added some conflicts). 
Brief list of changes: - the commit timestamp record now stores timestamp, lsn and nodeid - added code to disallow turning track_commit_timestamp on with too small pagesize - the get interfaces error out when track_commit_timestamp is off - if the xid passed to get interface is out of range -infinity timestamp is returned (I think it's bad idea to throw errors here as the valid range is not static and same ID can start throwing errors between calls theoretically) - renamed the sql interfaces to pg_xact_commit_timestamp, pg_xact_commit_timestamp_data and pg_last_committed_xact, they don't expose the nodeid atm, I personally am not big fan of the "xact" but it seems more consistent with existing naming - documented pg_resetxlog changes and make all the pg_resetxlog options alphabetically ordered - committs is not used anymore, it's commit_ts (and CommitTs in camelcase), I think it's not really good idea to spell the timestamp everywhere as some interface then get 30+ chars long names... - added WAL logging of the track_commit_timestamp GUC - added alternative expected output of the regression test so that it works with make installcheck when track_commit_timestamp is on - added C interface to set default nodeid for current backend - several minor comment and naming adjustments mostly suggested by Michael -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services diff --git a/contrib/Makefile b/contrib/Makefile index b37d0dd..e331297 100644 --- a/contrib/Makefile +++ b/contrib/Makefile @@ -50,6 +50,7 @@ SUBDIRS = \ spi \ tablefunc \ tcn \ + test_committs \ test_decoding \ test_parser \ test_shm_mq \ diff --git a/contrib/pg_upgrade/pg_upgrade.c b/contrib/pg_upgrade/pg_upgrade.c index 3b8241b..f0a023f 100644 --- a/contrib/pg_upgrade/pg_upgrade.c +++ b/contrib/pg_upgrade/pg_upgrade.c @@ -423,8 +423,10 @@ copy_clog_xlog_xid(void) /* set the next transaction id and epoch of the new cluster */ prep_status("Setting next transaction 
ID and epoch for new cluster"); exec_prog(UTILITY_LOG_FILE, NULL, true, - "\"%s/pg_resetxlog\" -f -x %u \"%s\"", - new_cluster.bindir, old_cluster.controldata.chkpnt_nxtxid, + "\"%s/pg_resetxlog\" -f -x %u -c %u \"%s\"", + new_cluster.bindir, + old_cluster.controldata.chkpnt_nxtxid, + old_cluster.controldata.chkpnt_nxtxid, new_cluster.pgdata); exec_prog(UTILITY_LOG_FILE, NULL, true, "\"%s/pg_resetxlog\" -f -e %u \"%s\"", diff --git a/contrib/pg_xlogdump/rmgrdesc.c b/contrib/pg_xlogdump/rmgrdesc.c index 9397198..e0af3cf 100644 --- a/contrib/pg_xlogdump/rmgrdesc.c +++ b/contrib/pg_xlogdump/rmgrdesc.c @@ -10,6 +10,7 @@ #include "access/brin_xlog.h" #include "access/clog.h" +#include "access/committs.h" #include "access/gin.h" #include "access/gist_private.h" #include "access/hash.h" diff --git a/contrib/test_committs/.gitignore b/contrib/test_committs/.gitignore new file mode 100644 index 000..1f95503 --- /dev/null +++ b/contrib/test_committs/.gitignore @@ -0,0 +1,5 @@ +# Generated subdirectories +/log/ +/isolation_output/ +/regression_output/ +/tmp_check/ diff --git a/contrib/test_committs/Makefile b/contrib/test_committs/Makefile new file mode 100644 index 000..2240749 --- /dev/null +++ b/contrib/test_committs/Makefile @@ -0,0 +1,45 @@ +# Note: because we don't tell the Makefile there are any regression tests, +# we have to clean those result files explicitly +EXTRA_CLEAN = $(pg_regress_clean_files) ./regression_output ./isolation_output + +ifdef USE_PGXS +PG_CONFIG = pg_config +PGXS := $(shell $(PG_CONFIG) --pgxs) +in
Re: [HACKERS] using custom scan nodes to prototype parallel sequential scan
On Tue, Nov 11, 2014 at 7:48 PM, Kouhei Kaigai wrote:
> Isn't provolatile = PROVOLATILE_IMMUTABLE sufficient?

There are certainly things that are parallel-safe that are not immutable. It might be the case that everything immutable is parallel-safe.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Race condition between hot standby and restoring a FPW
There's a race condition between a backend running queries in hot standby mode, and restoring a full-page image from a WAL record. It's present in all supported versions.

RestoreBackupBlockContents does this:

	buffer = XLogReadBufferExtended(bkpb.node, bkpb.fork, bkpb.block,
									RBM_ZERO);
	Assert(BufferIsValid(buffer));
	if (get_cleanup_lock)
		LockBufferForCleanup(buffer);
	else
		LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);

If the page is not in the buffer cache yet, and a backend reads and locks the buffer after the above XLogReadBufferExtended call has zeroed it, but before it has locked it, the backend sees an empty page.

The principle of fixing that is straightforward: the zeroed page should not be visible to others, even momentarily. Unfortunately, I think that's going to require an API change to ReadBufferExtended(RBM_ZERO) :-(.

I can think of two ways to fix this:

1. Have ReadBufferExtended lock the page in RBM_ZERO mode, before returning it. That makes the API inconsistent, as the function would sometimes lock the page, and sometimes not.

2. When ReadBufferExtended doesn't find the page in cache, it returns the buffer in !BM_VALID state (i.e. still in I/O in-progress state). Require the caller to call a second function, after locking the page, to finish the I/O.

Anyone have a better idea?

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index
On Wed, Nov 12, 2014 at 12:40 AM, Tom Lane wrote:
> Fujii Masao writes:
>> On Tue, Nov 11, 2014 at 10:14 PM, Robert Haas wrote:
>>> Not to kibitz too much after-the-fact, but wouldn't it be better to
>>> give this a name that has "GIN" in it somewhere?
>
>> Maybe. gin_pending_list_cleanup_size? gin_pending_list_limit? Better name?
>
> gin_pending_list_limit sounds good to me.

OK, barring any objection, I will rename the reloption and GUC to gin_pending_list_limit.

Regards,

--
Fujii Masao

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] how to determine clause for new keyword, how to add new clause in gram.y
Please tell me: if I want to add a 'selectivity' keyword, which clause category would it fall under? Can you please tell me in detail how I should modify gram.y, and how to use the keyword thereafter? I am new to parser writing.

Thank you.

--
Pankaj A. Bagul
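[Editorial sketch] For reference, this is roughly where keywords live in the PostgreSQL grammar. A hedged sketch only: the SELECTIVITY keyword is hypothetical, and exact placement depends on the source tree you are working against.

```
/* src/include/parser/kwlist.h -- keep this list alphabetical */
PG_KEYWORD("selectivity", SELECTIVITY, UNRESERVED_KEYWORD)

/* src/backend/parser/gram.y -- declare the token ... */
%token <keyword> SELECTIVITY

/* ... and add it to the matching keyword-category production,
   preferably unreserved_keyword so that existing queries can
   still use "selectivity" as an ordinary identifier: */
unreserved_keyword:
	  ...
	| SELECTIVITY
	  ...
```

After that, reference the SELECTIVITY token in whichever clause production should accept it, and rebuild so bison regenerates the parser.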
Re: [HACKERS] inherit support for foreign tables
Hi Fujita-san,

I reviewed the fdw-chk-3 patch. Here are my comments.

Sanity
1. The patch applies on the latest master using "patch", but not with "git apply".
2. It compiles cleanly.
3. The regression run is clean, including the contrib module regressions.

Tests
---
1. The tests added in the file_fdw module look good. We should add tests for CREATE TABLE with a CHECK CONSTRAINT also.
2. For postgres_fdw we need tests to check the behaviour in case the constraints mismatch between the remote table and its local foreign table declaration, for INSERT, UPDATE and SELECT.
3. In the testcases for postgres_fdw it seems that you have forgotten to add a statement after SET constraint_exclusion TO 'partition'.
4. In the foreign_data test there are changes to fix the diffs caused by these changes, like below:

ALTER FOREIGN TABLE ft1 DROP CONSTRAINT no_const; -- ERROR
-ERROR: "ft1" is not a table
+ERROR: constraint "no_const" of relation "ft1" does not exist
ALTER FOREIGN TABLE ft1 DROP CONSTRAINT IF EXISTS no_const;
-ERROR: "ft1" is not a table
+NOTICE: constraint "no_const" of relation "ft1" does not exist, skipping
ALTER FOREIGN TABLE ft1 DROP CONSTRAINT ft1_c1_check;
-ERROR: "ft1" is not a table
+ERROR: constraint "ft1_c1_check" of relation "ft1" does not exist

Earlier, when constraints were not supported for FOREIGN TABLE, these tests verified exactly that, so the corresponding constraints were never created on the table (in fact, creating them wasn't allowed either). Now that the constraints are allowed, I think the tests for "no_const" (without IF EXISTS) and "ft1_c1_check" are duplicating the same testcase. Maybe we should review this set of statements in the light of the new functionality.

Code and implementation
--
The usage of NO INHERIT and NOT VALID with CONSTRAINT on a foreign table is blocked, but the corresponding documentation entry doesn't mention so.
Since foreign tables cannot be inherited, the NO INHERIT option isn't applicable to foreign tables; and since the constraints on foreign tables are declarative, the NOT VALID option is not applicable either. So, I agree with what the code is doing, but that should be reflected in the documentation with this explanation.

The rest of the code modifies the condition checks for CHECK CONSTRAINTs on foreign tables, and it looks good to me.

On Fri, Nov 7, 2014 at 5:31 PM, Etsuro Fujita wrote: > (2014/11/07 14:57), Kyotaro HORIGUCHI wrote: > >> Here are separated patches. > > fdw-chk.patch - CHECK constraints on foreign tables > fdw-inh.patch - table inheritance with foreign tables > > The latter has been created on top of [1]. > [1] > http://www.postgresql.org/message-id/540da168.3040...@lab.ntt.co.jp > >>> To be exact, it has been created on top of [1] and fdw-chk.patch. >>> >> I tried both patches on the current head, the newly added >> parameter to analyze_rel() hampered them from applying but it is >> easy to fix seemingly and almost all the other part was applied >> cleanly. >> > > Thanks for the review! > > By the way, are these the result of simply splitting of your last >> patch, foreign_inherit-v15.patch? >> >> http://www.postgresql.org/message-id/53feef94.6040...@lab.ntt.co.jp >> > > The answer is "no". > > The result of apllying whole-in-one version and this splitted >> version seem to have many differences. Did you added even other >> changes? Or do I understand this patch wrongly? >> > > As I said before, I splitted the whole-in-one version into three: 1) CHECK > constraint patch (ie fdw-chk.patch), 2) table inheritance patch (ie > fdw-inh.patch) and 3) path reparameterization patch (not posted). In > addition to that, I slightly modified #1 and #2. > > IIUC, #3 would be useful not only for the inheritance cases but for union > all cases. So, I plan to propose it independently in the next CF.
> > I noticed that the latter disallows TRUNCATE on inheritance trees that contain at least one child foreign table. But I think it would be better to allow it, with the semantics that we quietly ignore the child foreign tables and apply the operation to the child plain tables, which is the same semantics as ALTER COLUMN SET STORAGE on such inheritance trees. Comments welcome. >>> >>> Done. And I've also a bit revised regression tests for both >>> patches. Patches attached. >>> >> > I rebased the patches to the latest head. Here are updated patches. > > Other changes: > > * fdw-chk-3.patch: the updated patch revises some ereport messages a > little bit. > > * fdw-inh-3.patch: I noticed that there is a doc bug in the previous > patch. The updated patch fixes that, adds a bit more docs, and revises > regression tests in foreign_data.sql a bit further. > > > Thanks, > > Best regards, > Etsuro Fujita > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscriptio
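[Editorial sketch] The mismatch scenario the review asks tests for (a declarative constraint on the foreign table that the remote data may violate) can be illustrated roughly as follows; this is hedged, with the 'loopback' server, table and constraint names invented, and exact behaviour depending on the fdw-chk-3 patch:

```sql
-- Hypothetical objects; the 'loopback' server and table names are invented.
CREATE FOREIGN TABLE ft1 (c1 int)
    SERVER loopback OPTIONS (table_name 't1');

-- With the patch, a declarative CHECK constraint is accepted on ft1:
ALTER FOREIGN TABLE ft1 ADD CONSTRAINT ft1_c1_positive CHECK (c1 > 0);

-- The planner may now exclude ft1 under contradictory quals:
SET constraint_exclusion TO 'partition';

-- But nothing validates the remote t1; if it contains rows with c1 <= 0,
-- INSERT, UPDATE and SELECT behaviour can diverge from the declared
-- constraint, which is exactly what the requested tests should exercise.
```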
Re: [HACKERS] Unintended restart after recovery error
Antonin Houska wrote:
> While looking at postmaster.c:reaper(), one problematic case occurred to me.
>
> 1. Startup process signals PMSIGNAL_RECOVERY_STARTED.
>
> 2. Checkpointer process is forked and immediately dies.
>
> 3. reaper() catches this failure, calls HandleChildCrash() and thus sets
> FatalError to true.
>
> 4. Startup process exits with non-zero status code too - either due to SIGQUIT
> received from HandleChildCrash or due to some other failure of the startup
> process itself. However, FatalError is already set, because of the previous
> crash of the checkpointer. Thus reaper() does not set RecoveryError.
>
> 5. As RecoveryError failed to be set to true, postmaster will try to restart
> the cluster, although it apparently should not.

A more common case occurred to me as soon as I had sent the previous mail: any process of the standby cluster has died. Thus the proposed fix would make restart_after_crash (GUC) completely ineffective for standbys. I'm not sure if that's desired. The question is whether RecoveryError should reflect problems during any kind of recovery, or just during crash recovery.

--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Unintended restart after recovery error
While looking at postmaster.c:reaper(), one problematic case occurred to me.

1. Startup process signals PMSIGNAL_RECOVERY_STARTED.

2. Checkpointer process is forked and immediately dies.

3. reaper() catches this failure, calls HandleChildCrash() and thus sets FatalError to true.

4. Startup process exits with a non-zero status code too - either due to SIGQUIT received from HandleChildCrash or due to some other failure of the startup process itself. However, FatalError is already set, because of the previous crash of the checkpointer. Thus reaper() does not set RecoveryError.

5. As RecoveryError failed to be set to true, postmaster will try to restart the cluster, although it apparently should not.

I could simulate the problem using the following changes (and by installing recovery.conf into the DATA directory):

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 99f702c..0cbd1c1 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6178,6 +6178,12 @@ StartupXLOG(void)
 			SetForwardFsyncRequests();
 			SendPostmasterSignal(PMSIGNAL_RECOVERY_STARTED);
 			bgwriterLaunched = true;
+
+			/*
+			 * Accidental delay, ensuring that checkpointer's crash is caught
+			 * by PM first.
+			 */
+			pg_usleep(500L);
 		}

 	/*
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 6c814ba..4585119 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -194,6 +194,8 @@ CheckpointerMain(void)
 	sigjmp_buf	local_sigjmp_buf;
 	MemoryContext checkpointer_context;

+	ereport(FATAL, (errmsg("early failure")));
+
 	CheckpointerShmem->checkpointer_pid = MyProcPid;

 	/*

This works for me as a fix:

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 6220a8e..0fb13bb 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -2573,15 +2573,11 @@ reaper(SIGNAL_ARGS)
 			 * After PM_STARTUP, any unexpected exit (including FATAL exit) of
 			 * the startup process is catastrophic, so kill other children,
 			 * and set RecoveryError so we don't try to reinitialize after
-			 * they're gone.  Exception: if FatalError is already set, that
-			 * implies we previously sent the startup process a SIGQUIT, so
-			 * that's probably the reason it died, and we do want to try to
-			 * restart in that case.
+			 * they're gone.
 			 */
 			if (!EXIT_STATUS_0(exitstatus))
 			{
-				if (!FatalError)
-					RecoveryError = true;
+				RecoveryError = true;
 				HandleChildCrash(pid, exitstatus, _("startup process"));
 				continue;

--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] using custom scan nodes to prototype parallel sequential scan
On Wed, Nov 12, 2014 at 1:24 PM, David Rowley wrote:
>
> On Tue, Nov 11, 2014 at 9:29 PM, Simon Riggs wrote:
>>
>> This plan type is widely used in reporting queries, so will hit the
>> mainline of BI applications and many Mat View creations.
>> This will allow SELECT count(*) FROM foo to go faster also.
>>
>
> We'd also need to add some infrastructure to merge aggregate states
> together for this to work properly. This means that could also work for
> avg() and stddev etc. For max() and min() the merge functions would likely
> just be the same as the transition functions.
>

It might make sense to make a new planner operator which can be responsible for pulling from each of the individual parallel Agg nodes and then aggregating over the results.

A couple of things that might be worth considering are whether we want to enforce using parallel aggregation or let the planner decide whether it wants to do a parallel aggregate or go with a single plan. For example, the average estimated size of groups might be one thing that the planner may consider while deciding between a parallel and a single execution plan.

I don't see merging states as an easy problem, and it should perhaps be tackled apart from this thread. Also, do we want to allow parallelism only with GroupAggs?

Regards,
Atri