Re: [HACKERS] FW: REVIEW: Allow formatting in log_line_prefix
On Fri, Sep 20, 2013 at 12:48 AM, Peter Eisentraut <pete...@gmx.net> wrote:
> Something is weird in your latest patch.  The header is:
>
> diff -u -r b/postgresql/doc/src/sgml/config.sgml a/postgresql/doc/src/sgml/config.sgml
> --- b/postgresql/doc/src/sgml/config.sgml	2013-09-09 23:40:52.356371191 +1200
> +++ a/postgresql/doc/src/sgml/config.sgml	2013-09-19 22:13:26.236681468 +1200
>
> This doesn't apply with patch -p1 or -p0.  (You need -p2.)  Your previous
> patch in this thread seemed OK.  You might want to check what you did there.

I moved the source around and I've patched against it again.  New patch
attached.

David

Attachment: log_line_formatting_v0.3.patch
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On Thu, Sep 19, 2013 at 11:32 AM, Fujii Masao-2 [via PostgreSQL] <ml-node+s1045698n5771565...@n5.nabble.com> wrote:
> On Thu, Sep 19, 2013 at 2:25 PM, samthakur74 [hidden email] wrote:
>
> I got the segmentation fault when I tested the case where the
> least-executed query statistics is discarded, i.e., when I executed
> different queries more than pg_stat_statements.max times.  I guess that
> the patch might have a bug.

Thanks, will try to fix it.

> pg_stat_statements--1.1.sql should be removed.

Yes, will do that.

> +     <entry><structfield>queryid</structfield></entry>
> +     <entry><type>bigint</type></entry>
> +     <entry></entry>
> +     <entry>Unique value of each representative statement for the current
> +      statistics session.  This value will change for each new statistics
> +      session.</entry>
>
> What does "statistics session" mean?

The time period when statistics are gathered by the statistics collector
without being reset.  So the statistics session continues across normal
shutdowns, but in case of abnormal situations like crashes, format
upgrades, or statistics being reset for any other reason, a new time
period of statistics collection starts, i.e. a new statistics session.
The queryid value generation is linked to the statistics session to
emphasize the fact that in case of crashes, format upgrades or any other
situation of statistics reset, the queryid for the same queries will also
change.

> I'm afraid that this behavior narrows down the use case of queryid very
> much.  For example, since the queryid of the same query would not be the
> same in the master and the standby servers, we cannot associate those two
> statistics by using the queryid.  The queryid changes through the crash
> recovery, so we cannot associate the query statistics generated before
> the crash with that generated after the crash recovery even if the query
> is the same.

Yes, these are limitations in this approach.  The other approaches
suggested included:
1. Expose the query id hash value as is, in the view, but document the
   fact that it will be unstable between releases.
2. Expose the query id hash value via an undocumented function and let
   more expert users decide if they want to use it.
The approach of using the statistics session id to generate queryid is a
compromise between not exposing it at all and exposing it without warning
users that the hash value computed from the query tree is unstable between
releases.

> This is not directly related to the patch itself, but why does the
> queryid need to be calculated based on also the statistics session?

If we expose the hash value of the query tree without using the statistics
session, it is possible that users might make the wrong assumption that
this value remains stable across version upgrades, when in reality it
depends on whether the version has made changes to the query tree
internals.  So to explicitly ensure that users do not make this wrong
assumption, hash value generation uses the statistics session id, which is
newly created under the situations described above.  Will update the
documentation to clearly explain the term "statistics session" in this
context.

> Yep, that's helpful!

Regards,
Sameer
Re: [HACKERS] FW: REVIEW: Allow formatting in log_line_prefix
David Rowley wrote:
> I moved the source around and I've patched against it again.  New patch
> attached.

Thank you, marked as ready for committer.

Yours,
Laurenz Albe
Re: [HACKERS] Minor inheritance/check bug: Inconsistent behavior
On Thu, Sep 19, 2013 at 10:28 PM, Robert Haas <robertmh...@gmail.com> wrote:
>>> Marking this as Ready for committer.
>>
>> Thanks a ton for reviewing the patch.
>
> I rewrote the comments for this patch and fixed the incorrect formatting
> in parse_relation.c.  It used spaces instead of tabs, which is why if you
> look at the patch file you'll see that the alignment looked off.  See new
> version, attached.

Thanks, attached version looks better.

> However, I have a few residual questions:
>
> 1. Does copy.c also need to be fixed?  I see that it also calls
> ExecConstraints(), and I don't see it setting tableOid anywhere.  We
> certainly want COPY and INSERT to behave the same way.

I had missed it, by confusing it with OID, as OID is set in COPY.  I think
along with COPY, it needs to be set in ATRewriteTable() as well, because
during rewrite also we check constraints.  I will send an updated patch
after point 2 is concluded.

> 2. Should constraints also allow access to the oid column?  Right now
> you can do that but it doesn't seem to work, e.g.:
>
> rhaas=# create table foo (a int, check (oid::integer % 2 = 0)) with oids;
> CREATE TABLE

I have tried various combinations; it is giving an error at my end:

postgres=# create table tbl_with_oid(c1 int) WITH OIDS;
CREATE TABLE
postgres=# alter table tbl_with_oid add check (oid > 16403);
ERROR:  system column oid reference in check constraint is invalid
postgres=# alter table tbl_with_oid add check (oid = 0);
ERROR:  system column oid reference in check constraint is invalid
postgres=# alter table tbl_with_oid add check (oid::integer % 2 = 0);
ERROR:  system column oid reference in check constraint is invalid
postgres=# create table foo (a int, check (oid::integer % 2 = 0)) with oids;
ERROR:  system column oid reference in check constraint is invalid

> Just prohibiting that might be fine; it doesn't seem that useful anyway.

Currently only tableOid is allowed; other system columns will error out
due to the below check:

+	/* In constraint check, no system column is allowed except tableOid */
+	if (pstate->p_expr_kind == EXPR_KIND_CHECK_CONSTRAINT &&
+		attnum < InvalidAttrNumber && attnum != TableOidAttributeNumber)

> But if we're setting the table OID, maybe we should set the OID too, and
> then just allow this.

It can be done for OID as well.  I don't have any strong reason for either
doing or not doing it; if you think it can be of use then we can add it.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Minor inheritance/check bug: Inconsistent behavior
On Fri, Sep 20, 2013 at 12:57 PM, Amit Kapila <amit.kapil...@gmail.com> wrote:
> On Thu, Sep 19, 2013 at 10:28 PM, Robert Haas <robertmh...@gmail.com> wrote:
>>>> Marking this as Ready for committer.
>>>
>>> Thanks a ton for reviewing the patch.
>>
>> I rewrote the comments for this patch and fixed the incorrect formatting
>> in parse_relation.c.  It used spaces instead of tabs, which is why if
>> you look at the patch file you'll see that the alignment looked off.
>> See new version, attached.
>
> Thanks, attached version looks better.
>
>> However, I have a few residual questions:
>>
>> 1. Does copy.c also need to be fixed?  I see that it also calls
>> ExecConstraints(), and I don't see it setting tableOid anywhere.  We
>> certainly want COPY and INSERT to behave the same way.
>
> I had missed it, by confusing it with OID, as OID is set in COPY.  I
> think along with COPY, it needs to be set in ATRewriteTable() as well,
> because during rewrite also we check constraints.  I will send an updated
> patch after point 2 is concluded.
>
>> 2. Should constraints also allow access to the oid column?  Right now
>> you can do that but it doesn't seem to work, e.g.:
>>
>> rhaas=# create table foo (a int, check (oid::integer % 2 = 0)) with oids;
>> CREATE TABLE
>
> I have tried various combinations; it is giving an error at my end:
>
> postgres=# create table tbl_with_oid(c1 int) WITH OIDS;
> CREATE TABLE
> postgres=# alter table tbl_with_oid add check (oid > 16403);
> ERROR:  system column oid reference in check constraint is invalid
> postgres=# alter table tbl_with_oid add check (oid = 0);
> ERROR:  system column oid reference in check constraint is invalid
> postgres=# alter table tbl_with_oid add check (oid::integer % 2 = 0);
> ERROR:  system column oid reference in check constraint is invalid
> postgres=# create table foo (a int, check (oid::integer % 2 = 0)) with oids;
> ERROR:  system column oid reference in check constraint is invalid

I am sorry, I just realized after pressing the send button that you want
to say the above point without considering the above patch.

>> Just prohibiting that might be fine; it doesn't seem that useful anyway.
>
> Currently only tableOid is allowed; other system columns will error out
> due to the below check:
>
> +	/* In constraint check, no system column is allowed except tableOid */
> +	if (pstate->p_expr_kind == EXPR_KIND_CHECK_CONSTRAINT &&
> +		attnum < InvalidAttrNumber && attnum != TableOidAttributeNumber)
>
>> But if we're setting the table OID, maybe we should set the OID too, and
>> then just allow this.
>
> It can be done for OID as well.  I don't have any strong reason for
> either doing or not doing it; if you think it can be of use then we can
> add it.

One more thing: I think it will be better to update the information in the
docs, probably in the CREATE TABLE page where CHECK constraints are
explained, and where DDL constraints are explained at this link:
http://www.postgresql.org/docs/devel/static/ddl-constraints.html

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
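For a concrete picture of the tableoid case being discussed, here is a
minimal sketch of the intended post-patch behavior (it assumes the patch,
which sets the tuple's tableOid before ExecConstraints() runs, is applied;
behavior on unpatched servers is the inconsistency described above):

    -- with inheritance, a tableoid CHECK pins rows to one specific relation
    CREATE TABLE cities (name text, population int);
    CREATE TABLE capitals (state char(2)) INHERITS (cities);

    ALTER TABLE capitals ADD CHECK (tableoid = 'capitals'::regclass);

    INSERT INTO capitals VALUES ('Madison', 233000, 'WI');  -- passes the check

The same constraint attached to a different table in the hierarchy would
reject its rows, which is exactly why tableOid must be filled in for COPY
and ATRewriteTable() as well as for INSERT.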
Re: [HACKERS] Dump/Reload broken with relocatable extensions
On 09/19/2013 11:40 PM, Dimitri Fontaine wrote:
> Vik Fearing <vik.fear...@dalibo.com> writes:
>> I don't know if this has been discussed before, a cursory search of the
>> archives didn't turn up anything interesting.  I perhaps didn't put in
>> the right keywords.
>
> For others not to spend too much time on this: it seems like a problem
> with the extension not abiding by the rules about its relocatable
> property and the @extschema@ thingy.
>
> http://www.postgresql.org/docs/9.3/static/extend-extensions.html#AEN54999

I can't get this to work.  If I modify my function to be

CREATE FUNCTION no_accents(text) RETURNS boolean
AS 'select $1 = unaccent($1);'
LANGUAGE sql STABLE STRICT
SET search_path = '@extschema@';

then I get

d=# create extension unaccent;
ERROR:  function unaccent(text) does not exist
LINE 1: select $1 = unaccent($1);
                    ^
HINT:  No function matches the given name and argument types. You might need to add explicit type casts.
QUERY:  select $1 = unaccent($1);

If I modify it to be

CREATE FUNCTION no_accents(text) RETURNS boolean
AS 'select $1 = unaccent($1);'
LANGUAGE sql STABLE STRICT;
ALTER FUNCTION no_accents(text) SET search_path = '@extschema@';

then I get the same restore problem I originally described.

What am I doing wrong?

-- 
Vik
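One workaround while this gets sorted out, shown only as a sketch and
assuming unaccent was installed into the public schema, is to
schema-qualify the call so the function body no longer depends on
search_path at all (at the cost of the relocatability this thread is
about):

    CREATE FUNCTION no_accents(text) RETURNS boolean
    AS 'select $1 = public.unaccent($1);'
    LANGUAGE sql STABLE STRICT;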
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On Fri, Sep 20, 2013 at 1:11 AM, Daniel Farina <dan...@fdr.io> wrote:
> I think the n-call underestimation propagation may not be quite precise
> for various detailed reasons (having to do with 'sticky' queries) and to
> make it precise is probably more work than it's worth.  And, on more
> reflection, I'm also having a hard time imagining people intuiting that
> value usefully.  So, here's a version removing that.

I forgot about removal of the relevant SGML, amended here in v6.

Attachment: pg_stat_statements-identification-v6.patch
Re: [HACKERS] Assertions in PL/PgSQL
On 16 September 2013 03:43, Marko Tiikkaja <ma...@joh.to> wrote:
> On 2013-09-15 23:23, Jaime Casanova wrote:
>> If using ASSERT as keyword is not acceptable, not that i agree but in
>> case.  What about using RAISE EXCEPTION WHEN (condition)

I was going to suggest the same idea: extend the RAISE syntax without
introducing new keywords.  Something like:

    RAISE assert_exception WHEN assert_condition ;

where assert_exception is a new exception label which maps to a new
internal sqlstate.

> I think it would be extremely surprising if a command like that got
> optimized away based on a GUC, so I don't think that would be a good idea.

In pl_gram.y, in the rule stmt_raise, determine that this RAISE is for
ASSERT, and then return NULL if plpgsql_curr_compile->enable_assertions
is false.  Isn't this possible?

> Regards,
> Marko Tiikkaja
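For comparison, the behavior under discussion can be approximated in
today's PL/pgSQL with an explicit test; a minimal sketch (ASSERT,
enable_assertions, and the RAISE ... WHEN form above are proposals in this
thread, not existing syntax):

    CREATE FUNCTION transfer(amount numeric) RETURNS void AS $$
    BEGIN
        -- hand-rolled assertion: what ASSERT / RAISE ... WHEN would replace
        IF amount <= 0 THEN
            RAISE EXCEPTION 'assertion failed: amount must be positive';
        END IF;
        -- ... rest of the function ...
    END;
    $$ LANGUAGE plpgsql;

The GUC question is precisely whether such a check could be compiled out
wholesale without surprising anyone.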
Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.
On Fri, Sep 20, 2013 at 08:58:53AM +0900, Tatsuo Ishii wrote:
> For example, CREATE TABLE t1(t NCHAR(10)) will succeed if NCHAR is UTF-8
> and database encoding is UTF-8.  Even succeed if NCHAR is SHIFT-JIS and
> database encoding is UTF-8 because there is a conversion between UTF-8
> and SHIFT-JIS.  However will not succeed if NCHAR is SHIFT-JIS and
> database encoding is ISO-8859-1 because there's no conversion between
> them.

As far as I can tell the whole reason for introducing NCHAR is to support
SHIFT-JIS; there hasn't been call for any other encodings, that I can
remember anyway.

So rather than this whole NCHAR thing, why not just add a type sjistext,
and a few type casts, and call it a day...

Have a nice day,
-- 
Martijn van Oosterhout   <klep...@svana.org>   http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he
> does not attach much importance to his own thoughts.
  -- Arthur Schopenhauer
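The encoding-conversion machinery such a type would lean on is already
exposed at the SQL level; a quick illustration using the built-in SJIS
conversions (any ASCII sample string works):

    -- round-trip text through SHIFT_JIS with the built-in conversion
    -- routines; an NCHAR or sjistext type would sit on top of the same
    -- conversion infrastructure
    SELECT convert_to('some text', 'SJIS');                        -- text -> bytea
    SELECT convert_from(convert_to('some text', 'SJIS'), 'SJIS');  -- and back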
Re: [HACKERS] Patch for fail-back without fresh backup
> Attached patch combines documentation patch and source-code patch.

I have had a stab at reviewing the documentation.  Have a look.

--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1749,6 +1749,50 @@ include 'filename'
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-synchronous-transfer" xreflabel="synchronous_transfer">
+      <term><varname>synchronous_transfer</varname> (<type>enum</type>)</term>
+      <indexterm>
+       <primary><varname>synchronous_transfer</> configuration parameter</primary>
+      </indexterm>
+      <listitem>
+       <para>
+        This parameter controls the synchronous nature of WAL transfer and
+        maintains file system level consistency between master server and
+        standby server. It specifies whether master server will wait for file
+        system level change (for example : modifying data page) before
+        the corresponding WAL records are replicated to the standby server.
+       </para>
+       <para>
+        Valid values are <literal>commit</>, <literal>data_flush</> and
+        <literal>all</>. The default value is <literal>commit</>, meaning
+        that master will only wait for transaction commits, this is equivalent
+        to turning off <literal>synchronous_transfer</> parameter and standby
+        server will behave as a <quote>synchronous standby</> in
+        Streaming Replication. For value <literal>data_flush</>, master will
+        wait only for data page modifications but not for transaction
+        commits, hence the standby server will act as <quote>asynchronous
+        failback safe standby</>. For value <literal>all</>, master will wait
+        for data page modifications as well as for transaction commits and
+        resultant standby server will act as <quote>synchronous failback safe
+        standby</>. The wait is on background activities and hence will not
+        create performance overhead. To configure synchronous failback safe
+        standby, <xref linkend="guc-synchronous-standby-names"> should be set.
+       </para>
+      </listitem>
+     </varlistentry>

@@ -2258,14 +2302,25 @@ include 'filename'
       </indexterm>
       <listitem>
       <para>
-        Specifies a comma-separated list of standby names that can support
-        <firstterm>synchronous replication</>, as described in
-        <xref linkend="synchronous-replication">.
-        At any one time there will be at most one active synchronous standby;
-        transactions waiting for commit will be allowed to proceed after
-        this standby server confirms receipt of their data.
-        The synchronous standby will be the first standby named in this list
-        that is both currently connected and streaming data in real-time
+        Specifies a comma-separated list of standby names. If this parameter
+        is set then standby will behave as synchronous standby in replication,
+        as described in <xref linkend="synchronous-replication"> or synchronous
+        failback safe standby, as described in <xref linkend="failback-safe">.
+        At any time there will be at most one active standby; when standby is
+        synchronous standby in replication, transactions waiting for commit
+        will be allowed to proceed after this standby server confirms receipt
+        of their data. But when standby is synchronous failback safe standby,
+        data page modifications as well as transaction commits will be allowed
+        to proceed only after this standby server confirms receipt of their data.
+        If this parameter is set to empty value and
+        <xref linkend="guc-synchronous-transfer"> is set to <literal>data_flush</>
+        then standby is called as asynchronous failback safe standby and only
+        data page modifications will wait before corresponding WAL record is
+        replicated to standby.
+       </para>
+       <para>
+        Synchronous standby in replication will be the first standby named in
+        this list that is both currently connected and streaming data in real-time
         (as shown by a state of <literal>streaming</literal> in the
         <link linkend="monitoring-stats-views-table">
         <literal>pg_stat_replication</></link> view).

--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
+
+  <sect2 id="failback-safe">
+   <title>Setting up failback safe standby</title>
+
+   <indexterm zone="high-availability">
+    <primary>Setting up failback safe standby</primary>
+   </indexterm>
+
+   <para>
+    PostgreSQL streaming replication offers durability, but if the master
+    crashes and a particular WAL record is unable to reach to standby server,
+    then that WAL record is present on master server but not on standby
+    server. In such a case master is ahead of standby server in term of WAL
+    records and data in database. This leads to file-system level
+    inconsistency between master and standby server. For example a heap page
+    update on the master might not have been reflected on standby when
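For orientation, a failback-safe synchronous setup under the proposed
patch would look something like this in postgresql.conf (a sketch
assembled from the parameter descriptions above, not tested
configuration):

    synchronous_transfer = all              # wait for data-page flushes and commits
    synchronous_standby_names = 'standby1'  # required for the synchronous variants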
Re: [HACKERS] Assertions in PL/PgSQL
On 9/20/13 12:09 PM, Amit Khandekar wrote:
> On 16 September 2013 03:43, Marko Tiikkaja <ma...@joh.to> wrote:
>> I think it would be extremely surprising if a command like that got
>> optimized away based on a GUC, so I don't think that would be a good
>> idea.
>
> In pl_gram.y, in the rule stmt_raise, determine that this RAISE is for
> ASSERT, and then return NULL if plpgsql_curr_compile->enable_assertions
> is false.  Isn't this possible?

Of course it's possible.  But I, as a PostgreSQL user writing PL/PgSQL
code, would be extremely surprised if this new cool option to RAISE didn't
work for some reason.  If we use ASSERT the situation is different; most
people will realize it's a new command and works differently from RAISE.

Regards,
Marko Tiikkaja
Re: [HACKERS] record identical operator - Review
On Thu, Sep 19, 2013 at 06:31:38PM -0400, Steve Singer wrote:
> I think there is agreement that better (as in more obscure) operators
> than === and !== need to be picked and we need to find a place in the
> user-docs to warn users of the behaviour of these operators.  Hannu has
> proposed
>
> *==*  binary equal, surely very equal by any other definition as well
> !==?  maybe not equal -- binary inequal, may still be equal by other
>       comparison methods

It's a pity operators must be non-alpha and can't be named.  Something
like:

SELECT foo OPERATOR(byte_equivalent) bar;

is simultaneously obscure, yet clear.

Have a nice day,
-- 
Martijn van Oosterhout   <klep...@svana.org>   http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he
> does not attach much importance to his own thoughts.
  -- Arthur Schopenhauer
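For what it's worth, the OPERATOR() construct already exists, but it takes
a (possibly schema-qualified) operator symbol rather than an alphabetic
name; a quick sketch with the stock record equality operator:

    -- existing syntax: qualify an operator symbol, here built-in equality
    SELECT ROW(1, 2) OPERATOR(pg_catalog.=) ROW(1, 2);  -- returns true

So a new byte-wise operator could be written unambiguously as
OPERATOR(pg_catalog.xxx) for whatever symbol xxx is eventually chosen, but
an alphabetic name like byte_equivalent is not possible there.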
Re: [HACKERS] Assertions in PL/PgSQL
On 9/19/13 9:09 PM, Robert Haas wrote:
> Personally, I'm pretty skeptical about the value of adding dedicated
> syntax for this.  I mean, I'll be the first to admit that PL/pgsql is a
> clunky and awkward language.  But somebody's always proposing something
> that they think will make it better, and I feel somehow that if we
> accept all of those proposals at face value, we'll just end up with a
> mess.  So IMHO the bar for adding new syntax to PL/pgsql should be
> reasonably high.  YMMV, of course, and probably does.

I see where you're coming from, and agree, to an extent.

> The issue of how this is spelled is somewhat secondary for me.  I think
> ASSERT is probably as good as anything.  But let's not kid ourselves:
> even reserving this word only in PL/pgsql WILL break things for some
> users, and there are LOTS of people out there with LOTS of procedural
> code.  Every tiny backward-incompatibility reduces by just a little bit
> the percentage of those people who can upgrade and increases the delay
> before they do.  This is an area where past experience has made me
> quite wary.

The thing is, what this means is that to add a new feature to the
language, you have to make the syntax so damn ugly that no one wants to
use it (see row_count, for example) or you will break some poor user's
function.  And now we got all this cool functionality which nobody wants
to use, and the language itself actually gets progressively worse.  All
this is starting to sound like it's already too late to make PL/PgSQL
better, and we should just start afresh.

Regards,
Marko Tiikkaja
Re: [HACKERS] psql tab completion for updatable foreign tables
>>> Okay, are you adding this to the september commitfest?
>>
>> OK, I've done that.  I think that it's too late for 9.3.

+1 for the idea.

I have tested the patch and got surprising results with CentOS.  The patch
is working fine on CentOS 6.2 and RHEL 6.3, but it is giving a problem on
CentOS 6.3 (tab completes for local tables but not for foreign tables).

I have used the following script.  Two postgres servers are running, using
ports 5432 and 5433.  The server with port 5432 has postgres_fdw installed
and will interact with the remote server running under port 5433.

psql -p 5433 -c "CREATE TABLE aa_remote (a int, b int)" postgres

postgres=# CREATE EXTENSION postgres_fdw;
postgres=# CREATE SERVER postgres_server FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host 'localhost', port '5433', dbname 'postgres');
postgres=# CREATE USER MAPPING FOR PUBLIC SERVER postgres_server OPTIONS (password '');
postgres=# CREATE FOREIGN TABLE aa_foreign (a int, b int) SERVER postgres_server OPTIONS (table_name 'aa_remote');
postgres=# INSERT INTO aa_foreign VALUES (1,2);

But while doing any operation on aa_foreign, tab does not complete on
CentOS 6.3 (it completes for local tables but not for foreign tables).
Is that a problem?

Regards,
Samrat Revagade
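One way to tell whether the catalogs or the client binary is at fault on
the misbehaving box is to run, by hand, the sort of query psql's tab
completion issues (this is only an approximation, not the exact completion
query from the patch):

    -- foreign tables have relkind 'f'; if this returns aa_foreign, the
    -- catalog side is fine and the problem is in psql's completion logic
    SELECT c.relname
    FROM pg_catalog.pg_class c
    WHERE c.relkind IN ('r', 'f')
      AND c.relname LIKE 'aa%';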
Re: [HACKERS] logical changeset generation v6
Hi,

On 2013-09-19 12:05:35 -0400, Robert Haas wrote:
> No question.  I'm not saying that that optimization shouldn't go in right
> after the main patch does, but IMHO right now there are too many things
> going in the 0004 patch to discuss them all simultaneously.  I'd like to
> find a way of splitting this up that will let us block-and-tackle
> individual pieces of it, even if we end up committing them all one right
> after the other.

Fine with me.  I was criticized for splitting up stuff too much before ;).
Expect a finer-grained series.

> But that raises an interesting question: why is the overhead so bad?  I
> mean, this shouldn't be any worse than having a series of transactions
> running concurrently with pgbench that take a snapshot and hold it for as
> long as it takes the decoding process to decode the most-recently
> committed transaction.

Pgbench really slows down scarily if there are some slightly longer
running transactions around...

> Is the issue here that we can't advance xmin until we've fsync'd the
> fruits of decoding down to disk?

Basically yes.  We only advance the xmin of the slot so far that we could
still build a valid snapshot to decode the first transaction not confirmed
to have been synced to disk by the client.

> If so, that's mighty painful.  But we'd really only need to hold back
> xmin in that situation when some catalog change has occurred meanwhile,
> which for pgbench means never.  So something seems fishy here.

It's less simple than that.  We need to protect against concurrent DDL
producing deleted rows that we will still need.

> We need HeapTupleSatisfiesVacuum() to return HEAPTUPLE_RECENTLY_DEAD not
> HEAPTUPLE_DEAD for such rows, right?  The way to do that is to guarantee
> that if TransactionIdDidCommit(xmax) is true, TransactionIdPrecedes(xmax,
> OldestXmin) is also true.  So, we need to peg OldestXmin (as passed to
> HTSV) to the xid of the oldest transaction we're still decoding.

I am not sure how you could do that if somewhere in the future DDL has
started, since there's no interlock preventing anyone from doing so.

>>> - It still bothers me that we're going to have mandatory slots for
>>> logical replication and no slots for physical replication.
>>
>> If people think this needs to be a general facility from the start, I
>> can be convinced that way, but I think there's so much to discuss
>> around the semantics and different usecases that I'd much prefer to
>> discuss that later.
>
> I'm worried that if we don't know how the physical replication slots are
> going to work, they'll end up being randomly different from the logical
> replication slots, and that'll be an API wart which we'll have a hard
> time getting rid of later.

Hm.  I actually think that minus some s/Logical//g and a mv there won't be
much need to change on the slot interface itself.  What we need for
physical rep is basically to
a) store the position up to where the primary has fsynced the WAL,
b) store the xmin horizon the standby currently has.
Sure, we can store more stats (most of pg_stat_replication, perhaps some
more) but that's not functionally critical and not hard to extend.

The points I find daunting are the semantics, like:
* How do we control whether a standby is allowed to prevent WAL file
  removal?  What if archiving is configured?
* How do we control whether a standby is allowed to peg xmin?
* How long do we peg an xmin/WAL file removal if the standby is gone?
* How does the user interface look to remove a slot if a standby is gone?
* How do we decide/control which commands use a slot in which cases?

>>> - What is the purpose of (Un)SuspendDecodingSnapshots?  It seems that
>>> should be explained somewhere.  I have my doubts about how safe that
>>> is.
>>
>> I'll document the details if they aren't right now.  Consider what
>> happens if somebody does something like: VACUUM FULL pg_am;.  If we
>> were to build the relation descriptor of pg_am in an "historical
>> snapshot", as you coin it, we'd have the wrong filenode in there.  And
>> consequently any future lookups in pg_am will open a file that doesn't
>> exist.  That problem only exists for non-nailed relations that are
>> accessed during decoding.
>
> But if it's some user table flagged with the terribly-named
> treat_as_catalog_table flag, then they could have not only changed the
> relfilenode but also the tupledesc.  And then you can't just wave your
> hands at the problem.

Heh.  Well caught.  There's a comment about that somewhere... Those are
problematic; my plan so far is to throw my hands up and forbid ALTER
TABLEs that rewrite those.

>> I know you don't like that flag and especially its name.  I am open to
>> suggestions to a) rename it b) find a better solution.  I am pretty
>> sure a) is possible but I have severe doubts about any realistic b).
>
> Yes, I don't like it either.  I am not sure what to replace it with
> though.  It's easy enough to fit something in GetCatalogSnapshot() and I
> don't have a problem with that, but I am less happy with adding code like
> that to GetSnapshotData() for
[HACKERS] SSI freezing bug
Hi,

Prompted by Andres Freund's comments on my "Freezing without Write I/O"
patch, I realized that there's an existing bug in the way predicate
locking handles freezing (or rather, it doesn't handle it).

When a tuple is predicate-locked, the key of the lock is ctid+xmin.
However, when a tuple is frozen, its xmin is changed to FrozenXid.  That
effectively invalidates any predicate lock on the tuple, as checking for a
lock on the same tuple later won't find it as the xmin is different.

Attached is an isolationtester spec to demonstrate this.

- Heikki

# Test predicate locks with freezing
#
# This test has two serializable transactions.  Both select two rows
# from the table, and then update one of them.
# If these were serialized (run one at a time), the transaction that
# runs later would see one of the rows to be updated.
#
# Any overlap between the transactions must cause a serialization failure.
#
# Normally that works, but freezing a tuple screws up predicate locking.
# After freezing a tuple, any predicate locks on it are effectively lost.

setup
{
  CREATE TABLE test (i int PRIMARY KEY, t text);
  INSERT INTO test VALUES (5, 'apple'), (7, 'pear'), (11, 'banana');
}

teardown
{
  DROP TABLE test;
}

session "s1"
setup		{ BEGIN ISOLATION LEVEL SERIALIZABLE; set enable_seqscan=off; }
step "r1"	{ SELECT * FROM test WHERE i IN (5, 7) }
step "w1"	{ UPDATE test SET t = 'pear_xact1' WHERE i = 7 }
step "c1"	{ COMMIT; }

session "s2"
setup		{ BEGIN ISOLATION LEVEL SERIALIZABLE; set enable_seqscan=off; }
step "r2"	{ SELECT * FROM test WHERE i IN (5, 7) }
step "w2"	{ UPDATE test SET t = 'apple_xact2' WHERE i = 5 }
step "c2"	{ COMMIT; }

session "freezer"
step "freeze"	{ VACUUM FREEZE test; }

# This should cause a serialization error - and it does if you remove the
# "freeze" step.
permutation "r1" "r2" "freeze" "w1" "w2" "c1" "c2"
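The mechanism is easy to observe from plain SQL, since xmin is exposed as
a system column (exact pre-freeze xmin values will differ; after VACUUM
FREEZE it reads 2, i.e. FrozenTransactionId, on servers of this vintage):

    CREATE TABLE test (i int PRIMARY KEY, t text);
    INSERT INTO test VALUES (5, 'apple'), (7, 'pear'), (11, 'banana');

    SELECT ctid, xmin, i FROM test;   -- xmin = the inserting transaction's xid
    VACUUM FREEZE test;
    SELECT ctid, xmin, i FROM test;   -- xmin = 2: the ctid+xmin lock key changed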
Re: [HACKERS] dynamic shared memory
Hi,

On 2013-09-19 11:44:34 -0400, Robert Haas wrote:
> On Wed, Sep 18, 2013 at 1:42 PM, Andres Freund <and...@2ndquadrant.com> wrote:
>> --- /dev/null
>> +++ b/src/backend/storage/ipc/dsm.c
>> +#define PG_DYNSHMEM_STATE_FILE		PG_DYNSHMEM_DIR "/state"
>> +#define PG_DYNSHMEM_NEW_STATE_FILE	PG_DYNSHMEM_DIR "/state.new"
>>
>> Hm, I guess you don't want to add it to global/ or so because of the
>> mmap implementation where you presumably scan the directory?
>
> Yes, and also because I thought this way would make it easier to teach
> things like pg_basebackup (or anybody's home-brew scripts) to just skip
> that directory completely.  Actually, I was wondering if we ought to have
> a directory under pgdata whose explicit charter it was to contain files
> that shouldn't be copied as part of a base backup.
> pg_do_not_back_this_up.

Wondered exactly about that as soon as you've mentioned pg_basebackup.
pg_local/?

>> +	/* Determine size for new control segment. */
>> +	maxitems = PG_DYNSHMEM_FIXED_SLOTS
>> +		+ PG_DYNSHMEM_SLOTS_PER_BACKEND * MaxBackends;
>>
>> It seems likely that MaxConnections would be sufficient?
>
> I think we could argue about the best way to set this until the cows
> come home, but I don't think it probably matters much at this point.  We
> can always change the formula later as we gain experience.  However, I
> don't have a principled reason for assuming that only user-connected
> backends will create dynamic shared memory segments.

Hm, yes.  I had MaxBackends down as MaxConnections + autovacuum stuff; but
max_worker_processes are in there now, so you're right that doesn't make
sense.

>> +/*
>> + * Read and parse the state file.
>>
>> Perhaps CRC32 the content?
>
> I don't see the point.  If the file contents are garbage that happens to
> look like a number, we'll go "oh, there isn't any such segment" or "oh,
> there is such a segment but it doesn't look like a control segment, so
> forget it".  There are a lot of things we really ought to be CRCing to
> avoid corruption risk, but I can't see how this is remotely one of them.

I was worried about a partially written file or one containing contents
from two different postmaster cycles, but it's actually far too small for
that...  I initially had thought you'd write the contents of the entire
shared control segment there, not just its id.

>> +	/* Create or truncate the file. */
>> +	statefd = open(PG_DYNSHMEM_NEW_STATE_FILE, O_RDWR|O_CREAT|O_TRUNC, 0600);
>>
>> Doesn't this need a | PG_BINARY?
>
> It's a text file.  Do we need PG_BINARY anyway?

I'd say yes.  Non-binary mode stuff on windows does stuff like
transforming LF <=> CRLF on reading/writing, which makes sizes not match
up and similar ugliness.  Imo there's little reason to use non-binary mode
for anything written for postgres' own consumption.

>> Why are you using open() and not BasicOpenFile or even OpenTransientFile?
>
> Because those don't work in the postmaster.

Oh, that's news to me.  Seems strange, especially for BasicOpenFile.

>> +	/* Write contents. */
>> +	snprintf(statebuf, PG_DYNSHMEM_STATE_BUFSIZ, "%lu\n",
>> +			 (unsigned long) dsm_control_handle);
>>
>> Why are we upcasting the length of dsm_control_handle here?  Also,
>> doesn't this need the usual UINT64_FORMAT thingy?
>
> dsm_handle is an alias for uint32.  Is that always exactly an unsigned
> int or can it sometimes be an unsigned long?  I thought the latter, so
> couldn't figure out how to write this portably without casting to a type
> that explicitly matched the format string.

That should always be an unsigned int on platforms we support.  Note that
we've printed TransactionIds that way (i.e. %u) for a long time and they
are a uint32 as well.

>> Not sure whether it's sensible to only LOG in these cases.  After all
>> there's something unexpected happening.  The robustness argument
>> doesn't count since we're already shutting down.
>
> I see no point in throwing an error.  The fact that we're having trouble
> cleaning up one dynamic shared memory segment doesn't mean we shouldn't
> try to clean up others, or that any remaining postmaster shutdown hooks
> shouldn't be executed.

Well, it means we'll do a regular shutdown instead of PANICing and *not*
writing a checkpoint.  If something has corrupted our state to the point
we cannot unregister shared memory we registered, something has gone
terribly wrong.  Quite possibly we've scribbled over our control
structures or such.  In that case it's not proper to do a normal shutdown;
we're quite possibly writing bad data.

>> +		ereport(ERROR,
>> +				(errcode(ERRCODE_INTERNAL_ERROR),
>> +				 errmsg("dynamic shared memory control segment is not valid")));
>>
>> Imo that's a PANIC or at the very least a FATAL.
>
> Sure, that's a tempting option, but it doesn't seem to serve any very
> necessary point.  There's no data corruption problem if we proceed
> here.  Most
Re: [HACKERS] SSI freezing bug
Hi,

On 2013-09-20 13:55:36 +0300, Heikki Linnakangas wrote:
> When a tuple is predicate-locked, the key of the lock is ctid+xmin.
> However, when a tuple is frozen, its xmin is changed to FrozenXid.  That
> effectively invalidates any predicate lock on the tuple, as checking for
> a lock on the same tuple later won't find it as the xmin is different.
>
> Attached is an isolationtester spec to demonstrate this.

Do you have any idea to fix that besides keeping the xmin horizon below
the lowest of the xids that are predicate locked?  Which seems nasty to
compute and is probably not trivial to fit into the procarray.c machinery.

Greetings,

Andres Freund

-- 
 Andres Freund	                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] SSI freezing bug
On 2013-09-20 13:53:04 +0200, Andres Freund wrote:
> On 2013-09-20 13:55:36 +0300, Heikki Linnakangas wrote:
>> When a tuple is predicate-locked, the key of the lock is ctid+xmin.
>> However, when a tuple is frozen, its xmin is changed to FrozenXid.
>> That effectively invalidates any predicate lock on the tuple, as
>> checking for a lock on the same tuple later won't find it as the xmin
>> is different.
>>
>> Attached is an isolationtester spec to demonstrate this.
>
> Do you have any idea to fix that besides keeping the xmin horizon below
> the lowest of the xids that are predicate locked?  Which seems nasty to
> compute and is probably not trivial to fit into the procarray.c
> machinery.

A better solution probably is to promote tuple-level locks, if they exist,
to a relation-level one upon freezing, I guess?

Greetings,

Andres Freund

-- 
 Andres Freund	                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] [PATCH] pg_sleep(interval)
Here is a review of the pg_sleep(INTERVAL) patch version 1:

- patch applies cleanly on current head
- the function documentation is updated and clear
- the function definition relies on a SQL function to call the underlying
  pg_sleep(INT) function.  I'm fine with this, especially as if someone
  calls this function, he/she is not in a hurry:-)
- this one-liner implementation is straightforward
- the provided feature is simple, and seems useful
- configure/make/make check work ok

However:

- the new function is *not* tested anywhere!  I would suggest simply to
  replace some pg_sleep(int) instances by corresponding pg_sleep(interval)
  instances in the non-regression tests.

- some concerns have been raised that it breaks pg_sleep(TEXT), which
  currently works thanks to the implicit TEXT -> INT cast.  I would
  suggest to add pg_sleep(TEXT) explicitly, like:

  CREATE FUNCTION pg_sleep(TEXT) RETURNS VOID VOLATILE STRICT
  AS $$ select pg_sleep($1::INTEGER) $$ LANGUAGE SQL;

  That would be another one-liner, to update the documentation and to add
  some tests as well!  ISTM that providing pg_sleep(TEXT) cleanly resolves
  the upward-compatibility issue raised.

-- 
Fabien
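For reference, the patched function presumably reduces to a one-liner
along these lines (a sketch; the actual patch may spell it differently):

    CREATE FUNCTION pg_sleep(interval) RETURNS void
    VOLATILE STRICT
    AS $$ SELECT pg_sleep(extract(epoch FROM $1)) $$ LANGUAGE sql;

    SELECT pg_sleep(interval '1.5 seconds');  -- sleeps a second and a half

extract(epoch FROM ...) yields double precision, which is what the
existing pg_sleep already takes, so no new C code is needed.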
Re: [HACKERS] Assertions in PL/PgSQL
On Fri, Sep 20, 2013 at 6:24 AM, Marko Tiikkaja <ma...@joh.to> wrote:
>> The issue of how this is spelled is somewhat secondary for me.  I think
>> ASSERT is probably as good as anything.  But let's not kid ourselves:
>> even reserving this word only in PL/pgsql WILL break things for some
>> users, and there are LOTS of people out there with LOTS of procedural
>> code.  Every tiny backward-incompatibility reduces by just a little bit
>> the percentage of those people who can upgrade and increases the delay
>> before they do.  This is an area where past experience has made me
>> quite wary.
>
> The thing is, what this means is that to add a new feature to the
> language, you have to make the syntax so damn ugly that no one wants to
> use it (see row_count, for example) or you will break some poor user's
> function.  And now we got all this cool functionality which nobody wants
> to use, and the language itself actually gets progressively worse.  All
> this is starting to sound like it's already too late to make PL/PgSQL
> better, and we should just start afresh.

To some extent I agree that PL/pgsql is hopeless.  I think there are some
things we can do to improve it, but most of what gets proposed at least in
this forum strikes me as tinkering around the edges, and it can't make up
for fundamentally bad language design decisions.

Part of the problem, of course, is that most programming languages don't
get re-released every year.  It's not that it would be OK for C to
suddenly reserve a bunch of new keywords; it's that they don't try.  And
when they do release new language versions (like C99) some people (like
us) don't adopt them, for fear of being unable to run our code on older
systems.  Such considerations apply with equal force to PL/pgsql, but it
gets a new release every year rather than every decade, so the problems
are magnified.

The other part of the problem is that the language isn't designed from the
beginning to be extensible.  In Perl, for example, they chose to mark
variables with a leading $, @, or % and functions with a leading &.  That
last marking has largely fallen into desuetude, but the point is that - to
the extent that you do have and use such markers - you can add new
keywords without breaking anything.  Some languages can also distinguish
keywords positionally; for example, ABORT doesn't need to be reserved in
PostgreSQL's SQL dialect because it can only appear as a command at the
beginning of a line, and it can't be a column, type, or function name in
that position.  Such an approach might even work for ASSERT in PL/pgsql,
if there's a clean way to disambiguate vs. the assignment syntax.  But
even if we can make that work, we're going to continue to face this
problem with each new language extension.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Where to load modules from?
On Thu, Sep 19, 2013 at 5:54 PM, Andres Freund <and...@2ndquadrant.com> wrote:
> On 2013-09-19 22:56:52 +0200, Dimitri Fontaine wrote:
>> Robert Haas <robertmh...@gmail.com> writes:
>>>> I think I'd prefer a GUC that allows specifying multiple directories
>>>> that are searched in order to a single symlinked directory.
>>>
>>> Why?  I ask because I have the opposite preference, based on the
>>> precedent of pg_xlog.
>>
>> Because I want to specify multiple paths.  E.g. one with modules for a
>> specific postgres version, one for the cluster and one for my
>> development directory.  Now we could recursively search a directory
>> that contains symlinks to directories, but that seems ugly.
>
> I see.  My main hesitation is around security.  I feel somehow that
> changing a GUC to trojan the system would be easier for a remote user to
> accomplish than having to replace a directory with a symlink.
>
> In an effort to reach consensus, what about having both, with the GUC
> being empty by default?  That way you have a default per-cluster place
> where to stuff binaries to be loaded, and a GUC to manage finer settings
> if needs be.

> Well, we can have the guc have a default value of $datadir/pg_lib or
> such.  But using two independent mechanisms seems like a bad idea to me.

Heartily agreed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Where to load modules from?
On 2013-09-20 08:06:56 -0400, Robert Haas wrote:
> On Thu, Sep 19, 2013 at 5:54 PM, Andres Freund <and...@2ndquadrant.com> wrote:
>> Because I want to specify multiple paths.  E.g. one with modules for a
>> specific postgres version, one for the cluster and one for my
>> development directory.  Now we could recursively search a directory
>> that contains symlinks to directories, but that seems ugly.
>
> I see.  My main hesitation is around security.  I feel somehow that
> changing a GUC to trojan the system would be easier for a remote user to
> accomplish than having to replace a directory with a symlink.

If they can change a PGC_POSTMASTER GUC, they already can easily enough do:
shared_preload_libraries='/path/to/my/bad/so.so'
That's already allowed.

Greetings,

Andres Freund

-- 
 Andres Freund	                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Where to load modules from?
On Fri, Sep 20, 2013 at 8:10 AM, Andres Freund <and...@2ndquadrant.com> wrote:
> On 2013-09-20 08:06:56 -0400, Robert Haas wrote:
>> On Thu, Sep 19, 2013 at 5:54 PM, Andres Freund <and...@2ndquadrant.com> wrote:
>>> Because I want to specify multiple paths.  E.g. one with modules for a
>>> specific postgres version, one for the cluster and one for my
>>> development directory.  Now we could recursively search a directory
>>> that contains symlinks to directories, but that seems ugly.
>>
>> I see.  My main hesitation is around security.  I feel somehow that
>> changing a GUC to trojan the system would be easier for a remote user
>> to accomplish than having to replace a directory with a symlink.
>
> If they can change a PGC_POSTMASTER GUC, they already can easily enough do:
> shared_preload_libraries='/path/to/my/bad/so.so'
> That's already allowed.

OK.  Well, in that case, it seems we wouldn't be opening any new doors.
So... our usual comma-separated GUC syntax?  Empty means no extra places
to search.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Minor inheritance/check bug: Inconsistent behavior
On Fri, Sep 20, 2013 at 3:41 AM, Amit Kapila <amit.kapil...@gmail.com> wrote:
> [...]
> One more thing: I think it will be better to update the information in
> the docs, probably in the CREATE TABLE page where CHECK constraints are
> explained, and where DDL constraints are explained at this link:
> http://www.postgresql.org/docs/devel/static/ddl-constraints.html

Yes, agreed.  So, are you going to prepare a new version with
documentation and addressing the other points mentioned?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] record identical operator
* Kevin Grittner (kgri...@ymail.com) wrote:
> ... like just refreshing a view so that the new contents are the same as
> what you would see if you re-ran the query defining the matview.

I've heard this notion a few times of wanting to make sure that what you
get from running the query matches the matview.  While that makes sense
when the equality operator and what-you-see actually match, it doesn't
when they don't, because the what-you-see ends up being non-deterministic
and can change based on the order the datums are seen in during query
processing, which can change with different data ordering on disk and even
due to simply different plans for the same data, I believe.

Consider a GROUP BY with a citext column as one of the key fields.
You're going to get whatever value the aggregate happened to come across
first when building the HashAgg.  Having the binary equality operator
doesn't help that, and it seems like it could even result in change storms
happening due to a different plan when the actual data didn't change.

Thanks,

Stephen
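A concrete way to see the effect being described, using citext's
case-insensitive equality (which spelling "wins" the group is
plan-dependent):

    CREATE EXTENSION IF NOT EXISTS citext;
    CREATE TABLE t (c citext);
    INSERT INTO t VALUES ('PostgreSQL'), ('POSTGRESQL');

    -- both rows fall into one group; whether the output shows
    -- 'PostgreSQL' or 'POSTGRESQL' depends on which value the
    -- aggregation happened to see first
    SELECT c, count(*) FROM t GROUP BY c;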
Re: [HACKERS] Freezing without write I/O
On Thu, Sep 19, 2013 at 6:27 PM, Ants Aasma <a...@cybertec.at> wrote:
> I'm tackling similar issues in my patch.  What I'm thinking is that we
> should change our existing supposedly atomic accesses to be more
> explicit and make the API compatible with C11 atomics[1].  For now I
> think the changes should be something like this:
>
> * Have separate typedefs for atomic variants of datatypes (I don't
>   think we have a whole lot of them).  With C11 atomics support, this
>   would change to
>
>   typedef _Atomic TransactionId AtomicTransactionId;

What's the point of this?

> * Require that pg_atomic_init(type, var, val) be used to init atomic
>   variables.  Initially it would just pass through to assignment, as
>   all supported platforms can do 32bit atomic ops.  C11 compatible
>   compilers will delegate to atomic_init().

I don't think this is a bad idea for decoration, but again I'm not sure
how much fundamental value it adds.  If it makes it easier for people to
write code that works in C11 and fails on C89, we lose.

> * Create atomic read and write macros that look something like:
>
>   #define pg_atomic_load(type, var) (*((volatile type*) (var)))
>
>   and
>
>   #define pg_atomic_store(type, var, val) do { \
>       *((volatile type*) (var)) = (val); \
>   } while(0)
>
>   C11 would pass through to the compiler's implementation with relaxed
>   memory ordering.

Same comment.

> * Rename pg_read_barrier()/pg_write_barrier()/pg_memory_barrier() to
>   pg_acquire_fence()/pg_release_fence()/pg_acq_rel_fence().  Herb Sutter
>   makes a convincing argument why loosening up the barrier semantics is
>   the sane choice here. [2]  C11 support can then pass through to
>   atomic_thread_fence().

I am entirely unconvinced that we need this.  Some people use acquire +
release fences, some people use read + write fences, and there are all
combinations (read-acquire, read-release, read-acquire-release,
write-acquire, write-release, write-acquire-release, both-acquire,
both-release, both-acquire-release).  I think we're going to have enough
trouble deciding between the primitives we already have, let alone with a
whole mess of new ones.  Mistakes will only manifest themselves on
certain platforms, and in many cases the races are so tight that actual
failures are very unlikely to be observed in regression testing.

Personally, I think the biggest change that would help here is to mandate
that spinlock operations serve as compiler fences.  That would eliminate
the need for scads of volatile references all over the place.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Where to load modules from?
Robert Haas <robertmh...@gmail.com> writes:
> So... our usual comma-separated GUC syntax?  Empty means no extra places
> to search.

Sounds pretty good to me.  The only advantage of using an initdb place
would have been the opportunity to actually register modules and WAL-log
that step so that the standby pg_modules directory gets filled
automatically.  I realise that might be another discussion and patch
entirely.

I'll prepare a patch using GUCs, just doing the bare minimum for now,
allowing to load modules from GUC-directed places.

Regards,
-- 
Dimitri Fontaine                          06 63 07 10 78
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] record identical operator - Review
Steve Singer st...@ssinger.info wrote: On 09/12/2013 06:27 PM, Kevin Grittner wrote: Attached is a patch for a bit of infrastructure I believe to be necessary for correct behavior of REFRESH MATERIALIZED VIEW CONCURRENTLY as well as incremental maintenance of matviews. Here is attempt at a review of the v1 patch. Thanks for looking at it. There has been a heated discussion on list about if we even want this type of operator. It's not a question of whether we want to allow operators which define comparisons between objects in a non-standard way. We already have 11 such cases; this would add one more to make 12. In all cases there are different operators for making a comparison that return potentially different results from the default operators, to support uses that are specific to that type. I think there is agreement that better (as in more obscure) operators than === and !== need to be picked and we need to find a place in the user-docs to warn users of the behaviour of this operators. Hannu has proposed *==* binary equal, surely very equal by any other definition as wall !==? maybe not equal -- binary inequal, may still be equal by other comparison methods and no one has yet objected to this. I do. The issue is that there are a great many places that there are multiple definitions of equality. We generally try to use something which tries to convey something about the nature of the operation, since the fact that it can have different results is a given. There is nothing in those operators that says binary image comparison. I thought about prepending a hash character in front of normal operators, because that somehow suggests binary operations to my eye (although I have no idea whether it does so for anyone else), but those operators are already used for an alternative set of comparisons for intervals, with a different meaning. I'm still trying to come up with something I really like. My vote would be to update the patch to use those operator names and then figure out where we can document them. It it means we have to section describing the equal operator for records as well then maybe we should do that so we can get on with things. Code Review The record_image_eq and record_image_cmp functions are VERY similar. It seems to me that you could have the meet of these functions into a common function (that isn't exposed via SQL) that then behaves differently in 2 or 3 spots based on a parameter that controls if it detoasts the tuple for the compare or returns false on the equals. Did you look at the record_eq and record_cmp functions which exist before this patch? Is there a reason we should do it one way for the default operators and another way for this alternative? Do you think we should change record_eq and record_cmp to do things the way you suggest? Beyond that I don't see any issues with the code. You call out a question about two records being compared have a different number of columns should it always be an error, or only an error when they match on the columns they do have in common. The current behaviour is select * FROM r1,r2 where r1==r2; a | b | a | b | c ---+---+---+---+--- 1 | 1 | 1 | 2 | 1 (1 row) update r2 set b=1; UPDATE 1 test=# select * FROM r1,r2 where r2==r1; ERROR: cannot compare record types with different numbers of columns This seems like a bad idea to me. I don't like that I get a type comparison error only sometimes based on the values of the data in the column. 
If I'm a user testing code that compares two records with this operator and I test my query with one value pair and it works, then I'd naively expect to get a true or false on all other value sets of the same record type. Again, this is a result of following the precedent set by the existing record comparison operators. test=# create table r1 (c1 int, c2 int); CREATE TABLE test=# create table r2 (c1 int, c2 int, c3 int); CREATE TABLE test=# insert into r1 values (1, 2); INSERT 0 1 test=# insert into r2 values (3, 4, 5); INSERT 0 1 test=# select * from r1 join r2 on r1 <> r2; c1 | c2 | c1 | c2 | c3 ----+----+----+----+---- 1 | 2 | 3 | 4 | 5 (1 row) test=# update r1 set c1 = 3, c2 = 4; UPDATE 1 test=# select * from r1 join r2 on r1 <> r2; ERROR: cannot compare record types with different numbers of columns Would you be in favor of forcing a change to the record comparison operators which have been in use for years? If not, is there a good reason to have them behave differently in this regard? -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
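To make the distinction concrete: citext is the usual example of a type whose default equality is looser than byte-for-byte identity. The following runs on any installation with the citext extension; only the final claim about !== is speculative, since the operator spellings are still under discussion.

test=# create extension citext;
CREATE EXTENSION
test=# select 'PostgreSQL'::citext = 'postgresql'::citext as default_eq,
              'PostgreSQL'::citext::text = 'postgresql'::citext::text as byte_eq;
 default_eq | byte_eq
------------+---------
 t          | f
(1 row)

Under the patch, ROW('PostgreSQL'::citext) !== ROW('postgresql'::citext) would be expected to return true even though the default record = calls the rows equal, which is exactly the distinction REFRESH MATERIALIZED VIEW CONCURRENTLY needs in order to notice that the stored row no longer matches the query result.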
Re: [HACKERS] Freezing without write I/O
Hi, I agree with most of what you said - I think that's a little bit too much change for too little benefit. On 2013-09-20 08:32:29 -0400, Robert Haas wrote: Personally, I think the biggest change that would help here is to mandate that spinlock operations serve as compiler fences. That would eliminate the need for scads of volatile references all over the place. They effectively already do, don't they? It's an external, non-inlinable function call (s_lock, not the actual TAS). And even if it were to get inlined via LTO optimization, the inline assembler we're using is doing the __asm__ __volatile__ (..., memory) dance. That's a full compiler barrier. The non-asm implementations call OS/compiler primitives that are also fences. In the case I brought up here there is no spinlock or anything similar. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Where to load modules from?
On 2013-09-20 14:35:31 +0200, Dimitri Fontaine wrote: Robert Haas robertmh...@gmail.com writes: So... our usual comma-separated GUC syntax? Empty means no extra places to search. +1. The only advantage of using an initdb place would have been the opportunity to actually register modules and WAL log that step so that the standby pg_modules directory gets filled automatically. -many I realise that might be another discussion and patch entirely. Yes. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Freezing without write I/O
On Fri, Sep 20, 2013 at 8:40 AM, Andres Freund and...@2ndquadrant.com wrote: On 2013-09-20 08:32:29 -0400, Robert Haas wrote: Personally, I think the biggest change that would help here is to mandate that spinlock operations serve as compiler fences. That would eliminate the need for scads of volatile references all over the place. They effectively already do, don't they? It's an external, non-inlinable function call (s_lock, not the actual TAS). And even if it were to get inlined via LTO optimization, the inline assembler we're using is doing the __asm__ __volatile__ (..., memory) dance. That's a full compiler barrier. The non-asm implementations call OS/compiler primitives that are also fences. In the case I brought up here there is no spinlock or anything similar. Well, that doesn't match my previous discussions with Tom Lane, or this comment: * Another caution for users of these macros is that it is the caller's * responsibility to ensure that the compiler doesn't re-order accesses * to shared memory to precede the actual lock acquisition, or follow the * lock release. Typically we handle this by using volatile-qualified * pointers to refer to both the spinlock itself and the shared data * structure being accessed within the spinlocked critical section. * That fixes it because compilers are not allowed to re-order accesses * to volatile objects relative to other such accesses. That's not to disagree with your point. If TAS is a compiler barrier, then we oughta be OK. Unless something can migrate into the spot between a failed TAS and the call to s_lock? But I'm pretty sure that we've repeatedly had to change code to keep things from falling over in this area, see e.g. commits fa72121594534dda6cc010f0bf6b3e8d22987526, 07eeb9d109c057854b20707562ce517efa591761, d3fc362ec2ce1cf095950dddf24061915f836c22, and 584f818bef68450d23d1b75afbaf19febe38fd37 (the last apparently a live bug). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Assertions in PL/PgSQL
2013/9/20 Robert Haas robertmh...@gmail.com wrote: On Fri, Sep 20, 2013 at 6:24 AM, Marko Tiikkaja ma...@joh.to wrote: The issue of how this is spelled is somewhat secondary for me. I think ASSERT is probably as good as anything. But let's not kid ourselves: even reserving this word only in PL/pgsql WILL break things for some users, and there are LOTS of people out there with LOTS of procedural code. Every tiny backward-incompatibility reduces by just a little bit the percentage of those people who can upgrade and increases the delay before they do. This is an area where past experience has made me quite wary. The thing is, what this means is that to add a new feature to the language, you have to make the syntax so damn ugly that no one wants to use it (see row_count, for example) or you will break some poor user's function. And now we've got all this cool functionality which nobody wants to use, and the language itself actually gets progressively worse. All this is starting to sound like it's already too late to make PL/PgSQL better, and we should just start afresh. To some extent I agree that PL/pgsql is hopeless. I think there are some things we can do to improve it, but most of what gets proposed at least in this forum strikes me as tinkering around the edges, and it can't make up for fundamentally bad language design decisions. Part of the problem, of course, is that most programming languages don't get re-released every year. It's not that it would be OK for C to suddenly reserve a bunch of new keywords; it's that they don't try. And when they do release new language versions (like C99), some people (like us) don't adopt them, for fear of being unable to run our code on older systems. Such considerations apply with equal force to PL/pgsql, but it gets a new release every year rather than every decade, so the problems are magnified. The other part of the problem is that the language isn't designed from the beginning to be extensible. In Perl, for example, they chose to mark variables with a leading $, @, or % and functions with a leading &. That last marking has largely fallen into desuetude, but the point is that - to the extent that you do have and use such markers - you can add new keywords without breaking anything. Some languages can also distinguish keywords positionally; for example, ABORT doesn't need to be reserved in PostgreSQL's SQL dialect because it can only appear as a command at the beginning of a line, and it can't be a column, type, or function name in that position. Such an approach might even work for ASSERT in PL/pgsql, if there's a clean way to disambiguate vs. the assignment syntax. But even if we can make that work, we're going to continue to face this problem with each new language extension. PL/pgSQL never had ADA's completeness, uniformity and beauty. But it is not too bad, and one new specialized statement doesn't kill us. The proposed functionality is often needed, and we have no tools (macros) to implement it simply. We support conditions for a few statements - so enhancing the RAISE statement is possible, to some form of RAISE EXCEPTION WHEN NOT FOUND AND use_asserts USING message = 'there are no rows'; but this form is relatively long and less readable (it can be difficult to find asserts in code and separate them from custom exceptions). I am fully for some variant of an ASSERT statement. The benefit is higher than the cost. The ASSERT keyword is simple, readable - and I can accept it, if we find a syntax for the complete functionality (although I prefer a PRAGMA introduction).
Regards, Pavel
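For contrast, the assertion idiom available today, with no new syntax at all, is an ordinary helper function that raises on failure. This is only a sketch; the function name, message, and usage are illustrative:

CREATE OR REPLACE FUNCTION assert(cond boolean, msg text DEFAULT 'assertion failed')
RETURNS void LANGUAGE plpgsql AS $$
BEGIN
    -- treat NULL conditions as failures too
    IF cond IS NOT TRUE THEN
        RAISE EXCEPTION 'ASSERT failed: %', msg;
    END IF;
END;
$$;

-- inside any PL/pgSQL block:
-- PERFORM assert(a > 0, 'a must be positive');

The complaints upthread apply in full: it is a function call rather than a statement, it cannot be compiled out in production, and the PERFORM noise is exactly the sort of thing a real ASSERT statement would remove.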
Re: [HACKERS] Assertions in PL/PgSQL
On 09/20/2013 01:59 PM, Robert Haas wrote: The other part of the problem is that the language isn't designed from the beginning to be extensible. In Perl, for example, they chose to mark variables with a leading $, @, or % and functions with a leading &. That last marking has largely fallen into desuetude, but the point is that - to the extent that you do have and use such markers - you can add new keywords without breaking anything. Some languages can also distinguish keywords positionally; for example, ABORT doesn't need to be reserved in PostgreSQL's SQL dialect because it can only appear as a command at the beginning of a line, and it can't be a column, type, or function name in that position. Such an approach might even work for ASSERT in PL/pgsql, if there's a clean way to disambiguate vs. the assignment syntax. But even if we can make that work, we're going to continue to face this problem with each new language extension. Perhaps we could use the pragma approach here and add some types of new functionality in comments: --#ASSERT ... or even --#pragma ASSERT ... It is still not guaranteed to be 100% compatible, but at least changing comments should be a relatively safe way of fixing your functions. And you could have another pragma to disable some pragmas, which you could SET in a GUC (global, session or per function) for extra ugliness ;) -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
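A function body decorated in the style Hannu sketches might look like this; the --#ASSERT spelling is purely hypothetical and the accounts table is illustrative. Today such lines are inert comments unless a preprocessor, or the GUC-controlled machinery he describes, gives them meaning:

CREATE FUNCTION transfer(src int, dst int, amt numeric)
RETURNS void LANGUAGE plpgsql AS $$
BEGIN
    --#ASSERT amt > 0
    UPDATE accounts SET balance = balance - amt WHERE id = src;
    UPDATE accounts SET balance = balance + amt WHERE id = dst;
    --#ASSERT NOT EXISTS (SELECT 1 FROM accounts WHERE balance < 0)
END;
$$;

The attraction is backward compatibility: servers that know nothing about the pragma simply ignore the comments.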
Re: [HACKERS] Patch for fail-back without fresh backup
On Fri, Sep 20, 2013 at 3:40 PM, Sameer Thakur samthaku...@gmail.com wrote: Attached patch combines the documentation patch and the source-code patch. I have had a stab at reviewing the documentation. Have a look. Thanks. The attached patch implements the suggestions on documentation, but comments from Fujii-san still need to be implemented. We will implement them soon. synchronous_transfer_v9.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Where to load modules from?
Robert Haas escribió: On Sun, Sep 15, 2013 at 10:51 AM, Peter Eisentraut pete...@gmx.net wrote: On Sun, 2013-09-15 at 16:09 +0200, Dimitri Fontaine wrote: Peter Eisentraut pete...@gmx.net writes: It shouldn't be in the commit fest if it has no patch. What should I do if my goal is to get community consensus on the best way to solve a problem, and want to start the discussion with some proposals? Post it to the pgsql-hackers list. The idea of using the CommitFest process to request design review was floated at one of the last couple of developer meetings in Ottawa. Personally, I'm for it. I did it for minmax indexes on CF1 and nobody complained. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Minor inheritance/check bug: Inconsistent behavior
On Fri, Sep 20, 2013 at 5:53 PM, Robert Haas robertmh...@gmail.com wrote: On Fri, Sep 20, 2013 at 3:41 AM, Amit Kapila amit.kapil...@gmail.com wrote: On Fri, Sep 20, 2013 at 12:57 PM, Amit Kapila amit.kapil...@gmail.com wrote: On Thu, Sep 19, 2013 at 10:28 PM, Robert Haas robertmh...@gmail.com wrote: Marking this as Ready for committer. Thanks a ton for reviewing the patch. I rewrote the comments for this patch and fixed the incorrect formatting in parse_relation.c. It used spaces instead of tabs, which is why if you look at the patch file you'll see that the alignment looked off. See new version, attached. Thanks, the attached version looks better. However, I have a few residual questions: 1. Does copy.c also need to be fixed? I see that it also calls ExecConstraints(), and I don't see it setting tableOid anywhere. We certainly want COPY and INSERT to behave the same way. I had missed it by confusing it with OID, as OID is set in COPY. I think along with COPY, it needs to be set in ATRewriteTable() as well, because we also check constraints during rewrite. I will send an updated patch after point 2 is concluded. 2. Should constraints also allow access to the oid column? Right now you can do that but it doesn't seem to work, e.g.: rhaas=# create table foo (a int, check (oid::integer % 2 = 0)) with oids; CREATE TABLE I have tried various combinations; it is giving an error at my end. postgres=# create table tbl_with_oid(c1 int) with oids; CREATE TABLE postgres=# alter table tbl_with_oid add check (oid > 16403); ERROR: system column oid reference in check constraint is invalid postgres=# alter table tbl_with_oid add check (oid = 0); ERROR: system column oid reference in check constraint is invalid postgres=# alter table tbl_with_oid add check (oid::integer % 2 = 0); ERROR: system column oid reference in check constraint is invalid postgres=# create table foo (a int, check (oid::integer % 2 = 0)) with oids; ERROR: system column oid reference in check constraint is invalid I am sorry, I just realized after pressing the send button that you meant the above point without considering the above patch. Just prohibiting that might be fine; it doesn't seem that useful anyway. Currently only tableOid is allowed; other system columns should error out due to the below check: + /* In constraint check, no system column is allowed except tableOid */ + if (pstate->p_expr_kind == EXPR_KIND_CHECK_CONSTRAINT && + attnum < InvalidAttrNumber && attnum != TableOidAttributeNumber) But if we're setting the table OID, maybe we should set the OID too, and then just allow this. It can be done for OID as well. I don't have any strong reason for either doing or not doing it; if you think it can be of use then we can add it. One more thing: I think it will be better to update the information in the docs, probably in the Create Table page where CHECK constraints are explained and where DDL constraints are explained at link: http://www.postgresql.org/docs/devel/static/ddl-constraints.html Yes, agreed. So, are you going to prepare a new version with documentation and addressing the other points mentioned? Handling for OID is not clear: shall we allow it or not in a check constraint? In my current patch OID will not be allowed in check constraints. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
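For the archives, the kind of constraint the patch makes dependable looks roughly like this; the semantics shown assume the patch applied, since the whole point of the fix is that tableOid was not reliably set during INSERT and COPY before:

create table parent (a int);
create table child () inherits (parent);
alter table parent add check (tableoid = 'parent'::regclass);
insert into parent values (1);  -- ok: tableoid is parent's own oid
insert into child values (1);   -- fails the inherited check

That gives a declarative way to keep rows out of the wrong table in an inheritance hierarchy, which is the inconsistent-behavior case the thread started from.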
Re: [HACKERS] record identical operator - Review
Steve, Thanks for providing a summary. * Steve Singer (st...@ssinger.info) wrote: The arguments for this patch are * We want the materialized view to return the same value as would be returned if the query were executed directly. This not only means that the values should be the same according to a datatype's = operator but that they should appear the same 'to the eyeball'. With the cases where the equality operator doesn't match the 'to the eyeball' appearance, this really seems to be a pipe dream to me, unless we *also* define an ordering or similar which would then change the existing semantics for end users (which might be reasonable), but I'm not sure how we could do that without input from the type; iow, I don't think we can say well, we'll order by byte representation. It's also going to possibly be a performance hit, since we're going to have to do both an equality op call during a HashAgg/HashJoin/whatever and a which one is bigger comparison of these two equal things, and then I wonder if we'd need to allow users to somehow specify I don't want the bigger of the equal things, I want the smaller in this query for this equality check. I'd really like to see how we're going to provide for that when the user is doing a GROUP BY, without breaking something else or causing problems with later SQL spec revisions. * Supporting the materialized view refresh check via SQL makes a lot of sense but doing that requires exposing something via SQL ... which we don't currently expose for the queries that we already support users running today. Users seem to generally be accepting of that too, even though what they end up with in the output isn't necessarily consistent from query to query. The issue here is that we're trying to make the mat view look like what the query would do when run at the same time, which is a bit of a losing battle, imv. * If we adequately document what we mean by record_image_identical and the operator we pick for this then users shouldn't be surprised at what they get if they use this That's a bit over-simplistic, really. We do try to keep to the POLA (principle of least astonishment) and that's not going to be easy here. I think there is agreement that better (as in more obscure) operators than === and !== need to be picked and we need to find a place in the user docs to warn users of the behaviour of these operators. Hannu has proposed I'm a bit on the fence about this, after having discussed my concerns with Robert at PostgresOpen. If we're going to expose and support these at the SQL level, and we can figure out some semantics and consistency for using them that isn't totally baffling, then perhaps having them as simple and clear operator names would be reasonable. One concern here is if the SQL spec might decide some day that '===' is a useful operator for something else (perhaps even a 'they look the same when cast to text' test). *==* binary equal, surely very equal by any other definition as well !==? maybe not equal -- binary unequal, may still be equal by other comparison methods Those look alright to me also, though we'd need to work out the other operator names, right? And then figure out if we want to use those other operators (less-than, greater-than, etc) when doing equality operations to figure out which equal value to return to the user. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] Minor inheritance/check bug: Inconsistent behavior
On Fri, Sep 20, 2013 at 9:43 AM, Amit Kapila amit.kapil...@gmail.com wrote: Handling for OID is not clear, shall we allow it or not in check constraint? In my current patch OID will not be allowed in check constraint. I vote for allowing it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] record identical operator
Stephen Frost sfr...@snowman.net wrote: * Kevin Grittner (kgri...@ymail.com) wrote: ... like just refreshing a view so that the new contents are the same as what you would see if you re-ran the query defining the matview. I've heard this notion a few times of wanting to make sure that what you get from running the query matches the matview. While that makes sense when the equality operator and what-you-see actually match, it doesn't when they don't because the what-you-see ends up being non-deterministic and can change based on the order the datums are seen in during the query processing which can change with different data ordering on disk and even due to simply different plans for the same data, I believe. That's a fair point to some extent. Where notions of equality differ, it is not always non-deterministic, but it can be. For citext you are correct. For a sum() of numeric data, the number of decimal positions will be the largest value seen; the value present in the query results will not vary by order of rows scanned or by plan. The result of this is that with the patch, an incremental refresh of a matview will always match what the query returned at the time it was run (there is no *correctness* problem) but if someone uses a query with non-deterministic results they may have a lot of activity on a concurrent refresh even if there was no change to the underlying data -- so you could have a *performance* penalty in cases where the query returns something different, compared to leaving the old equal but not the same results. Consider a GROUP BY with a citext column as one of the key fields. You're going to get whatever value the aggregate happened to come across first when building the HashAgg. Having the binary equality operator doesn't help that and seems like it could even result in change storms happening due to a different plan when the actual data didn't change. Yup. A person who wants to GROUP BY a citext value for a matview might want to fold it to a consistent capitalization to avoid that issue. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
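Both halves of that are easy to verify directly; assuming the citext extension is installed, and using numeric's scale rules:

select 'ABC'::citext = 'abc'::citext;
-- true, even though the stored bytes differ

select sum(x) from (values (1.00), (2.000)) v(x);
-- always 3.000: the scale follows the widest input, not the order rows are scanned in

The citext case is where a concurrent refresh could churn; the numeric case is deterministic, so a refresh rewrites the row only when the underlying data really changed.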
Re: [HACKERS] Minor inheritance/check bug: Inconsistent behavior
On Fri, Sep 20, 2013 at 7:17 PM, Robert Haas robertmh...@gmail.com wrote: On Fri, Sep 20, 2013 at 9:43 AM, Amit Kapila amit.kapil...@gmail.com wrote: Handling for OID is not clear, shall we allow it or not in check constraint? In my current patch OID will not be allowed in check constraint. I vote for allowing it. Okay, I shall send the updated patch by tomorrow. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] record identical operator - Review
Stephen Frost sfr...@snowman.net wrote: The issue here is that we're trying to make the mat view look like what the query would do when run at the same time, which is a bit of a losing battle, imv. If we're not doing that, I don't see the point of matviews. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] record identical operator - Review
On Friday, September 20, 2013, Kevin Grittner wrote: Stephen Frost sfr...@snowman.net wrote: The issue here is that we're trying to make the mat view look like what the query would do when run at the same time, which is a bit of a losing battle, imv. If we're not doing that, I don't see the point of matviews. When taken out of context, I can see how that might not come across correctly. I was merely pointing out that it's a losing battle in the context of types which have equality operators which claim equality but cast to text with results that don't match. Thanks, Stephen
Re: [HACKERS] SSI freezing bug
Andres Freund and...@2ndquadrant.com wrote: On 2013-09-20 13:55:36 +0300, Heikki Linnakangas wrote: When a tuple is predicate-locked, the key of the lock is ctid+xmin. However, when a tuple is frozen, its xmin is changed to FrozenXid. That effectively invalidates any predicate lock on the tuple, as checking for a lock on the same tuple later won't find it as the xmin is different. Attached is an isolationtester spec to demonstrate this. Do you have any idea to fix that besides keeping the xmin horizon below the lowest of the xids that are predicate locked? Which seems nasty to compute and is probably not trivial to fit into the procarray.c machinery? A better solution probably is to promote tuple-level locks, if they exist, to a relation-level one upon freezing, I guess? That would work. A couple of other ideas would be to use the oldest serializable xmin, which is being calculated in predicate.c, to either prevent freezing of newer transactions or to summarize predicate locks (using the existing SLRU-based summarization functions) which use xmin values for xids which are being frozen. The transactions must already be committed, and so are eligible for summarization. I'm not sure which is best. Will review, probably this weekend. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] psql tab completion for updatable foreign tables
On 20 September 2013 11:29, Samrat Revagade revagade.sam...@gmail.com wrote: Okay, are you adding this to the September commitfest? OK, I've done that. I think that it's too late for 9.3. +1 for the idea. I have tested the patch and got surprising results with CentOS. The patch is working fine on CentOS 6.2 and RHEL 6.3, but it is giving a problem on CentOS 6.3 (tab completes for local tables but not for foreign tables). I have used the following script: Two postgres servers are running, using ports 5432 and 5433. The server on port 5432 has postgres_fdw installed and will interact with the remote server running on port 5433. psql -p 5433 -c "CREATE TABLE aa_remote (a int, b int)" postgres postgres=# CREATE EXTENSION postgres_fdw; postgres=# CREATE SERVER postgres_server FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host 'localhost', port '5433', dbname 'postgres'); postgres=# CREATE USER MAPPING FOR PUBLIC SERVER postgres_server OPTIONS (password ''); postgres=# CREATE FOREIGN TABLE aa_foreign (a int, b int) SERVER postgres_server OPTIONS (table_name 'aa_remote'); postgres=# INSERT into aa_foreign values (1,2); But while doing any operation on aa_foreign, tab does not complete on CentOS 6.3 (it completes for local tables but not for foreign tables). Is that a problem? Hmm. It works for me. What does pg_relation_is_updatable() return for your foreign table? Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
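For anyone reproducing this, the completion machinery keys off pg_relation_is_updatable(), so the underlying check can be run by hand:

select pg_relation_is_updatable('aa_foreign'::regclass, false);

The result is the same event bitmask information_schema uses (8 = INSERT, 4 = UPDATE, 16 = DELETE), so a result of 0 would mean the FDW reports no update support, which would explain tab completion skipping the table.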
Re: [HACKERS] record identical operator - Review
Stephen Frost sfr...@snowman.net wrote: On Friday, September 20, 2013, Kevin Grittner wrote: Stephen Frost sfr...@snowman.net wrote: The issue here is that we're trying to make the mat view look like what the query would do when run at the same time, which is a bit of a losing battle, imv. If we're not doing that, I don't see the point of matviews. When taken out of context, I can see how that might not come across correctly. I was merely pointing out that it's a losing battle in the context of types which have equality operators which claim equalness but cast to text with results that don't match there. I think the patch I've submitted wins that battle. The only oddity is that if someone uses a query for a matview which can provide results with rows which are equal to previous result rows under the default record opclass but different under this patch's opclass, RMVC could update to the latest query results when someone thinks that might not be necessary. Workarounds would include using a query with deterministic results (like using the upper() or lower() function on a grouped citext column in the result set) or not using the CONCURRENTLY option. Neither seems too onerous. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
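Spelled out, the workaround is just to fold the grouped expression in the matview definition; table and column names here are illustrative, and the unique index reflects the submitted patch's requirement for REFRESH ... CONCURRENTLY:

CREATE MATERIALIZED VIEW mv AS
    SELECT lower(name::text) AS name, count(*) AS n
    FROM t
    GROUP BY lower(name::text);

CREATE UNIQUE INDEX mv_name_idx ON mv (name);
REFRESH MATERIALIZED VIEW CONCURRENTLY mv;

With the grouping key folded, repeated refreshes produce byte-identical rows, and the concurrent refresh finds nothing to rewrite when the underlying data hasn't changed.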
Re: [HACKERS] Freezing without write I/O
On 2013-09-20 08:54:26 -0400, Robert Haas wrote: On Fri, Sep 20, 2013 at 8:40 AM, Andres Freund and...@2ndquadrant.com wrote: On 2013-09-20 08:32:29 -0400, Robert Haas wrote: Personally, I think the biggest change that would help here is to mandate that spinlock operations serve as compiler fences. That would eliminate the need for scads of volatile references all over the place. They effectively already do, don't they? It's an external, non-inlinable function call (s_lock, not the actual TAS). And even if it were to get inlined via LTO optimization, the inline assembler we're using is doing the __asm__ __volatile__ (..., memory) dance. That's a full compiler barrier. The non-asm implementations call OS/compiler primitives that are also fences. In the case I brought up here there is no spinlock or anything similar. Well, that doesn't match my previous discussions with Tom Lane, or this comment: * Another caution for users of these macros is that it is the caller's * responsibility to ensure that the compiler doesn't re-order accesses * to shared memory to precede the actual lock acquisition, or follow the * lock release. Typically we handle this by using volatile-qualified * pointers to refer to both the spinlock itself and the shared data * structure being accessed within the spinlocked critical section. * That fixes it because compilers are not allowed to re-order accesses * to volatile objects relative to other such accesses. To me that doesn't really make much sense, to be honest. Note that the next paragraph tells us that * On platforms with weak memory ordering, the TAS(), TAS_SPIN(), and * S_UNLOCK() macros must further include hardware-level memory fence * instructions to prevent similar re-ordering at the hardware level. * TAS() and TAS_SPIN() must guarantee that loads and stores issued after * the macro are not executed until the lock has been obtained. Conversely, * S_UNLOCK() must guarantee that loads and stores issued before the macro * have been executed before the lock is released. So, TAS has to work as a memory barrier if the architecture doesn't have strong cache coherency guarantees itself. But memory barriers have to be compiler barriers? That's not to disagree with your point. If TAS is a compiler barrier, then we oughta be OK. Unless something can migrate into the spot between a failed TAS and the call to s_lock? We're talking compiler barriers, right? In that case failure or success do not play a role, or am I missing something? But I'm pretty sure that we've repeatedly had to change code to keep things from falling over in this area, see e.g. commits: 584f818bef68450d23d1b75afbaf19febe38fd37 (the last apparently a live bug) and d3fc362ec2ce1cf095950dddf24061915f836c22: I've quickly checked those out, and things looked mighty different back then. And incidentally several of the inline assembly implementations back then didn't specify that they clobber memory (which is what makes it a compiler barrier). fa72121594534dda6cc010f0bf6b3e8d22987526: Not sure here. Several of the inline assembly bits were changed to clobber memory, but not all. 07eeb9d109c057854b20707562ce517efa591761: Hm. Those mostly look to be overly cautious to me. I think we should go through the various implementations and make sure they are actual compiler barriers and then change the documented policy.
Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] record identical operator
* Kevin Grittner (kgri...@ymail.com) wrote: The result of this is that with the patch, an incremental refresh of a matview will always match what the query returned at the time it was run (there is no *correctness* problem) but if someone uses a query with non-deterministic results they may have a lot of activity on a concurrent refresh even if there was no change to the underlying data -- so you could have a *performance* penalty in cases where the query returns something different, compared to leaving the old equal but not the same results. You mean 'at the time of the incremental refresh', right? Yet that may or may not match running that query directly by an end-user as the plan that a user might get for the entire query could be different than what is run for an incremental update, or due to statistics updates, etc. Consider a GROUP BY with a citext column as one of the key fields. You're going to get whatever value the aggregate happened to come across first when building the HashAgg. Having the binary equality operator doesn't help that and seems like it could even result in change storms happening due to a different plan when the actual data didn't change. Yup. A person who wants to GROUP BY a citext value for a matview might want to fold it to a consistent capitalization to avoid that issue. I'm trying to figure out why that's a perfectly acceptable solution for users running views with GROUP BYs, but apparently it isn't sufficient for mat views? In other words, you would suggest telling users sorry, you can't rely on the value returned by a GROUP BY on that citext column using a normal view, but we're going to try and do better for mat views. I don't intend the above to imply that we should never update values in mat views when we can do so in a reliable way to ensure that the value matches what a view would return. This matches our notion of UPDATE, where we will still UPDATE a value even if the old value and the new value are equal according to the type's equality operator, when the conditional for the UPDATE is using a reliable type (eg: integer). Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] record identical operator
On 2013-09-20 10:51:46 -0400, Stephen Frost wrote: I'm trying to figure out why that's a perfectly acceptable solution for users running views with GROUP BYs, but apparently it isn't sufficient for mat views? Err, because users wrote a GROUP BY? They haven't (necessarily) in the cases of the matviews we're talking about? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] record identical operator
Stephen Frost sfr...@snowman.net wrote: * Kevin Grittner (kgri...@ymail.com) wrote: The result of this is that with the patch, an incremental refresh of a matview will always match what the query returned at the time it was run (there is no *correctness* problem) but if someone uses a query with non-deterministic results they may have a lot of activity on a concurrent refresh even if there was no change to the underlying data -- so you could have a *performance* penalty in cases where the query returns something different, compared to leaving the old equal but not the same results. You mean 'at the time of the incremental refresh', right? Yet that may or may not match running that query directly by an end-user as the plan that a user might get for the entire query could be different than what is run for an incremental update, or due to statistics updates, etc. I'm confused. The refresh *does* run the query. Sure, if the query is run again it could return different results. I'm missing the point here. Consider a GROUP BY with a citext column as one of the key fields. You're going to get whatever value the aggregate happened to come across first when building the HashAgg. Having the binary equality operator doesn't help that and seems like it could even result in change storms happening due to a different plan when the actual data didn't change. Yup. A person who wants to GROUP BY a citext value for a matview might want to fold it to a consistent capitalization to avoid that issue. I'm trying to figure out why that's a perfectly acceptable solution for users running views with GROUP BYs, but apparently it isn't sufficient for mat views? In other words, you would suggest telling users sorry, you can't rely on the value returned by a GROUP BY on that citext column using a normal view, but we're going to try and do better for mat views. Again, I'm lost. If they don't do something to make the result deterministic, it could be different on each run of the VIEW, and on each REFRESH of the matview. I don't see why that is an argument for trying to suppress the effort needed to make the matview match the latest run of the query. I don't intend the above to imply that we should never update values in mat views when we can do so in a reliable way to ensure that the value matches what a view would return. This matches our notion of UPDATE, where we will still UPDATE a value even if the old value and the new value are equal according to the type's equality operator, when the conditional for the UPDATE is using a reliable type (eg: integer). Well, we provide a trigger function to suppress the UPDATE operation if the old and new values are identical -- in terms of what is stored. We do not attempt to use the default btree equals operator to suppress updates to different values in some circumstances. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
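The trigger in question is the stock suppress_redundant_updates_trigger(); wiring it up takes one statement (the table name is illustrative), and the z-prefixed name follows the documented convention of making it fire after any other per-row triggers:

CREATE TRIGGER z_suppress_redundant
    BEFORE UPDATE ON some_table
    FOR EACH ROW
    EXECUTE PROCEDURE suppress_redundant_updates_trigger();

Like the operators being discussed, it compares the stored representation of the old and new tuples, not any type-specific notion of equality.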
Re: [HACKERS] record identical operator
* Andres Freund (and...@2ndquadrant.com) wrote: On 2013-09-20 10:51:46 -0400, Stephen Frost wrote: I'm trying to figure out why that's a perfectly acceptable solution for users running views with GROUP BYs, but apparently it isn't sufficient for mat views? Err, because users wrote a GROUP BY? They haven't (necessarily) in the cases of the matviews we're talking about? Sure; my thinking was going back to what Hannu had suggested where we have a mechanism to see if the value was updated (using xmin or similar) and then update it in the mat view in that case, without actually doing a comparison at all. That wouldn't necessarily work for the GROUP BY case, but that situation doesn't work reliably today anyway. If we solve the GROUP BY case in SQL for these types then we could use the same for mat views, but we've been happy enough to ignore the issue thus far. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] Freezing without write I/O
Andres Freund escribió: Hi, On 2013-09-19 14:42:19 +0300, Heikki Linnakangas wrote: On 18.09.2013 16:22, Andres Freund wrote: * Why can we do a GetOldestXmin(allDbs = false) in BeginXidLSNRangeSwitch()? Looks like a bug. I think I got the arguments backwards, was supposed to be allDbs = true and ignoreVacuum = false. I'm not sure if vacuums could be ignored, but 'false' is the safe option. Not sure either... The ignoreVacuum flag specifies to ignore backends running non-full VACUUM, that is, processes that are known never to generate new Xids, never obtain Xmin, and never to insert tuples anywhere. With all these restrictions, I think it's safe to use ignoreVacuum=true here. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] record identical operator
On 2013-09-20 11:05:06 -0400, Stephen Frost wrote: * Andres Freund (and...@2ndquadrant.com) wrote: On 2013-09-20 10:51:46 -0400, Stephen Frost wrote: I'm trying to figure out why that's a perfectly acceptable solution for users running views with GROUP BYs, but apparently it isn't sufficient for mat views? Err, because users wrote a GROUP BY? They haven't (neccessarily) in the cases of the matviews we're talking about? Sure; my thinking was going back to what Hannu had suggested where we have a mechanism to see if the value was updated (using xmin or similar) and then update it in the mat view in that case, without actually doing a comparison at all. VACUUM, HOT pruning. Have fun. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] record identical operator
* Kevin Grittner (kgri...@ymail.com) wrote: Stephen Frost sfr...@snowman.net wrote: You mean 'at the time of the incremental refresh', right? Yet that may or may not match running that query directly by an end-user as the plan that a user might get for the entire query could be different than what is run for an incremental update, or due to statistics updates, etc. I'm confused. The refresh *does* run the query. Sure, if the query is run again it could return different results. I'm missing the point here. Perhaps I'm assuming things are farther along than they are. I assumed that 'incremental mat view' updates were actually 'partial' - iow, that it keeps track of the rows in the mat view which are impacted by the set of rows in the base relations and then runs specific queries to pull out and operate on the base-rel set-of-rows to update the specific mat-view-rows. If it's running the whole query again then it's certainly more likely to get the same results that the user did, but it's not a guarantee, and that's only a happy coincidence while we're still doing the whole-query approach (which I hope we'd eventually progress beyond...). I'm trying to figure out why that's a perfectly acceptable solution for users running views with GROUP BYs, but apparently it isn't sufficient for mat views? In other words, you would suggest telling users sorry, you can't rely on the value returned by a GROUP BY on that citext column using a normal view, but we're going to try and do better for mat views. Again, I'm lost. Perhaps my reply to Andres will help clear that up? If they don't do something to make the result deterministic, it could be different on each run of the VIEW, and on each REFRESH of the matview. I don't see why that is an argument for trying to suppress the effort needed to make the matview match the latest run of the query. Is this really just run the new query and look if the old and new rows are different wrt 'incremental' view updates? If so, I think we're trying to design something here that we're going to totally break very shortly when we move beyond that kind of sledgehammer approach to incremental mat view updates, and I'd rather we figure out a solution which will work beyond that. I don't intend the above to imply that we should never update values in mat views when we can do so in a reliable way to ensure that the value matches what a view would return. This matches our notion of UPDATE, where we will still UPDATE a value even if the old value and the new value are equal according to the type's equality operator, when the conditional for the UPDATE is using a reliable type (eg: integer). Well, we provide a trigger function to suppress the UPDATE operation if the old and new values are identical -- in terms of what is stored. We do not attempt to use the default btree equals operator to suppress updates to different values in some circumstances. This feels like we're trying to figure out how to create a key for a whole row based on the binary representation of that row and then use that to check if we should update a given row or not. This seems a bit radical, but perhaps that's what we should just do then, rather than invent binary equivalence operators for what-things-look-like-now for this - extract the row columns using their binary _send functions and then hash the result to build a key that you can use.
At least then we're using a documented (well, we could probably improve on this, but still) and stable representation of the data that at least some of our users already deal with. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] [PERFORM] encouraging index-only scans
On Thu, Sep 19, 2013 at 6:59 PM, Andres Freund and...@2ndquadrant.com wrote: The reason I suggested keeping track of the xids of unremovable tuples is that the current logic doesn't handle that at all. We just unconditionally set n_dead_tuples to zero after a vacuum, even if not a single row could actually be cleaned out. Which has the effect that we will not start a vacuum until enough bloat (or, after changing this, new inserts) has collected to start vacuum anew. Which then will do twice the work. Resetting n_dead_tuples to the actual remaining dead tuples wouldn't do much good either - we would just immediately trigger a new vacuum the next time we check, even if the xmin horizon is still the same. One idea would be to store the xmin we used for the vacuum somewhere. Could we make that part of the pgstats infrastructure? Or store it in a new pg_class column? Then we could avoid re-triggering until it advances. Or, maybe better, we could remember the oldest XID that we weren't able to remove due to xmin considerations and re-trigger when the horizon passes that point. However, I do have one concern: it might lead to excessive index-vacuuming. Right now, we skip the index vac step only if ZERO dead tuples are found during the heap scan. Even one dead tuple (or line pointer) will cause an index vac cycle, which may easily be excessive. So I further propose that we introduce a threshold for index-vac, so that we only do an index vac cycle if the number of dead tuples exceeds, say, 0.1% of the table size. Yes, that's a pretty valid concern. But we can't really do it that easily. a) We can only remove dead line pointers when we know there's no index pointing to them anymore. Which we only know after the corresponding index entries have been removed. b) We cannot check the validity of an index pointer if there's no heap tuple for it. Sure, we could check whether we're pointing to a dead line pointer, but the random I/O costs of that are prohibitive. Now, we could just mark line pointers as dead and not mark that page as all-visible, and pick it up again on the next vacuum cycle. But that would suck long-term. I think the only real solution here is to store removed tuples' tids (i.e. items we've marked as dead) somewhere. Whenever we've found sufficient tuples to be removed from indexes, we do phase 2. I don't really agree with that. Yes, we could make that change, and yes, it might be better than what we're doing today, but it would be complex and have its own costs. And it doesn't mean that lesser steps are without merit. A vacuum pass over the heap buys us a LOT of space for reuse even without touching the indexes: we don't reclaim the line pointers, but we do reclaim the space for the tuples themselves, which is a big deal. So being able to do that more frequently without causing problems has a lot of value, I think. The fact that we get to set all-visible bits along the way makes future vacuums cheaper, and makes index scans work better, so that's good too. And the first vacuum to find a dead tuple will dirty the page to truncate it to a dead line pointer, while any subsequent revisits prior to the index vac cycle will only examine the page without dirtying it. All in all, just leaving the page to be caught by a future vacuum doesn't seem that bad to me, at least for a first cut.
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
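Until something along those lines exists, the knobs available today are the per-table autovacuum storage parameters; the 0.1%-of-table index-vac threshold discussed above would presumably become a similar, currently hypothetical, parameter. A per-table tightening looks like this (table name illustrative):

ALTER TABLE big_table SET (
    autovacuum_vacuum_threshold = 1000,
    autovacuum_vacuum_scale_factor = 0.01  -- trigger at ~1% dead tuples instead of the 20% default
);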
Re: [HACKERS] record identical operator
* Andres Freund (and...@2ndquadrant.com) wrote: On 2013-09-20 11:05:06 -0400, Stephen Frost wrote: Sure; my thinking was going back to what Hannu had suggested where we have a mechanism to see if the value was updated (using xmin or similar) and then update it in the mat view in that case, without actually doing a comparison at all. VACUUM, HOT pruning. Have fun. Yea, clearly oversimplified, but I do expect that we're going to reach a point where we're looking at the rows being updated in the base rels and which rows they map to in the view, and then marking those rows as needing to be updated. That whole mechanism doesn't depend on this are-they-binary-equal approach and is what I had anticipated as the path we'd be going down in the future. The above is also what I recall had been discussed at the hackers meeting, along with some ideas/papers about how to specifically implement partial updates, hence my assumption that was what we were talking about. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] Assertions in PL/PgSQL
Pavel Stehule escribió: PL/pgSQL never had ADA's completeness, uniformity and beauty. But it is not too bad, and one new specialized statement doesn't kill us. The proposed functionality is often needed, and we have no tools (macros) to implement it simply. We support conditions for a few statements - so enhancing the RAISE statement is possible. Extending RAISE is one option. Another option is to decorate BEGIN and END with an assertion option; the assertion would be checked when the block is entered (in BEGIN) or finished (in END). BEGIN ASSERT (a = 1) WITH (name = a_is_one) a := a + 1; END; BEGIN ASSERT (a > 0) a := a + 1; END ASSERT (a = 2) WITH (name = a_is_two); This would play nice with loops too, where the assertion is checked on every iteration. And you can have empty blocks if you want the assertion to be standalone in the middle of some block. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Freezing without write I/O
On 2013-09-20 16:47:24 +0200, Andres Freund wrote: I think we should go through the various implementations and make sure they are actual compiler barriers and then change the documented policy. From a quick look:
* S_UNLOCK for PPC isn't a compiler barrier
* S_UNLOCK for MIPS isn't a compiler barrier
* I don't know enough about Unixware (do we even still support that as a platform?) to judge
* Tru64 Alpha I have no clue about
* PA-RISC's tas() might not be a compiler barrier for !GCC
* PA-RISC's S_UNLOCK might not be a compiler barrier
* HP-UX !GCC might not be
* IRIX 5 seems to be a compiler barrier
* SINIX - I don't care
* AIX PPC - compiler barrier
* Sun - TAS is implemented in external assembly, normal function call, compiler barrier
* Win(32|64) - compiler barrier
* Generic S_UNLOCK is *NOT* necessarily a compiler barrier
Ok, so I might have been a bit too optimistic... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Assertions in PL/PgSQL
2013/9/20 Alvaro Herrera alvhe...@2ndquadrant.com wrote: Pavel Stehule escribió: PL/pgSQL never had ADA's completeness, uniformity and beauty. But it is not too bad, and one new specialized statement doesn't kill us. The proposed functionality is often needed, and we have no tools (macros) to implement it simply. We support conditions for a few statements - so enhancing the RAISE statement is possible. Extending RAISE is one option. Another option is to decorate BEGIN and END with an assertion option; the assertion would be checked when the block is entered (in BEGIN) or finished (in END). BEGIN ASSERT (a = 1) WITH (name = a_is_one) a := a + 1; END; BEGIN ASSERT (a > 0) a := a + 1; END ASSERT (a = 2) WITH (name = a_is_two); This would play nice with loops too, where the assertion is checked on every iteration. And you can have empty blocks if you want the assertion to be standalone in the middle of some block. It can work, but it looks too strange. -1 Pavel -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] [PERFORM] encouraging index-only scans
On 2013-09-20 11:30:26 -0400, Robert Haas wrote: On Thu, Sep 19, 2013 at 6:59 PM, Andres Freund and...@2ndquadrant.com wrote: The reason I suggested keeping track of the xids of unremovable tuples is that the current logic doesn't handle that at all. We just unconditionally set n_dead_tuples to zero after a vacuum even if not a single row could actually be cleaned out. Which has the effect that we will not start a vacuum until enough bloat (or after changing this, new inserts) has collected to start vacuum anew. Which then will do twice the work. Resetting n_dead_tuples to the actual remaining dead tuples wouldn't do much good either - we would just immediately trigger a new vacuum the next time we check, even if the xmin horizon is still the same. One idea would be to store the xmin we used for the vacuum somewhere. Could we make that part of the pgstats infrastructure? Or store it in a new pg_class column? Then we could avoid re-triggering until it advances. Or, maybe better, we could remember the oldest XID that we weren't able to remove due to xmin considerations and re-trigger when the horizon passes that point. I suggested a slightly more complex variant of this upthread: http://archives.postgresql.org/message-id/20130907053449.GE626072%40alap2.anarazel.de Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PERFORM] encouraging index-only scans
On Fri, Sep 20, 2013 at 11:51 AM, Andres Freund and...@2ndquadrant.com wrote: On 2013-09-20 11:30:26 -0400, Robert Haas wrote: On Thu, Sep 19, 2013 at 6:59 PM, Andres Freund and...@2ndquadrant.com wrote: The reason I suggested keeping track of the xids of unremovable tuples is that the current logic doesn't handle that at all. We just unconditionally set n_dead_tuples to zero after a vacuum even if not a single row could actually be cleaned out. Which has the effect that we will not start a vacuum until enough bloat (or after changing this, new inserts) has collected to start vacuum anew. Which then will do twice the work. Resetting n_dead_tuples to the actual remaining dead tuples wouldn't do much good either - we would just immediately trigger a new vacuum the next time we check, even if the xmin horizon is still the same. One idea would be to store the xmin we used for the vacuum somewhere. Could we make that part of the pgstats infrastructure? Or store it in a new pg_class column? Then we could avoid re-triggering until it advances. Or, maybe better, we could remember the oldest XID that we weren't able to remove due to xmin considerations and re-trigger when the horizon passes that point. I suggested a slightly more complex variant of this upthread: http://archives.postgresql.org/message-id/20130907053449.GE626072%40alap2.anarazel.de Ah, yeah. Sorry, I forgot about that. Personally, I'd try the simpler version first. But I think whoever takes the time to implement this will probably get to pick. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Range types do not display in pg_stats
On 09/19/2013 02:55 PM, Mike Blackwell wrote: Interesting. Is this a 9.3 issue? I ran the above against my 9.2.4 server and got no rows in pg_stats. Did I miss something? Yeah, that was on 9.3. I think the issue on 9.2 is the same; it just manifests differently. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Range types do not display in pg_stats
Robert, It probably has to do with the CASE stakind stuff in the definition of the pg_stats view. Try \d+ pg_stats to see what I mean. Ok, if this is not a known bug, I'll see if I can work up a fix. No promises, given the hairiness ... -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Looking for information on our elephant
On 09/19/2013 04:16 PM, Tatsuo Ishii wrote: Hi, I'm looking for information on our elephant: especially how/why we chose the elephant as our mascot. What I've been told: Our first logo was one Jan designed, which was an elephant in a diamond. While the idea was nice, it looked terrible, so when Marc and Greg launched PostgreSQL Inc. they hired an actual designer, who created the blue elephant we have now. This design was then contributed to the community. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Freezing without write I/O
Hi, IMO it's a bug if S_UNLOCK is not a compiler barrier. Moreover, for volatile, remember: https://www.securecoding.cert.org/confluence/display/seccode/DCL17-C.+Beware+of+miscompiled+volatile-qualified+variables Who is double-checking compiler output? :) regards Didier

On Fri, Sep 20, 2013 at 5:11 PM, Andres Freund and...@2ndquadrant.com wrote: On 2013-09-20 16:47:24 +0200, Andres Freund wrote: I think we should go through the various implementations and make sure they are actual compiler barriers and then change the documented policy. From a quick look:
* S_UNLOCK for PPC isn't a compiler barrier
* S_UNLOCK for MIPS isn't a compiler barrier
* I don't know enough about unixware (do we still support that as a platform even) to judge
* Tru64 Alpha I have no clue about
* PA-RISC's tas() might not be a compiler barrier for !GCC
* PA-RISC's S_UNLOCK might not be a compiler barrier
* HP-UX !GCC might not be
* IRIX 5 seems to be a compiler barrier
* SINIX - I don't care
* AIX PPC - compiler barrier
* Sun - TAS is implemented in external assembly, normal function call, compiler barrier
* Win(32|64) - compiler barrier
* Generic S_UNLOCK *NOT* necessarily a compiler barrier.
Ok, so I might have been a bit too optimistic... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] record identical operator - Review
On 09/20/2013 08:38 AM, Kevin Grittner wrote: Did you look at the record_eq and record_cmp functions which exist before this patch? Is there a reason we should do it one way for the default operators and another way for this alternative? Do you think we should change record_eq and record_cmp to do things the way you suggest?

I think the record_eq and record_cmp functions would be better if they shared code as well, but I don't think changing that should be part of this patch; you aren't otherwise touching those functions. I know some people dislike code that is switch based and prefer duplication; my preference is to avoid duplication.

This seems like a bad idea to me. I don't like that I get a type comparison error only sometimes, based on the values of the data in the column. If I'm a user testing code that compares two records with this operator, and I test my query with one value pair and it works, then I'd naively expect to get a true or false on all other value sets of the same record type.

Again, this is a result of following the precedent set by the existing record comparison operators.

test=# create table r1 (c1 int, c2 int);
CREATE TABLE
test=# create table r2 (c1 int, c2 int, c3 int);
CREATE TABLE
test=# insert into r1 values (1, 2);
INSERT 0 1
test=# insert into r2 values (3, 4, 5);
INSERT 0 1
test=# select * from r1 join r2 on r1 < r2;
 c1 | c2 | c1 | c2 | c3
----+----+----+----+----
  1 |  2 |  3 |  4 |  5
(1 row)

test=# update r1 set c1 = 3, c2 = 4;
UPDATE 1
test=# select * from r1 join r2 on r1 < r2;
ERROR:  cannot compare record types with different numbers of columns

Would you be in favor of forcing a change to the record comparison operators which have been in use for years? If not, is there a good reason to have them behave differently in this regard?

I hadn't picked up on the fact that you copied that part of the behaviour from the existing comparison operators. No, we would need a really good reason for changing the behaviour of the comparison operators, and I don't think we have that. I agree that the binary identical operators should behave similarly to the existing comparison operators and error out on the column number mismatch after comparing the columns that are present in both.

Steve Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.
On Thu, Sep 19, 2013 at 7:58 PM, Tatsuo Ishii is...@postgresql.org wrote: What about limiting to use NCHAR with a database which has same encoding or compatible encoding (on which the encoding conversion is defined)? This way, NCHAR text can be automatically converted from NCHAR to the database encoding in the server side thus we can treat NCHAR exactly same as CHAR afterward. I suppose what encoding is used for NCHAR should be defined in initdb time or creation of the database (if we allow this, we need to add a new column to know what encoding is used for NCHAR). For example, CREATE TABLE t1(t NCHAR(10)) will succeed if NCHAR is UTF-8 and database encoding is UTF-8. Even succeed if NCHAR is SHIFT-JIS and database encoding is UTF-8 because there is a conversion between UTF-8 and SHIFT-JIS. However will not succeed if NCHAR is SHIFT-JIS and database encoding is ISO-8859-1 because there's no conversion between them. I think the point here is that, at least as I understand it, encoding conversion and sanitization happens at a very early stage right now, when we first receive the input from the client. If the user sends a string of bytes as part of a query or bind placeholder that's not valid in the database encoding, it's going to error out before any type-specific code has an opportunity to get control. Look at textin(), for example. There's no encoding check there. That means it's already been done at that point. To make this work, someone's going to have to figure out what to do about *that*. Until we have a sketch of what the design for that looks like, I don't see how we can credibly entertain more specific proposals. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
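For reference, textin() (approximately as it appears in src/backend/utils/adt/varlena.c) really is this thin; any encoding verification has already happened by the time it runs:

/*
 * textin - converts "..." to internal representation.
 * Note: no encoding check here; the bytes were validated (and, if
 * necessary, converted) when the client message was first received.
 */
Datum
textin(PG_FUNCTION_ARGS)
{
	char	   *inputText = PG_GETARG_CSTRING(0);

	PG_RETURN_TEXT_P(cstring_to_text(inputText));
}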
Re: [HACKERS] record identical operator
Dimitri Fontaine escribió: My understanding is that if you choose citext then you don't care at all about the case, so that the two relations actually yield the same results for the right definition of "same" here: the citext one. For the record, I don't think citext means that the user doesn't care about the case; it only means they want the comparisons to be case-insensitive, but case should nonetheless be preserved. That is, case-insensitive, case-preserving. A parallel is MS-DOS file name semantics. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.
On Thu, Sep 19, 2013 at 6:42 PM, MauMau maumau...@gmail.com wrote: National character types support may be important to some potential users of PostgreSQL and the popularity of PostgreSQL, not me. That's why national character support is listed in the PostgreSQL TODO wiki. We might be losing potential users just because their selection criteria include national character support.

We'd have to go back and search the archives to figure out why that item was added to the TODO, but I'd be surprised if anyone ever had it in mind to create additional types that behave just like existing types but with different names. I don't think that you'll be able to get consensus around that path on this mailing list.

I am not keen to introduce support for nchar and nvarchar as differently-named types with identical semantics. Similar examples already exist:
- varchar and text: the only difference is the existence of an explicit length limit
- numeric and decimal
- int and int4, smallint and int2, bigint and int8
- real/double precision and float

I agree that the fact we have both varchar and text feels like a wart. The other examples mostly involve different names for the same underlying type, and so are different from what you are asking for here.

I understand your feeling. The concern about incompatibility can be eliminated by thinking the following way. How about this?
- NCHAR can be used with any database encoding.
- At first, NCHAR is exactly the same as CHAR. That is, the implementation-defined character set described in the SQL standard is the database character set.
- In the future, the character set for NCHAR can be selected at database creation, like Oracle's CREATE DATABASE NATIONAL CHARACTER SET AL16UTF16. The default is the database character set.

Hmm. So under that design, a database could support up to a total of two character sets, the one that you get when you say 'foo' and the other one that you get when you say n'foo'. I guess we could do that, but it seems a bit limited. If we're going to go to the trouble of supporting multiple character sets, why not support an arbitrary number instead of just two? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Could ANALYZE estimate bloat?
On 09/20/2013 11:59 AM, Josh Berkus wrote: Hackers, I've been tinkering with a number of table bloat checks, and it occurred to me that the problem is that these are all approximations based on overall gross statistics, and as such highly inaccurate. It seems like we could have ANALYZE, while sampling from the table, also check the amount of dead space per page and use that as an estimate of the % of dead space overall. We'd still need something else for indexes, but this seems like it would be a good start. No? I think this is a great idea. -- Command Prompt, Inc. - http://www.commandprompt.com/ 509-416-6579 PostgreSQL Support, Training, Professional Services and Development High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc For my dreams of your image that blossoms a rose in the deeps of my heart. - W.B. Yeats -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] FW: REVIEW: Allow formatting in log_line_prefix
On Fri, Sep 20, 2013 at 3:15 AM, Albe Laurenz laurenz.a...@wien.gv.at wrote: David Rowley wrote: I moved the source around and I've patched against it again. New patch attached. Thank you, marked as ready for committer.

+/*
+ * helper function for processing the format string in
+ * log_line_prefix()
+ * This function returns NULL if it finds something which
+ * it deems invalid for the log_line_prefix string.
+ */

Comments should be formatted as a single paragraph. If you want multiple paragraphs, you need to include a line that's blank except for the leading *.

+static const char
+*process_log_prefix_padding(const char *p, int *ppadding)

The asterisk should be on the previous line, separated from "char" by a space.

+ appendStringInfo(buf, "%*ld", padding, log_line_number);

Is %* supported by all versions of printf() on every platform we support? Earlier there was some discussion of performance. Was that tested? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
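For reference, %* in a printf format takes the field width from the argument list; this is standard C behavior, though whether every supported platform's libc implements it is exactly the question raised above. A small standalone illustration:

#include <stdio.h>

int
main(void)
{
	long		line = 42;

	printf("[%*ld]\n", 6, line);	/* width 6, right-justified: "[    42]" */
	printf("[%*ld]\n", -6, line);	/* negative width left-justifies: "[42    ]" */
	return 0;
}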
[HACKERS] Could ANALYZE estimate bloat?
Hackers, I've been tinkering with a number of table bloat checks, and it occurred to me that the problem is that these are all approximations based on overall gross statistics, and as such highly inaccurate. It seems like we could have ANALYZE, while sampling from the table, also check the amount of dead space per page and use that as an estimate of the % of dead space overall. We'd still need something else for indexes, but this seems like it would be a good start. No? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
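One possible shape for this (a rough sketch with invented names, not from any patch): while ANALYZE's acquire_sample_rows() already has each sampled page pinned, it could total up the free space on those pages and extrapolate. Free space is only a proxy for dead space - a fuller version would also count dead line pointers and dead tuples per page:

static double
sampled_free_space_fraction(Relation rel, BlockNumber *sampled, int nsampled)
{
	Size		freespace = 0;
	int			i;

	for (i = 0; i < nsampled; i++)
	{
		Buffer		buf = ReadBuffer(rel, sampled[i]);

		LockBuffer(buf, BUFFER_LOCK_SHARE);
		freespace += PageGetHeapFreeSpace(BufferGetPage(buf));
		UnlockReleaseBuffer(buf);
	}

	/* fraction of the sampled pages' space that is currently unused */
	return (double) freespace / ((double) nsampled * BLCKSZ);
}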
Re: [HACKERS] [RFC] Extend namespace of valid guc names
On Thu, Sep 19, 2013 at 6:19 PM, Andres Freund and...@2ndquadrant.com wrote: On 2013-09-19 14:57:53 -0400, Robert Haas wrote: On Tue, Sep 17, 2013 at 6:56 PM, Andres Freund and...@2ndquadrant.com wrote: I think that ship has long since sailed. postgresql.conf has allowed foo.bar style GUCs via custom_variable_classes for a long time, and these days we don't even require that but allow them generally. Also, SET, postgres -c, and SELECT set_config() already don't have the restriction to one dot in the variable name. Well, we should make it consistent, but I'm not sure that settles the question of which direction to go. I suggested making it consistent in either form elsewhere in this thread, but Tom wasn't convinced. I think because of backward compatibility concerns we can hardly restrict the valid namespace in all forms to the most restrictive form (i.e. guc-file.l). Quite a few people use GUCs as variables. Maybe we can forbid the more absurd variants (SET ... = 3; SELECT set_config('...', '1', true)), but restricting it entirely seems to cause pain for no reason. Yeah, I don't think I understand Tom's reasoning. I mean, why shouldn't there be ONE rule for what can be a GUC? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
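To illustrate the inconsistency being discussed (behavior as described above; the exact rules may vary by version):

-- Accepted by SET and set_config(), despite the extra qualifier dots:
SET my.nested.setting = 'on';
SELECT set_config('my.nested.setting', 'off', false);
SELECT current_setting('my.nested.setting');

-- ...whereas the equivalent line in postgresql.conf,
--     my.nested.setting = on
-- is rejected by the stricter guc-file.l parser.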
Re: [HACKERS] [PATCH] pg_sleep(interval)
On Fri, Sep 20, 2013 at 7:59 AM, Fabien COELHO coe...@cri.ensmp.fr wrote:

- the new function is *not* tested anywhere! I would suggest simply to replace some pg_sleep(int) instances by corresponding pg_sleep(interval) instances in the non-regression tests.

- some concerns have been raised that it breaks pg_sleep(TEXT), which currently works thanks to the implicit TEXT -> INT cast. I would suggest to add pg_sleep(TEXT) explicitly, like:

CREATE FUNCTION pg_sleep(TEXT) RETURNS VOID
VOLATILE STRICT AS $$ select pg_sleep($1::INTEGER) $$ LANGUAGE SQL;

That would be another one-liner, to update the documentation and to add some tests as well! ISTM that providing pg_sleep(TEXT) cleanly resolves the upward-compatibility issue raised.

I think that's ugly and I'm not one bit convinced it will resolve all the upgrade-compatibility issues. Realistically, all sleeps are going to be reasonably well measured in seconds anyway. If you want to sleep for some other interval, convert that interval to a number of seconds first. Another problem is that, as written, this is vulnerable to search_path hijacking attacks. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] pg_sleep(interval)
Hello Robert,

- some concerns have been raised that it breaks pg_sleep(TEXT), which currently works thanks to the implicit TEXT -> INT cast. I would suggest to add pg_sleep(TEXT) explicitly, like:

CREATE FUNCTION pg_sleep(TEXT) RETURNS VOID
VOLATILE STRICT AS $$ select pg_sleep($1::INTEGER) $$ LANGUAGE SQL;

That would be another one-liner, to update the documentation and to add some tests as well! ISTM that providing pg_sleep(TEXT) cleanly resolves the upward-compatibility issue raised.

I think that's ugly and I'm not one bit convinced it will resolve all the upgrade-compatibility issues. Realistically, all sleeps are going to be reasonably well measured in seconds anyway.

I do not know that. From a usual database point of view, it does not make much sense to slow down a database anyway, and this function is never needed... so it really depends on the use case. If someone wants to simulate a long-standing transaction to check its effect on some features such as dumping data or replication or whatever, maybe pg_sleep(INTERVAL '5 hours') is nicer than pg_sleep(18000), if you are not too good at dividing by 60, 3600 or 86400...

If you want to sleep for some other interval, convert that interval to a number of seconds first.

You could say that for any use of INTERVAL. ISTM that the point of the interval type is just to be more readable than a number of seconds to express a delay.

Another problem is that, as written, this is vulnerable to search_path hijacking attacks.

Yes, sure. I was not suggesting to create the function directly as above; this is just the test I made to check whether it worked as I thought, i.e. that providing a TEXT version works and interacts properly with the INTEGER and INTERVAL versions. My guess is that the function definition would be inserted directly in pg_proc as other pg_* functions at initdb time.

-- Fabien. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
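For what it's worth, an interval variant that sidesteps the search_path problem by schema-qualifying the call - a sketch of the idea only, not the patch's actual definition, which would presumably be a built-in pg_proc entry:

CREATE FUNCTION pg_sleep(INTERVAL) RETURNS VOID
VOLATILE STRICT LANGUAGE SQL
AS $$ SELECT pg_catalog.pg_sleep(extract(epoch FROM $1)) $$;

-- usage:
SELECT pg_sleep(INTERVAL '5 hours');   -- rather than pg_sleep(18000)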
[HACKERS] File_fdw documentation patch to clarify OPTIONS clause
The file_fdw documentation for the OPTIONS clause references the COPY command's documentation. In the case of COPY, the value for some options (e.g. HEADER, OIDS, ...) is optional. While the file_fdw documentation makes no mention of it, the values for all options are required. Attached is a patch to add a note to the docs mentioning this fact. __ Mike Blackwell | Technical Analyst, Distribution Services/Rollout Management | RR Donnelley 1750 Wallace Ave | St Charles, IL 60174-3401 Office: 630.313.7818 mike.blackw...@rrd.com http://www.rrdonnelley.com/ file-fdw-doc.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
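For example (hypothetical table and file names), note that header must be written with an explicit value, unlike in COPY where a bare HEADER suffices:

CREATE EXTENSION file_fdw;
CREATE SERVER file_server FOREIGN DATA WRAPPER file_fdw;

CREATE FOREIGN TABLE staff (
    id    integer,
    name  text
) SERVER file_server
OPTIONS (filename '/tmp/staff.csv', format 'csv', header 'true');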
Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.
On 9/20/13 2:22 PM, Robert Haas wrote: I am not keen to introduce support for nchar and nvarchar as differently-named types with identical semantics. Similar examples already exist:
- varchar and text: the only difference is the existence of an explicit length limit
- numeric and decimal
- int and int4, smallint and int2, bigint and int8
- real/double precision and float
I agree that the fact we have both varchar and text feels like a wart. The other examples mostly involve different names for the same underlying type, and so are different from what you are asking for here. Also note that we already have NCHAR [VARYING]. It's mapped to char or varchar, respectively, in the parser, just like int, real, etc. are handled. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SSL renegotiation
Here's the patch I propose to handle renegotiation cleanly. I noticed in testing that SSL_renegotiate_pending() doesn't seem to actually work --- if I throw an ereport(FATAL) at the point where I expect the renegotiation to be complete, it always dies, even if I give it megabytes of extra traffic while waiting for the renegotiation to complete. I suspect this is an OpenSSL bug. Instead, in this patch I check the internal renegotiation counter: grab its current value when starting the renegotiation, and consider it complete when the counter has advanced. This works fine.

Another thing not covered by the original code snippet I proposed upthread is to avoid renegotiating when there is already a renegotiation in progress. This bug has been observed in the field.

Per discussion, I made it close the connection with a FATAL error if the limit is reached and the renegotiation hasn't taken place. To do otherwise is not acceptable from a security PoV.

Sean Chittenden mentioned that when retrying the handshake, we should be careful to do it only a few times, not forever, to prevent a malfeasant from grabbing hold of a connection indefinitely. I've added that too, hardcoding the number of retries to 20.

Also, I made this code request a renegotiation slightly before the limit is actually reached, because I noticed that in some cases some traffic can go by before the renegotiation is actually completed. The difference should be pretty minimal.

-- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services

*** a/src/backend/libpq/be-secure.c
--- b/src/backend/libpq/be-secure.c
***************
*** 101,106 ****
--- 101,109 ----
  char	   *ssl_crl_file;
   */
  int		ssl_renegotiation_limit;
+ 
+ /* are we in the middle of a renegotiation? */
+ static bool in_ssl_renegotiation = false;
+ 
  #ifdef USE_SSL
  static SSL_CTX *SSL_context = NULL;
  static bool ssl_loaded_verify_locations = false;
***************
*** 321,350 ****
  secure_write(Port *port, void *ptr, size_t len)
  
  	if (port->ssl)
  	{
  		int		err;
  
! 		if (ssl_renegotiation_limit && port->count > ssl_renegotiation_limit * 1024L)
  		{
  			SSL_set_session_id_context(port->ssl, (void *) &SSL_context,
  									   sizeof(SSL_context));
  			if (SSL_renegotiate(port->ssl) <= 0)
  				ereport(COMMERROR,
  						(errcode(ERRCODE_PROTOCOL_VIOLATION),
! 						 errmsg("SSL renegotiation failure")));
! 			if (SSL_do_handshake(port->ssl) <= 0)
! 				ereport(COMMERROR,
! 						(errcode(ERRCODE_PROTOCOL_VIOLATION),
! 						 errmsg("SSL renegotiation failure")));
! 			if (port->ssl->state != SSL_ST_OK)
! 				ereport(COMMERROR,
! 						(errcode(ERRCODE_PROTOCOL_VIOLATION),
! 						 errmsg("SSL failed to send renegotiation request")));
! 			port->ssl->state |= SSL_ST_ACCEPT;
! 			SSL_do_handshake(port->ssl);
! 			if (port->ssl->state != SSL_ST_OK)
! 				ereport(COMMERROR,
! 						(errcode(ERRCODE_PROTOCOL_VIOLATION),
! 						 errmsg("SSL renegotiation failure")));
! 			port->count = 0;
  		}
  
  	wloop:
--- 324,384 ----
  	if (port->ssl)
  	{
  		int		err;
+ 		static long renegotiation_count = 0;
  
! 		/*
! 		 * If SSL renegotiations are enabled and we're getting close to the
! 		 * limit, start one now; but avoid it if there's one already in
! 		 * progress.  Request the renegotiation before the limit has actually
! 		 * expired; we give it 1% of the specified limit, or 1kB, whichever is
! 		 * greater.
! 		 */
! 		if (ssl_renegotiation_limit && !in_ssl_renegotiation &&
! 			port->count > Min((ssl_renegotiation_limit - 1) * 1024L,
! 							  (long) rint(ssl_renegotiation_limit * 1024L * 0.99)))
  		{
+ 			in_ssl_renegotiation = true;
+ 
  			SSL_set_session_id_context(port->ssl, (void *) &SSL_context,
  									   sizeof(SSL_context));
  			if (SSL_renegotiate(port->ssl) <= 0)
  				ereport(COMMERROR,
  						(errcode(ERRCODE_PROTOCOL_VIOLATION),
! 						 errmsg("SSL failure during renegotiation start")));
! 			else
! 			{
! 				int		handshake;
! 				int		retries = 20;
! 
! 				/*
! 				 * The way we determine that a renegotiation has completed is
! 				 * by observing OpenSSL's internal renegotiation counter.
! 				 * Memoize it when starting the renegotiation, and assume that
! 				 * the renegotiation is complete when the counter advances.
! 				 *
! 				 * OpenSSL provides SSL_renegotiation_pending(), but this
! 				 * doesn't seem to work in testing.
! 				 */
! 				renegotiation_count = SSL_num_renegotiations(port->ssl);
! 
! 				/*
! 				 * A handshake can fail, so be prepared to retry it, but only
! 				 * a few times.
! 				 */
! 				for (;;)
! 				{
! 					handshake = SSL_do_handshake(port->ssl);
! 					if (handshake > 0)
! 						break;	/* done */
! 					ereport(COMMERROR,
! 							(errcode(ERRCODE_PROTOCOL_VIOLATION),
! 							 errmsg("SSL handshake failure on renegotiation, retrying")));
! 					if (retries-- <= 0)
! 						ereport(FATAL,
! 								(errcode(ERRCODE_PROTOCOL_VIOLATION),
! 								 errmsg("unable to complete SSL handshake")));
! 				}
! 			}
Re: [HACKERS] gaussian distribution pgbench
KONDO Mitsumasa kondo.mitsum...@lab.ntt.co.jp wrote: I created a gaussian distribution pgbench patch that can access records with gaussian frequency, and I submitted it to this commitfest. Thanks! I have moved this to the Open CommitFest, though. https://commitfest.postgresql.org/action/commitfest_view/open You had accidentally added it to the CF In Progress. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Could ANALYZE estimate bloat?
Josh, * Josh Berkus (j...@agliodbs.com) wrote: I've been tinkering with a number of table bloat checks, and it occurred to me that the problem is that these are all approximations based on overall gross statistics, and as such highly inaccurate. Greg Smith and I discussed some improvements around this area @Open. I'd suggest you talk with him about the ideas that he has. It seems like we could have ANALYZE, while sampling from the table, also check the amount of dead space per page and use that as an estimate of the % of dead space overall. We'd still need something else for indexes, but this seems like it would be a good start. No? Err, I thought that was one of the things that ANALYZE *did* collect, through the 'live tuples' number? Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] logical changeset generation v6
On Thu, Sep 19, 2013 at 7:43 AM, Andres Freund and...@2ndquadrant.com wrote: More generally, the thing that bugs me about this approach is that logical replication is not really special, but what you've done here MAKES it special. There are plenty of other situations where we are too aggressive about holding back xmin. A single-statement transaction that selects from a single table holds back xmin for the entire cluster, and that is Very Bad. It would probably be unfair to say, well, you have to solve that problem first. But on the other hand, how do we know that the approach you've adopted here isn't going to make the more general problem harder to solve? It seems invasive at a pretty low level. I agree that it's invasive, but I am doubtful that pegging the xmin in a more granular fashion precludes this kind of optimization. We might have to generalize what Andres has done, which could mean eventually throwing it out and starting from scratch, but I have a hard time seeing how that implies an appreciable cost above solving the general problem first (now that Andres has already implemented the RecentGlobalDataXmin thing). As I'm sure you appreciate, the cost of doing the opposite - of solving the general problem first - may be huge: waiting another release for logical changeset generation. The reason why I think it's actually different is that the user actually has control over how long transactions are running on the primary. They don't really control how fast a replication consumer consumes and how often it sends feedback messages. Right. That's about what I said last year. I find the following analogy useful: A logical changeset generation implementation without RecentGlobalDataXmin is kind of like an old-fashioned nuclear reactor, like the one they had at Chernobyl. Engineers have to actively work in order to prevent it from overheating. However, an implementation with RecentGlobalDataXmin is like a modern, much safer nuclear reactor. Engineers have to actively work to keep the reactor heated. Which is to say, with RecentGlobalDataXmin a standby that dies cannot bloat the master too much (almost as with hot_standby_feedback - that too requires active participation from the standby to do harm to the master). Without RecentGlobalDataXmin, the core system and the plugin at the very least need to worry about that case when a standby dies. I have a little bit of feedback that I forgot to mention in my earlier reviews, because I thought it was too trivial then: something about the name pg_receivellog annoys me in a way that the name pg_receivexlog does not. Specifically, it looks like someone meant to type pg_receivelog but fat-fingered it. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] gaussian distribution pgbench
Hello Mitsumasa,

In the general transaction situation, clients accessing all records equally hardly ever happens. I think gaussian distribution access patterns are most of transaction patterns in general. My patch realizes nearly this access pattern.

That is great! I was just looking for something like that!

I have not looked at the patch yet, but from the plots you sent, it seems that it is a gaussian distribution over the keys. However, this pattern induces stronger cache effects which are maybe not too realistic, because neighboring keys in the middle are more likely to be chosen. It seems to me that this is not desirable.

Have you considered adding a randomization layer? That is, once you have a key in [1 .. n] centered around n/2, you perform a pseudo-random transformation into the same domain so that key values are scattered over the whole domain.

-- Fabien. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
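One cheap way to build such a randomization layer (a sketch, not from the patch): after drawing the gaussian-distributed key in [1 .. n], apply a fixed pseudo-random permutation of the domain, e.g. multiplication by a constant coprime to n:

static int64
scatter_key(int64 key, int64 n)
{
	/* any multiplier coprime to n yields a bijection on [0, n) */
	const int64 mult = 2654435761;	/* Knuth's multiplicative-hash prime */

	return 1 + (((key - 1) * mult) % n);
}

Neighboring hot keys then land far apart in the key space, preserving the gaussian skew in access frequency while removing the artificial cache locality.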
Re: [HACKERS] Could ANALYZE estimate bloat?
On 09/20/2013 02:17 PM, Stephen Frost wrote: No? Err, I thought that was one of the things like ANALYZE *did* collect through the 'live tuples' number? Nope. live tuples is updated by the stats collector, NOT by analyze. Also, live vs. dead tuples doesn't tell me how much free *space* is available on pages. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.
From: Tatsuo Ishii is...@postgresql.org What about limiting to use NCHAR with a database which has same encoding or compatible encoding (on which the encoding conversion is defined)? This way, NCHAR text can be automatically converted from NCHAR to the database encoding in the server side thus we can treat NCHAR exactly same as CHAR afterward. I suppose what encoding is used for NCHAR should be defined in initdb time or creation of the database (if we allow this, we need to add a new column to know what encoding is used for NCHAR). For example, CREATE TABLE t1(t NCHAR(10)) will succeed if NCHAR is UTF-8 and database encoding is UTF-8. Even succeed if NCHAR is SHIFT-JIS and database encoding is UTF-8 because there is a conversion between UTF-8 and SHIFT-JIS. However will not succeed if NCHAR is SHIFT-JIS and database encoding is ISO-8859-1 because there's no conversion between them. Thanks for the idea; it sounds flexible for wider use. Your cooperation in devising an implementation with as little code as possible would be much appreciated. Regards MauMau -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.
From: Valentine Gogichashvili val...@gmail.com the whole NCHAR appeared as a hack for systems that did not have it from the beginning. It would not be needed if all text were magically stored in UNICODE or UTF from the beginning and the idea of a character were the same as the idea of a rune and not a byte. I guess so, too. PostgreSQL has very powerful possibilities for storing any kind of encoding. So maybe it makes sense to add the ENCODING as another column property, the same way a COLLATION was added? Some other people in this community suggested that. And the SQL standard suggests the same -- specifying a character encoding for each column: CHAR(n) CHARACTER SET ch. Text operations should work automatically, as in memory all strings will be converted to the database encoding. This approach will also open a possibility to implement custom ENCODINGs for the column data storage, like snappy compression or even BSON, gobs or protobufs for much more compact type storage. Thanks for your idea; it sounds interesting, although I don't understand it well yet. Regards MauMau -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.
From: Martijn van Oosterhout klep...@svana.org As far as I can tell the whole reason for introducing NCHAR is to support SHIFT-JIS, there hasn't been call for any other encodings, that I can remember anyway. Could you elaborate on this, giving some info sources? So rather than this whole NCHAR thing, why not just add a type sjistext, and a few type casts and call it a day... The main reason for supporting NCHAR types is to ease migration from other DBMSs, not requiring DDL changes. So sjistext does not match that purpose. Regards MauMau -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.
From: Robert Haas robertmh...@gmail.com I don't think that you'll be able to get consensus around that path on this mailing list. I agree that the fact we have both varchar and text feels like a wart.

Is that right? I don't feel the varchar/text case is a wart. I think text was introduced for a positive reason, to ease migration from other DBMSs. The manual says: http://www.postgresql.org/docs/current/static/datatype-character.html Although the type text is not in the SQL standard, several other SQL database management systems have it as well. And isn't EnterpriseDB doing similar things for Oracle compatibility, although I'm not sure about the details? Could you share your reasons why we won't get consensus?

I understand your feeling. The concern about incompatibility can be eliminated by thinking the following way. How about this?
- NCHAR can be used with any database encoding.
- At first, NCHAR is exactly the same as CHAR. That is, the implementation-defined character set described in the SQL standard is the database character set.
- In the future, the character set for NCHAR can be selected at database creation, like Oracle's CREATE DATABASE NATIONAL CHARACTER SET AL16UTF16. The default is the database character set.

Hmm. So under that design, a database could support up to a total of two character sets, the one that you get when you say 'foo' and the other one that you get when you say n'foo'. I guess we could do that, but it seems a bit limited. If we're going to go to the trouble of supporting multiple character sets, why not support an arbitrary number instead of just two?

I agree with you about the arbitrary number. Tatsuo san gave us a good suggestion. Let's consider how to implement that. Regards MauMau -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.
From: Robert Haas robertmh...@gmail.com On Thu, Sep 19, 2013 at 7:58 PM, Tatsuo Ishii is...@postgresql.org wrote: What about limiting to use NCHAR with a database which has same encoding or compatible encoding (on which the encoding conversion is defined)? This way, NCHAR text can be automatically converted from NCHAR to the database encoding in the server side thus we can treat NCHAR exactly same as CHAR afterward. I suppose what encoding is used for NCHAR should be defined in initdb time or creation of the database (if we allow this, we need to add a new column to know what encoding is used for NCHAR). For example, CREATE TABLE t1(t NCHAR(10)) will succeed if NCHAR is UTF-8 and database encoding is UTF-8. Even succeed if NCHAR is SHIFT-JIS and database encoding is UTF-8 because there is a conversion between UTF-8 and SHIFT-JIS. However will not succeed if NCHAR is SHIFT-JIS and database encoding is ISO-8859-1 because there's no conversion between them. I think the point here is that, at least as I understand it, encoding conversion and sanitization happens at a very early stage right now, when we first receive the input from the client. If the user sends a string of bytes as part of a query or bind placeholder that's not valid in the database encoding, it's going to error out before any type-specific code has an opportunity to get control. Look at textin(), for example. There's no encoding check there. That means it's already been done at that point. To make this work, someone's going to have to figure out what to do about *that*. Until we have a sketch of what the design for that looks like, I don't see how we can credibly entertain more specific proposals. OK, I see your point. Let's consider that design. I'll study the code regarding this. Does anybody, especially Tatsuo san, Tom san, or Peter san, have any good ideas? Regards MauMau -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On Tue, Sep 17, 2013 at 9:29 AM, Robert Haas robertmh...@gmail.com wrote: On Sat, Sep 14, 2013 at 6:27 PM, Peter Geoghegan p...@heroku.com wrote: Note that today there is no guarantee that the original waiter for a duplicate-inserting xact to complete will be the first one to get a second chance. ProcLockWakeup() only wakes as many waiters from the head of the queue as can all be granted the lock without any conflicts. So I don't think there is a race condition in that path. Right, but what about XactLockTableWait() itself? It only acquires a ShareLock on the xid of the got-there-first inserter that potentially hasn't yet committed/aborted. There will be no conflicts between multiple second-chance-seeking blockers trying to acquire this lock concurrently, and so in fact there is (what I guess you'd consider to be) a race condition in the current btree insertion code. So my earlier point about according an upsert implementation license to optimize ordering of retries across multiple unique indexes -- that it isn't really inconsistent with the current code when dealing with only one unique index insertion -- has not been invalidated. EvalPlanQualFetch() and Do_MultiXactIdWait() also call XactLockTableWait(), for similar reasons. In my patch, the later row locking code used by INSERT...ON DUPLICATE KEY LOCK FOR UPDATE calls XactLockTableWait() too. So far it's been a slippery slope type argument that can be equally well used to argue against some facet of almost any substantial patch ever proposed. I don't completely agree with that characterization, but you do have a point. Obviously, if the differences in the area of interruptibility, starvation, deadlock risk, etc. can be made small enough relative to the status quo, then those aren't reasons to reject the approach. That all seems fair to me. That's the standard that I'd apply as a reviewer myself. But I'm skeptical that you're going to be able to accomplish that, especially without adversely affecting maintainability. I think the way that you're proposing to use lwlocks here is sufficiently different from what the rest of the system does that it's going to be hard to avoid system-wide effects that can't easily be caught during code review; Fair enough. In case it isn't already totally clear to someone, I concede that it isn't going to be workable to hold even shared buffer locks across all these operations. Let's get past that, though. and like Andres, I don't share your skepticism about alternative approaches. Well, I expressed skepticism about one alternative approach in particular, which is the promise tuples approach. Andres seems to think that I'm overly concerned about bloat, but I'm not sure he appreciates why I'm so sensitive to it in this instance. I'll be particularly sensitive to it if value locks need to be held indefinitely rather than there being a speculative grab-the-value-locks attempt (because that increases the window in which another session can necessitate that we retry at row locking time quite considerably - see below). I think the fundamental problem with UPSERT, MERGE, and this proposal is what happens when the conflicting tuple is present but not visible to your scan, either because it hasn't committed yet or because it has committed but is not visible to your snapshot. Yeah, you're right. 
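For reference, the XactLockTableWait() pattern mentioned above looks roughly like this (paraphrased from src/backend/storage/lmgr/lmgr.c; details may differ). Since any number of ShareLocks on the xid can be granted together, nothing orders which blocked inserter gets its second chance first:

void
XactLockTableWait(TransactionId xid)
{
	LOCKTAG		tag;

	for (;;)
	{
		Assert(TransactionIdIsValid(xid));

		SET_LOCKTAG_TRANSACTION(tag, xid);

		/* blocks until the owning xact releases its own ExclusiveLock */
		(void) LockAcquire(&tag, ShareLock, false, false);
		LockRelease(&tag, ShareLock, false);

		if (!TransactionIdIsInProgress(xid))
			break;
		xid = SubTransGetParent(xid);
	}
}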
As I mentioned to Andres already, when row locking happens and there is this kind of conflict, my approach is to retry from scratch (go right back to before value lock acquisition) in the sort of scenario that generally necessitates EvalPlanQual() looping, or to throw a serialization failure where that's appropriate. After an unsuccessful attempt at row locking there could well be an interim wait for another xact to finish, before retrying (at read committed isolation level). This is why I think that value locking/retrying should be cheap, and should avoid bloat if at all possible. Forgive me if I'm making a leap here, but it seems like what you're saying is that the semantics of upsert that one might naturally expect are *arguably* fundamentally impossible, because they entail potentially locking a row that isn't current to your snapshot, and you cannot throw a serialization failure at read committed. I respectfully suggest that that exact definition of upsert isn't a useful one, because other snapshot isolation/MVCC systems operating within the same constraints must have the same issues, and yet they manage to implement something that could be called upsert that people seem happy with. I also discovered, after it was committed, that it didn't help in the way I expected: http://www.postgresql.org/message-id/CA+TgmoY8P3sD=ouvig+xzjmzk5-phunv39rtfyzuqxu8hjt...@mail.gmail.com Well, at the time you didn't also provide raw commit latency benchmark results for your hardware using a tool like pg_test_fsync, which I'd consider absolutely essential to such a discussion. That's
Re: [HACKERS] Could ANALYZE estimate bloat?
* Josh Berkus (j...@agliodbs.com) wrote: Also, live vs. dead tuples doesn't tell me how much free *space* is available on pages. I'm not really sure that you'd get much better from ANALYZE than you get from tracking the inserted/updated/deleted tuples. Collecting that information when VACUUM'ing the table would certainly provide much more accurate results, which could possibly be stored in a page-level bitmap of completely empty pages at the beginning of each 1G segment. Alternatively, the bitmap could be updated during processing instead of waiting for a VACUUM. Greg and I hypothesized that such a bitmap might be used to truncate individual 1G segments in the middle of a relation rather than only at the end, perhaps all the way down to a point where only a header plus the page-level bitmap in the 1G segment are left. This was discussed in the context of a VACUUM which used the try-to-elevate-the-lock approach already used to truncate the last 1G segment of the relation, though I've also wondered if it could take page-level locks starting at the end of the 1G segment and walking backwards until it's unable to acquire a lock or a non-empty page is found. Of course, we're aware of the issues around the storage management system and interfaces which might make this entirely unrealistic, but it's getting to a point where, I think (not sure about Greg), we need to deal with that in some way to improve on issues like this. Thanks, Stephen signature.asc Description: Digital signature
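Purely to illustrate the shape of that idea (all names invented; this deliberately ignores the storage-manager issues just mentioned): one bit per page of a 1GB segment is only a 16kB bitmap with the default 8kB block size:

#define PAGES_PER_SEGMENT	RELSEG_SIZE		/* 131072 in default builds */

typedef struct SegmentEmptyMap
{
	uint8		empty[PAGES_PER_SEGMENT / 8];	/* 1 bit per page */
} SegmentEmptyMap;

/* could this whole segment be truncated down to header + bitmap? */
static bool
segment_fully_empty(const SegmentEmptyMap *map)
{
	int			i;

	for (i = 0; i < PAGES_PER_SEGMENT / 8; i++)
	{
		if (map->empty[i] != 0xFF)
			return false;
	}
	return true;
}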
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
Peter, * Peter Geoghegan (p...@heroku.com) wrote: Forgive me if I'm making a leap here, but it seems like what you're saying is that the semantics of upsert that one might naturally expect are *arguably* fundamentally impossible, because they entail potentially locking a row that isn't current to your snapshot, and you cannot throw a serialization failure at read committed. I respectfully suggest that that exact definition of upsert isn't a useful one, I'm not sure I follow this completely - you're saying that a definition of 'upsert' which includes having to lock rows which aren't in your current snapshot (for reasons stated) isn't a useful one. Is the implication that a useful definition of 'upsert' is that it *doesn't* have to lock rows which aren't in your current snapshot, and if so, then what would the semantics of that upsert look like? because other snapshot isolation/MVCC systems operating within the same constraints must have the same issues, and yet they manage to implement something that could be called upsert that people seem happy with. This I am generally in agreement with, to the extent that 'upsert' is something we really want and we should figure out a way to get there from here, but it wouldn't be the first time that we worked out a better solution than existing implementations. So, another '+1' from me wrt your working this issue, and please don't get too discouraged that there's a lot of pressure to find a magic bullet - I think part of it is exactly because everyone wants this and wants it to be better than what's out there today. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] FW: REVIEW: Allow formatting in log_line_prefix
On Sat, Sep 21, 2013 at 7:18 AM, Robert Haas robertmh...@gmail.com wrote: On Fri, Sep 20, 2013 at 3:15 AM, Albe Laurenz laurenz.a...@wien.gv.at wrote: David Rowley wrote: I moved the source around and I've patched against it again. New patch attached. Thank you, marked as ready for committer.

+/*
+ * helper function for processing the format string in
+ * log_line_prefix()
+ * This function returns NULL if it finds something which
+ * it deems invalid for the log_line_prefix string.
+ */

Comments should be formatted as a single paragraph. If you want multiple paragraphs, you need to include a line that's blank except for the leading *.

+static const char
+*process_log_prefix_padding(const char *p, int *ppadding)

The asterisk should be on the previous line, separated from "char" by a space.

I've attached another revision of the patch which cleans up the comment for the process_log_prefix_padding function to be more like the comments earlier in the same file. I have also moved the asterisk to the previous line.

+ appendStringInfo(buf, "%*ld", padding, log_line_number);

Is %* supported by all versions of printf() on every platform we support?

Do you specifically mean %*ld, or %* in general? I can see various other places where %*s is used in the source by looking at grep -r "%\*" . But if you do mean specifically %*ld, then we did already use %ld here, and since there are places which use %*s, would it be wrong to assume that %*ld is OK?

Earlier there was some discussion of performance. Was that tested?

I've done some performance tests now. I assume that the processing of the log line prefix would be drowned out by any testing of the number of transactions per second, so I put a few lines at the start of send_message_to_server_log(), which ended up looking like:

static void
send_message_to_server_log(ErrorData *edata)
{
	StringInfoData buf;
	StringInfoData tmpbuf;
	int		i;
	float	startTime, endTime;

	startTime = (float) clock() / CLOCKS_PER_SEC;
	for (i = 0; i < 100; i++)
	{
		initStringInfo(&tmpbuf);
		log_line_prefix(&tmpbuf, edata);
		pfree(tmpbuf.data);
	}
	endTime = (float) clock() / CLOCKS_PER_SEC;
	printf("log_line_prefix (%s) in %f seconds\n", Log_line_prefix, endTime - startTime);

	initStringInfo(&buf);
	...

I am seeing a slowdown in the processing of the 2 log_line_prefix strings that I tested with. 
Here are the results:

Patched (%a %u %d %p %t %m %i %e %c %l %s %v %x )

david@ubuntu64:~/9.4/bin$ ./postgres -D /home/david/9.4/data/
log_line_prefix (%a %u %d %p %t %m %i %e %c %l %s %v %x ) in 3.93 seconds
62324 2013-09-20 05:37:30 NZST 2013-09-20 05:37:30.455 NZST 0 523b3656.f374 101 2013-09-20 05:37:26 NZST 0 LOG: database system was shut down at 2013-09-20 05:36:21 NZST
log_line_prefix (%a %u %d %p %t %m %i %e %c %l %s %v %x ) in 3.94 seconds
62329 2013-09-20 05:37:38 NZST 2013-09-20 05:37:38.724 NZST 0 523b365a.f379 101 2013-09-20 05:37:30 NZST 0 LOG: autovacuum launcher started
log_line_prefix (%a %u %d %p %t %m %i %e %c %l %s %v %x ) in 3.96 seconds
62323 2013-09-20 05:37:38 NZST 2013-09-20 05:37:38.756 NZST 0 523b3656.f373 101 2013-09-20 05:37:26 NZST 0 LOG: database system is ready to accept connections
log_line_prefix (%a %u %d %p %t %m %i %e %c %l %s %v %x ) in 4.82 seconds
psql david postgres 62331 2013-09-20 05:38:43 NZST 2013-09-20 05:38:43.490 NZST SELECT 22012 523b3688.f37b 101 2013-09-20 05:38:16 NZST 2/4 0 ERROR: division by zero
psql david postgres 62331 2013-09-20 05:38:43 NZST 2013-09-20 05:38:43.490 NZST SELECT 22012 523b3688.f37b 102 2013-09-20 05:38:16 NZST 2/4 0 STATEMENT: select 1/0;
log_line_prefix (%a %u %d %p %t %m %i %e %c %l %s %v %x ) in 4.69 seconds
psql david postgres 62331 2013-09-20 05:39:35 NZST 2013-09-20 05:39:35.900 NZST SELECT 22012 523b3688.f37b 203 2013-09-20 05:38:16 NZST 2/5 0 ERROR: division by zero
psql david postgres 62331 2013-09-20 05:39:35 NZST 2013-09-20 05:39:35.900 NZST SELECT 22012 523b3688.f37b 204 2013-09-20 05:38:16 NZST 2/5 0 STATEMENT: select 1/0;

Unpatched (%a %u %d %p %t %m %i %e %c %l %s %v %x )

david@ubuntu64:~/9.4/bin$ ./postgres -D /home/david/9.4/data/
log_line_prefix (%a %u %d %p %t %m %i %e %c %l %s %v %x ) in 3.12 seconds
925 2013-09-20 05:45:48 NZST 2013-09-20 05:45:48.483 NZST 0 523b3849.39d 101 2013-09-20 05:45:45 NZST 0 LOG: database system was shut down at 2013-09-20 05:40:47 NZST
log_line_prefix (%a %u %d %p %t %m %i %e %c %l %s %v %x ) in 3.10 seconds
924 2013-09-20 05:45:54 NZST 2013-09-20 05:45:54.970 NZST 0 523b3849.39c 101 2013-09-20 05:45:45 NZST 0 LOG: database system is ready to accept connections
log_line_prefix (%a %u %d %p %t %m %i %e %c %l %s %v %x ) in 3.12 seconds
931 2013-09-20 05:45:55 NZST 2013-09-20 05:45:55.015 NZST 0 523b384c.3a3 101 2013-09-20
Re: [HACKERS] pgbench progress report improvements
For others following along, Pavel Stehule reviewed this on a different thread: http://www.postgresql.org/message-id/cafj8pravyuhv8x3feewasggvdcijogvlzsavd7rmwn9mw8r...@mail.gmail.com

On Tue, Aug 06, 2013 at 10:47:14AM +0200, Fabien wrote: Improve pgbench measurements progress report.

These changes are loosely coupled; please separate them into several patch files:

- Use progress option both under init bench. Activate progress report by default, every 5 seconds. When initializing, --quiet reverts to the old every 100,000 insertions behavior...

Patch (1): Change the default from --progress=0 to --progress=5.

This has a disadvantage of causing two extra gettimeofday() calls per transaction. That's not cheap on all platforms; a user comparing pgbench results across versions will need to make a point of disabling this again to make his results comparable. Given that threat and uncertainty about which default would be more popular, I think we should drop this part.

Patch (2): Make --initialize mode respect --progress.

The definition you've chosen for --quiet makes that option contrary to its own name: message-per-100k-tuples is typically more voluminous than your proposed new default of message-per-5s. In fact, since --quiet currently switches from message-per-100k-tuples to message-per-5s, your patch gives it the exact opposite of its present effect.

During the 9.3 development cycle, we _twice_ made changes pertaining to --initialize message frequency:
http://www.postgresql.org/message-id/flat/20120620.170427.347012755716399568.t-is...@sraoss.co.jp
http://www.postgresql.org/message-id/flat/1346472039.18010.10.ca...@vanquo.pezone.net
That gives me pause about working through yet another change; we keep burning time on this minor issue.

- Measure transaction latency instead of computing it. The previous computed figure does not make sense under throttling, as sleep throttling time was included in the figures. The latency and its standard deviation are printed out under progress and in the final report.

Patch (3): Report the standard deviation of latency.

Seems potentially useful; see inline comments below. In my limited testing, the interesting latency cases involved stray clusters of transactions showing 10x-100x mean latency. If that's typical, I'd doubt mean/stddev is a great analytical tool. But I have little reason to believe that my experience here was representative.

Patch (4): Redefine latency as reported by pgbench and report lag more.

If we do this, should we also augment the --log output to contain the figures necessary to reconstruct this calculation? I think that would mean adding fields for the time when the first command of the transaction was sent.

- Take thread start time at the beginning of the thread. Otherwise it includes pretty slow thread/fork system start times in the measurements. May help with bug #8326.

Patch (5)

That theory is sound, but I would like at least one report reproducing that behavior and finding that this change improved it.

- Reduce and compensate throttling underestimation bias. This is a consequence of relying on an integer random generator. It was about 0.5% with 1000 distinct values. Multiplier added to compensate the 0.05% bias with 1 distinct integer values.

Patch (6)

See inline comment below.

- Updated documentation help message.

Include any relevant documentation and --help updates with the patch necessitating them. If there are errors in the documentation today, put fixes for those in a separate patch (7). 
Additionally, this patch contains numerous whitespace-only changes. Do not mix those into a patch that makes a functional change. Most of them would be done or undone by the next pgindent run, so I suggest simply dropping them. See the first entry here: https://wiki.postgresql.org/wiki/Submitting_a_Patch#Reasons_your_patch_might_be_returned

Patches (5) and (6) are nicely small and free of UI questions. At a minimum, let's try to get those done in this CommitFest. I propose dropping patches (1) and (2). Patches (3) and (4) have open questions, but I think they have good value potential.

Sample output under benchmarking with --progress=1:

progress: 36.0 s, 10.9 tps, 91.864 +- 124.874 ms lat
progress: 37.0 s, 115.2 tps, 8.678 +- 1.792 ms lat

"x +- y" does not convey that y is a standard deviation. I suggest getting the term "stddev" in there somehow, maybe like this:

progress: 37.0 s, 115.2 tps, latency avg=8.678 ms stddev=1.792

--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c

-	 -R, --rate=SPEC target rate in transactions per second\n
+	 -R, --rate=NUM  target rate in transactions per second\n

This would fall under patch (7) if you feel it's needed.

@@ -928,14 +931,18 @@ top:
 * Use inverse transform sampling to randomly generate a delay,