Re: [HACKERS] postgres_fdw behaves oddly
(2014/11/17 19:36), Ashutosh Bapat wrote:
> Here are my comments about the patch fscan_reltargetlist.patch

Thanks for the review!

> 1. This isn't your change, but we might be able to get rid of the assignment
>
>     2071 /* Are any system columns requested from rel? */
>     2072 scan_plan->fsSystemCol = false;
>
> since make_foreignscan() already does that. But I will leave it up to you
> to remove this assignment or not.

As you pointed out, the assignment is redundant, but I think that keeping it improves the clarity and readability. So, I'd vote for leaving that as is.

> 2. Instead of using rel->reltargetlist, we should use the tlist passed by
> the caller. This is the tlist expected from the Plan node. For foreign
> scans it will be the same as rel->reltargetlist. Using tlist would make
> the function consistent with the create_*scan_plan functions.

I disagree with that, for the reasons mentioned below:

* For a foreign scan, tlist is *not* necessarily the same as rel->reltargetlist (ie, there is a case where tlist contains all user attributes while rel->reltargetlist contains only the attributes actually needed by the query). In such a case it'd be inefficient to use tlist rather than rel->reltargetlist.

* I think that it'd improve readability to match the code with other places that execute similar processing, such as check_index_only() and remove_unused_subquery_outputs().

Thanks,

Best regards,
Etsuro Fujita

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] postgres_fdw behaves oddly
(2014/11/17 19:54), Ashutosh Bapat wrote:
> Here are comments for the postgres_fdw-syscol patch.

Thanks for the review!

> Code
> ----
> 1. Instead of a single-line comment "System columns can't be sent to
> remote.", it might be better to explain why system columns can't be sent
> to the remote.

Done.

> 2. The comment in deparseVar is a single-line comment, so it should start
> and end on the same line, i.e. /* and */ should be on the same line.

Done.

> 3. Since there is already a testcase which triggered this particular
> change, it would be good to add that to the postgres_fdw regression tests.

Done.

Please find attached an updated version of the patch.

Thanks,

Best regards,
Etsuro Fujita

*** a/contrib/postgres_fdw/deparse.c
--- b/contrib/postgres_fdw/deparse.c
*** 253,258 **** foreign_expr_walker(Node *node,
--- 253,268 ----
  		if (var->varno == glob_cxt->foreignrel->relid &&
  			var->varlevelsup == 0)
  		{
+ 			/*
+ 			 * System columns can't be sent to the remote since the values
+ 			 * for system columns might be different between local and
+ 			 * remote.
+ 			 *
+ 			 * XXX: we should probably send ctid to the remote.
+ 			 */
+ 			if (var->varattno < 0)
+ 				return false;
+ 
  			/* Var belongs to foreign table */
  			collation = var->varcollid;
  			state = OidIsValid(collation) ? FDW_COLLATE_SAFE : FDW_COLLATE_NONE;
*** 1262,1267 **** deparseVar(Var *node, deparse_expr_cxt *context)
--- 1272,1280 ----
  	if (node->varno == context->foreignrel->relid &&
  		node->varlevelsup == 0)
  	{
+ 		/* Var shouldn't be a system column */
+ 		Assert(node->varattno >= 0);
+ 
  		/* Var belongs to foreign table */
  		deparseColumnRef(buf, node->varno, node->varattno, context->root);
  	}
*** a/contrib/postgres_fdw/expected/postgres_fdw.out
--- b/contrib/postgres_fdw/expected/postgres_fdw.out
*** 353,358 **** EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = t1.c2;
--- 353,368 ----
     Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" = c2))
  (3 rows)
  
+ -- where with system column conditions
+ EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE t1.tableoid != 0;
+                                 QUERY PLAN
+ ---------------------------------------------------------------------------
+  Foreign Scan on public.ft1 t1
+    Output: c1, c2, c3, c4, c5, c6, c7, c8
+    Filter: (t1.tableoid <> 0::oid)
+    Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
+ (4 rows)
+ 
  -- ===
  -- WHERE with remotely-executable conditions
  -- ===
*** a/contrib/postgres_fdw/sql/postgres_fdw.sql
--- b/contrib/postgres_fdw/sql/postgres_fdw.sql
*** 180,185 **** EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = postgres_fdw_a
--- 180,187 ----
  EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 === t1.c2;
  EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = abs(t1.c2);
  EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = t1.c2;
+ -- where with system column conditions
+ EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE t1.tableoid != 0;
  -- ===
  -- WHERE with remotely-executable conditions
Re: [HACKERS] WAL format and API changes (9.5)
On Tue, Nov 18, 2014 at 4:31 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote:
> WAL insertion performance
> =========================
> To measure the performance of generating WAL, I ran the
> wal-update-testsuite.sh that Amit also ran earlier. The cluster was
> initialized with:
>
>   shared_buffers=512MB
>   checkpoint_segments=30
>   fsync=off
>   autovacuum=off
>   full_page_writes=off
>
> [results]
>
> Summary: No meaningful difference in runtime.

If I am reading that correctly, the WAL generated is reduced for all the tests, except for the "hundred tiny fields" case where more WAL is generated. The duration seems to be generally reduced as well, with some noise (?) sometimes making it higher.

> WAL replay performance
> ======================
> To test WAL replay performance, I ran pgbench with WAL archiving enabled,
> and timed the replay of the generated WAL. I used the attached script,
> replay-perf-test.sh, for that. full_page_writes were disabled, because
> replaying full-page images is quite different from replaying other
> records. (Performance of full-page images is interesting too, but it's
> not expected that these WAL format changes make much difference to that.)
>
> In summary, there is no significant difference in replay performance. The
> amount of WAL generated is much smaller with the patch.
>
> This concludes my performance testing, until someone wants to see some
> other scenario being tested. I'm happy with the results.

I think you can; that's a great study, and this proves to be a gain in many areas. If this goes in, it is going to be one of the largest patches ever committed:

$ git diff --stat | tail -n1
 91 files changed, 3895 insertions(+), 4305 deletions(-)

There are still some XXX blocks here and there in the code, but nothing really huge, like here:

-	 * checks could go wrong too.
+	 * checks could go wrong too. XXX does this comment still make sense?
 	 */
-	Assert(xldata->blkno != xldata->blknoNew);
+	Assert(blkno != blknoNew);

Btw, did you do a run with the buffer capture facility and check for page differences?

Except for that, after going through the code once again, ISTM that the patch is in a nice state. It may be better to wait for some input from Andres; he may catch some issues I haven't spotted.
Regards,
--
Michael
Re: [HACKERS] Incorrect initialization of sentPtr in walsender.c
On Thu, Oct 16, 2014 at 9:01 PM, Michael Paquier michael.paqu...@gmail.com wrote:
> On Thu, Oct 16, 2014 at 7:09 PM, Simon Riggs si...@2ndquadrant.com wrote:
>> I find it confusing that the Lowest pointer value is also Invalid.
>> Valid != Invalid
> In complement to that, note that I mentioned Invalid should be UINT_MAX
> for clarity.

Worth noting that this patch has been marked as returned with feedback.
--
Michael
Re: [HACKERS] WAL format and API changes (9.5)
On 11/18/2014 10:28 AM, Michael Paquier wrote:
> Btw, did you do a run with the buffer capture facility and check for
> page differences?

Yes. That's how I bumped into the bug fixed in c73669c0. That tool has been tremendously helpful.

> Except for that, after going through the code once again, ISTM that the
> patch is in a nice state. It may be better to wait for some input from
> Andres; he may catch some issues I haven't spotted.

Yeah, I'm sure there are tons of small things. I'll have to take a small break and read through the patch myself again after a few days. But I'd like to get this committed pretty soon. I believe everyone agrees with the big picture, even if there's some little tweaking left to do. It's a pain to maintain as a patch, and I believe the FPW compression patch is waiting for this to land.

- Heikki
Re: [HACKERS] [GENERAL] Performance issue with libpq prepared queries on 9.3 and 9.4
On 17 November 2014 22:27, Tom Lane t...@sss.pgh.pa.us wrote:
> Another idea that occurred to me is to run a planning cycle in which the
> actual parameter values are made available to the planner, but as
> estimates not hard constants (this facility already exists, it's just not
> being used by plancache.c). This would yield cost estimates that are more
> safely comparable to the custom plan. But I'm not sure that we'd want to
> expend yet another planning cycle to do this, nor am I sure that we'd
> want to use such a plan as The Generic Plan.
>
> regards, tom lane

Perhaps a somewhat naive idea -- I only have a broad picture of how the query planner works -- but what if prepared statements did not store an entirely pinned-down version of the query plan, and instead stored a condensed version that still leaves room for choosing different paths based on key decision criteria? For example, if an input parameter value matches the most common values, choose the sequential-scan path (as in the OP's case, IIRC), and if it doesn't, attempt an index scan.

Of course, one implication of doing this is likely a decrease in planning performance (which matters for simple queries), but if it results in better plan choices for complex queries that may be a net gain.

I recently followed an introductory class about neural networks, and the decision logic seems to resemble the neuron/perceptron pattern. I'm just throwing this out here in case it's a viable option and nobody else in the world has thought of it, however unlikely ;)

--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.
Re: [HACKERS] inherit support for foreign tables
On Mon, Nov 17, 2014 at 1:25 PM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote:
> (2014/11/12 20:04), Ashutosh Bapat wrote:
>> I reviewed the fdw-chk-3 patch. Here are my comments.
>
> Thanks for the review!
>
>> Tests
>> -----
>> 1. The tests added in the file_fdw module look good. We should add tests
>> for CREATE TABLE with CHECK CONSTRAINT also.
>
> Agreed. I added the tests, and also changed the proposed tests a bit.
>
>> 2. For postgres_fdw we need tests to check the behaviour in case the
>> constraints mismatch between the remote table and its local foreign
>> table declaration, in the case of INSERT, UPDATE and SELECT.
>
> Done.
>
>> 3. In the testcases for postgres_fdw it seems that you have forgotten to
>> add a statement after SET constraint_exclusion to 'partition'.
>
> I added the statement.
>
>> 4. In the test foreign_data there are changes to fix the diffs caused by
>> these changes, like below:
>>
>>   ALTER FOREIGN TABLE ft1 DROP CONSTRAINT no_const; -- ERROR
>>   -ERROR: "ft1" is not a table
>>   +ERROR: constraint "no_const" of relation "ft1" does not exist
>>   ALTER FOREIGN TABLE ft1 DROP CONSTRAINT IF EXISTS no_const;
>>   -ERROR: "ft1" is not a table
>>   +NOTICE: constraint "no_const" of relation "ft1" does not exist, skipping
>>   ALTER FOREIGN TABLE ft1 DROP CONSTRAINT ft1_c1_check;
>>   -ERROR: "ft1" is not a table
>>   +ERROR: constraint "ft1_c1_check" of relation "ft1" does not exist
>>
>> Earlier, when constraints were not supported for FOREIGN TABLE, these
>> tests verified that functionality (in fact, creating the constraints
>> wasn't allowed either). Now that constraints are allowed, I think the
>> tests for no_const (without IF EXISTS) and ft1_c1_check duplicate the
>> same testcase. Maybe we should review this set of statements in the
>> light of the new functionality.
>
> Agreed. I removed the DROP CONSTRAINT ft1_c1_check test, and added a new
> test for ALTER CONSTRAINT.
>
>> Code and implementation
>> -----------------------
>> The usage of NO INHERIT and NOT VALID with CONSTRAINT on a foreign table
>> is blocked, but the corresponding documentation entry doesn't mention
>> so. Since foreign tables cannot be inherited, the NO INHERIT option
>> isn't applicable to them, and since the constraints on foreign tables
>> are declarative, the NOT VALID option is not applicable either. So, I
>> agree with what the code is doing, but that should be reflected in the
>> documentation with this explanation. The rest of the code modifies the
>> condition checks for CHECK CONSTRAINTs on foreign tables, and it looks
>> good to me.
>
> Done.
>
> Other changes:
>
> * Modified one error message that I added in AddRelationNewConstraints,
>   to match the other message there.
> * Revised other docs a little bit.
>
> Attached is an updated version of the patch.

I looked at the patch. It looks good now. Once we have the inh patch ready, we can mark this item as ready for committer.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Re: [HACKERS] proposal: plpgsql - Assert statement
On 18 November 2014 01:00, Jim Nasby jim.na...@bluetreble.com wrote:
> On 11/17/14, 4:58 PM, Simon Riggs wrote:
>>> Great, looks good to me, marking as ready for committer.
>>
>> What is wrong with using IF ?
>
> It's a hell of a lot wordier. I've previously created a more
> sophisticated assert framework to allow more control over things, but
> ended up also using it just for simple sanity checking because it was
> much nicer than typing IF ... THEN RAISE ERROR ... END IF.

Why is that not a requirement for a less wordy form of IF?

  IF (something) THEN action

Why is this problem specific to RAISE?

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Add shutdown_at_recovery_target option to recovery.conf
On 31 October 2014 15:18, Petr Jelinek p...@2ndquadrant.com wrote:
> Attached is the v2 of the patch with the review comments addressed (see
> below).
> ...
> Done, there is now action_at_recovery_target which can be set to either
> pause, continue or shutdown, defaulting to pause (which is the same as
> the old behavior of pause_at_recovery_target defaulting to true).

One comment only: I think the actions should be called pause, promote and shutdown, since "continue" leads immediately to promotion of the server.

I'm good with this patch otherwise. Barring objections I will commit tomorrow.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] [TODO] Track number of files ready to be archived in pg_stat_archiver
On 18 November 2014 06:20, Michael Paquier michael.paqu...@gmail.com wrote:
> the DBA may want to know how long the queue of WAL files waiting to be
> archived is.

Agreed.

> That's IMO something we simply forgot in the first implementation of
> pg_stat_archiver.

That's not how it appears to me. ISTM that the information requested is already available; it just needs some minor calculations to work out how many files are involved.

> the most direct way to know that is to count the .ready files in
> archive_status.

...my earlier point was...

pg_stat_archiver already has columns for last_archived_wal and last_failed_wal, so you can already work out how many files there must be between then and now. Perhaps that can be added directly to the view, to assist the user in calculating it.

Reading the directory itself to count the files is unnecessary, except as a diagnostic. As soon as we have sent the first file, we will know the queue length at any point afterwards.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] proposal: plpgsql - Assert statement
2014-11-18 10:23 GMT+01:00 Simon Riggs si...@2ndquadrant.com:
> On 18 November 2014 01:00, Jim Nasby jim.na...@bluetreble.com wrote:
>> On 11/17/14, 4:58 PM, Simon Riggs wrote:
>>>> Great, looks good to me, marking as ready for committer.
>>>
>>> What is wrong with using IF ?
>>
>> It's a hell of a lot wordier. I've previously created a more
>> sophisticated assert framework to allow more control over things, but
>> ended up also using it just for simple sanity checking because it was
>> much nicer than typing IF ... THEN RAISE ERROR ... END IF.
>
> Why is that not a requirement for a less wordy form of IF?
>
>   IF (something) THEN action

The IF statement is a control statement, and the syntax pattern for control statements in plpgsql is consistent. I don't want to break it (moreover, it would probably be hard to implement due to problems in bison). PL/pgSQL, PL/SQL and Ada are well designed (in my opinion).

The proposed conditional clause has a precedent in PL/pgSQL: we already support EXIT ... WHEN and CONTINUE ... WHEN, so we don't propose a new pattern, only reuse an existing one.

Regards

Pavel

> Why is this problem specific to RAISE?
>
> --
> Simon Riggs http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] alternative model for handling locking in parallel groups
On Fri, Nov 14, 2014 at 2:29 AM, Robert Haas robertmh...@gmail.com wrote:
> Discussion of my incomplete group locking patch seems to have converged
> around two points: (1) Everybody agrees that undetected deadlocks are
> unacceptable. (2) Nobody agrees with my proposal to treat locks held by
> group members as mutually non-conflicting.
>
> As was probably evident from the emails on the other thread, it was not
> initially clear to me why you'd EVER want heavyweight locks held by
> different group members to mutually conflict, but after thinking it over
> for a while, I started to think of cases where you would definitely want
> that:
>
> 1. Suppose two or more parallel workers are doing a parallel COPY. Each
> time the relation needs to be extended, one backend or the other will
> need to take the relation extension lock in Exclusive mode. Clearly,
> taking this lock needs to exclude both workers in the same group and also
> unrelated processes.
>
> 2. Suppose two or more parallel workers are doing a parallel sequential
> scan, with a filter condition of myfunc(a.somecol), and that
> myfunc(a.somecol) updates a tuple in some other table. Access to update
> that tuple will be serialized using tuple locks, and it's no safer for
> two parallel workers to do this at the same time than it would be for two
> unrelated processes to do it at the same time.

Won't this be addressed because both updates issued from myfunc() are considered separate commands, so w.r.t. locks it should behave as two different updates in the same transaction? I think there may be more things needed to make updates possible via parallel workers, apart from tuple locks.

> On the other hand, I think there are also some cases where you pretty
> clearly DO want the locks among group members to be mutually
> non-conflicting, such as:
>
> 3. Parallel VACUUM. VACUUM takes ShareUpdateExclusiveLock, so that only
> one process can be vacuuming a relation at the same time. Now, if you've
> got several processes in a group that are collaborating to vacuum that
> relation, they clearly need to avoid excluding each other, but they still
> need to exclude other people. And in particular, nobody else should get
> to start vacuuming that relation until the last member of the group
> exits. So what you want is a ShareUpdateExclusiveLock that is, in effect,
> shared across the whole group, so that it's only released when the last
> process exits.
>
> 4. Parallel query on a locked relation. Parallel query should work on a
> table created in the current transaction, or one explicitly locked by
> user action. It's not acceptable for that to just randomly deadlock, and
> skipping parallelism altogether, while it'd probably be acceptable for a
> first version, is not a good long-term solution. It also sounds buggy and
> fragile for the query planner to try to guess whether the lock requests
> in the parallel workers will succeed or fail when issued. Figuring such
> details out is the job of the lock manager or the parallelism
> infrastructure, not the query planner.
>
> After thinking about these cases for a bit, I came up with a new possible
> approach to this problem. Suppose that, at the beginning of parallelism,
> when we decide to start up workers, we grant all of the locks already
> held by the master to each worker (ignoring the normal rules for lock
> conflicts). Thereafter, we do everything the same as now, with no changes
> to the deadlock detector. That allows the lock conflicts to happen
> normally in the first two cases above, while preventing the unwanted lock
> conflicts in the second two cases.

Here I think we have to consider how to pass the information about all the locks held by the master to the worker backends. Also, even assuming we have that information available, I think it will still be considerable work to grant the locks, considering the number of locks we acquire [1] (based on Simon's analysis) and the additional memory they require. Finally, I think the deadlock detector's work might also increase, as there will now be more procs to visit.

In general, I think this scheme will work, but I am not sure it is worth it at this stage (considering the initial goal that parallel workers will be used for read operations).

[1]: http://www.postgresql.org/message-id/ca+u5nmjlubgduwjqikt6umqrfmrmrqdhpndb6z5xzdtb0ph...@mail.gmail.com

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] group locking: incomplete patch, just for discussion
On Tue, Nov 18, 2014 at 3:10 PM, Jeff Davis pg...@j-davis.com wrote:
> On Mon, 2014-11-17 at 14:32 -0500, Robert Haas wrote:
>> To reiterate the basic problem here, if we do nothing at all about the
>> lock manager, a parallel backend can stall trying to grab an
>> AccessExclusiveLock that the user backend already holds, either because
>> the user backend holds an AccessExclusiveLock as well, or because some
>> other process is waiting for one; we'll deadlock and not notice.
>
> My feeling is that we should keep the concept of a group and group leader
> from your patch, and improve the deadlock detector to make use of that
> information (looking at the code, it looks doable but not trivial). But
> unless I am missing something, we should separate out the lock sharing,
> and defer that until later.

+1 for initially having something like you describe, and maybe an additional mechanism to grant the non-conflicting locks already held by the master backend to child backends. I understand that allowing additional non-conflicting locks could lead to some problems if write operations are allowed via child backends, as described as point 1 in Robert's other mail [1]. However, as initially we want to allow only read-only operations via child backends, this should be okay; later on, if we want to support writes via child backends, we might want to consider an additional property on the lock which would allow the lock manager or deadlock detector to consider them separately, so that such locks won't be granted to child backends if they conflict with the parent.

[1]: http://www.postgresql.org/message-id/ca+tgmoygplojo+lg1bebos8gdjwjtq0qdmxsyj4ihfyj11t...@mail.gmail.com

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] pg_basebackup vs. Windows and tablespaces
Amit Kapila wrote:
> On Sun, Nov 16, 2014 at 6:15 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:
>> I think the fact that we use symlinks is an implementation detail;
>> aren't they actually junction points, not symlinks, in some Windows
>> cases?
>
> Right, but they provide the same functionality as symlinks, and now we
> are even planning to provide this feature for both Linux and Windows, as
> both Tom and Robert seem to feel it's better that way. Anyhow, I think
> naming any entity generally differs based on an individual's
> perspective, so we can go with the name which appeals to more people. In
> case nobody else has any preference, I will change it to what both of us
> can agree upon (either 'tablespace catalog', 'tablespace_info', ...).

Well, I have made my argument. Since you're the submitter, feel free to select what you think is the best name.

>> One use case mentioned upthread is having the clone be created in the
>> same machine as the source server. Does your proposal help with it?
>
> That use case got addressed with the -T option, with which the user can
> relocate the tablespace directory. (Commit: fb05f3c; pg_basebackup: Add
> support for relocating tablespaces)

Okay. As far as I know, -T only works in plain mode, right? I wonder if we should make -T modify the tablespace catalog, so that the resulting file in tar output contains names mangled per the map; that would make -T work in tar mode too. Does that make sense? (Maybe it already works that way? I didn't research.)

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
[HACKERS] BRIN and PageIndexDeleteNoCompact
BRIN added a new function, PageIndexDeleteNoCompact(), which is like PageIndexMultiDelete but does not remove unused line pointers. However, all the callers pass it just a single offset. So the callers would really be more interested in efficiently squeezing out a single tuple from a page, like PageIndexTupleDelete() does, than in a bulk operation. PageIndexDeleteNoCompact() is not optimal for the single-item case. Note that PageIndexMultiDelete() first checks whether the number of removed items is small (<= 2), and just calls PageIndexTupleDelete in a loop in that case.

How about replacing PageIndexDeleteNoCompact() with something more like PageIndexTupleDelete()? It ought to be both faster and simpler.

PS. The comment above PageIndexDeleteNoCompact says that unused items at the end of the array are removed. But looking at the code, I don't see it doing that.

- Heikki
Re: [HACKERS] proposal: plpgsql - Assert statement
On 11/18/2014 04:23 AM, Simon Riggs wrote:
> On 18 November 2014 01:00, Jim Nasby jim.na...@bluetreble.com wrote:
>> On 11/17/14, 4:58 PM, Simon Riggs wrote:
>>>> Great, looks good to me, marking as ready for committer.
>>>
>>> What is wrong with using IF ?
>>
>> It's a hell of a lot wordier. I've previously created a more
>> sophisticated assert framework to allow more control over things, but
>> ended up also using it just for simple sanity checking because it was
>> much nicer than typing IF ... THEN RAISE ERROR ... END IF.
>
> Why is that not a requirement for a less wordy form of IF?
>
>   IF (something) THEN action
>
> Why is this problem specific to RAISE?

Please, no. The use of closed-form rather than open-form IF statements is one of the things Ada (and by inheritance PL/pgSQL) got right.

Frankly, I find this whole proposal, and all the suggested alternatives, somewhat ill-conceived. PL/pgSQL is a wordy language. If you want something more terse, use something else. Adding these sorts of syntactic-sugar warts onto the language doesn't seem like a terribly good way to proceed.

cheers

andrew
Re: [HACKERS] Proposal : REINDEX SCHEMA
On 23 October 2014 00:21, Alvaro Herrera alvhe...@2ndquadrant.com wrote:
>> Attached patch is the latest version, modified as above. Also, I noticed
>> I had forgotten to add the patch regarding the reindexdb documentation.
>
> Please don't use pg_catalog in the regression test. That way we will
> need to update the expected file whenever a new catalog is added, which
> seems pointless. Maybe create a schema with a couple of tables
> specifically for this, instead.

These patches look fine to me. I don't see anybody objecting either. Are you looking to commit them, or shall I?

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] proposal: plpgsql - Assert statement
2014-11-18 17:08 GMT+01:00 Simon Riggs si...@2ndquadrant.com:
> On 18 November 2014 12:29, Pavel Stehule pavel.steh...@gmail.com wrote:
>> Why is that not a requirement for a less wordy form of IF?
>>
>>   IF (something) THEN action
>>
>> The IF statement is a control statement, and the syntax pattern for
>> control statements in plpgsql is consistent. I don't want to break it
>> (more, probably it is hard to implement due to problems in bison).
>> PL/pgSQL, PL/SQL and Ada are well designed (in my opinion). The
>> conditional statement has a precedent in PL/pgSQL now: we support
>> EXIT ... WHEN and CONTINUE ... WHEN, so we don't propose a new pattern,
>> only reuse an existing one.
>
> I commend your wish to improve PL/pgSQL; I'm sorry to say that I just
> don't see how this moves us forwards.

It is not a big step, but it opens some doors.

> What this does is introduce a fairly restricted new feature that removes
> backwards compatibility and takes us further away from Oracle
> compatibility.

That is not a valid argument for this use case. The RAISE statement has not been compatible with Oracle for a long time; a WHEN clause changes nothing.

> If I want to write an Assert-style test that fits on a single line, I can
> just write
>
>   PERFORM raise_error_when(boolean expression);

That is a possibility too. But a) it is a little bit limited, b) we didn't find an agreement on how to design it for upstream, and c) I think there is space for enhancing the RAISE statement for other use cases -- tracing, global condition assertions, etc.

> which requires a very short function like this
>
> CREATE OR REPLACE FUNCTION raise_error_when(test boolean)
> RETURNS void
> LANGUAGE plpgsql AS $$
> DECLARE
> BEGIN
>   IF (test) THEN
>     RAISE EXCEPTION 'assertion failure';
>   END IF;
> END;
> $$;

Regards

Pavel

> --
> Simon Riggs http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
[HACKERS] PageRepairFragmentation performance
When profiling replay of the WAL generated by pgbench, I noticed that PageRepairFragmentation consumes a large fraction of the CPU time. Per perf report:

+   33.44%     6.79%  postmaster  postgres      [.] PageRepairFragmentation

The 33.44% figure includes all the functions called by PageRepairFragmentation. Looking at the profile more closely, most of that time seems to be spent in sorting the item ids to physical order, so that they can be memmoved in place:

+   83.86%     0.00%  postmaster  libc-2.19.so  [.] __libc_start_main
+   83.86%     0.00%  postmaster  postgres      [.] main
+   83.86%     0.00%  postmaster  postgres      [.] PostmasterMain
+   83.86%     0.00%  postmaster  postgres      [.] 0x0023208d
+   83.86%     0.00%  postmaster  postgres      [.] AuxiliaryProcessMain
+   83.86%     0.00%  postmaster  postgres      [.] StartupProcessMain
+   83.63%     1.86%  postmaster  postgres      [.] StartupXLOG
+   45.85%     0.10%  postmaster  postgres      [.] heap2_redo
+   33.44%     6.79%  postmaster  postgres      [.] PageRepairFragmentation
+   24.60%    16.63%  postmaster  postgres      [.] pg_qsort
+   18.04%     0.23%  postmaster  postgres      [.] heap_redo
+   17.07%     1.53%  postmaster  postgres      [.] XLogReadBufferExtended
+   16.20%     0.30%  postmaster  postgres      [.] XLogReadBufferForRedoEx
+   14.38%     0.31%  postmaster  postgres      [.] ReadRecord
+   13.90%     1.29%  postmaster  postgres      [.] XLogReadRecord
+   12.40%     1.54%  postmaster  postgres      [.] heap_xlog_update
+   12.08%    12.06%  postmaster  postgres      [.] ValidXLogRecord
+   11.73%     0.10%  postmaster  postgres      [.] XLogReadBufferForRedo
+   10.89%     0.27%  postmaster  postgres      [.] ReadBufferWithoutRelcac
+    8.49%     1.07%  postmaster  libc-2.19.so  [.] __GI___libc_read
+    7.61%     0.71%  postmaster  postgres      [.] ReadBuffer_common
+    5.64%     0.48%  postmaster  postgres      [.] smgropen
+    5.48%     5.47%  postmaster  postgres      [.] itemoffcompare
+    5.40%     5.38%  postmaster  postgres      [.] hash_search_with_hash_v
+    4.70%     4.69%  postmaster  libc-2.19.so  [.] __memmove_ssse3_back
+    4.30%     0.77%  postmaster  libc-2.19.so  [.] __GI___libc_lseek64
+    4.29%     0.20%  postmaster  postgres      [.] heap_xlog_insert
+    3.88%     3.87%  postmaster  postgres      [.] swapfunc
+    2.81%     0.09%  postmaster  postgres      [.] XLogRecordPageWithFreeS
+    2.76%     0.00%  cp          libc-2.19.so  [.] __GI___libc_write
+    2.68%     0.07%  postmaster  postgres      [.] BufTableLookup
+    2.58%     2.58%  postmaster  postgres      [.] LWLockAcquire
+    2.17%     0.14%  postmaster  postgres      [.] tag_hash

So there's clearly some room for improvement here. A couple of ideas:

1. Replace the qsort with something cheaper. The itemid arrays being sorted are small, a few hundred items at most, usually even smaller. In the pgbench test case I used, the typical size is about 60. With a small array, a plain insertion sort is cheaper than the generic qsort(), because it can avoid the function-call overhead etc. involved with generic qsort. Or we could use something smarter, like a radix sort, knowing that we're sorting small integers. Or we could implement an inlineable version of qsort and use that.

2. Instead of sorting the array and using memmove in-place, we could copy all the tuples to a temporary buffer in arbitrary order, and finally copy the temporary copy back to the buffer. That requires two memory copies per tuple, instead of one memmove, but memcpy() is pretty darn fast. It would be a loss when there are only a few large tuples on the page, so that avoiding the sort doesn't help, or when the tuples are mostly already in the correct places, so that most of the memmove()s are no-ops. But with a lot of small tuples, it would be a win, and it would be simple.

The second option would change behaviour slightly, as the tuples would be placed on the page in a different physical order than before. It wouldn't be visible to users, though.

I spent some time hacking on approach 1, and replaced the qsort() call with a bucket sort. I'm not sure if a bucket sort is optimal, or better than a specialized quicksort implementation, but it seemed simple.

With the testcase I've been using -- replaying about 2GB of WAL generated by pgbench -- this reduces the replay time from about 55 s to 45 s.

Thoughts? Attached is the patch I put together.
It's actually two patches: the first is just refactoring, putting the common code between PageRepairFragmentation, PageIndexMultiDelete, and PageIndexDeleteNoCompact into a common function. The second replaces the qsort().

- Heikki
Re: [HACKERS] PageRepairFragmentation performance
On 11/18/2014 07:03 PM, Heikki Linnakangas wrote:

When profiling replay of the WAL generated by pgbench, I noticed that PageRepairFragmentation consumes a large fraction of the CPU time: [snip]

1. Replace the qsort with something cheaper. The itemid arrays being sorted are small, a few hundred items at most, usually even smaller. In this pgbench test case I used, the typical size is about 60. With a small array, a plain insertion sort is cheaper than the generic qsort(), because it can avoid the function overhead etc. involved with generic qsort. Or we could use something smarter, like a radix sort, knowing that we're sorting small integers. Or we could implement an inlineable version of qsort and use that.

IIRC, the theoretical complexity of quicksort and radix sort should be approximately the same for 256-1024 items: O(n*log(n)) vs. O(d*n), where d is ~log2(n), or just 16 in this case. However, lexicographical (bitstring-wise) ordering might not be what we are aiming for here. AFAIK, an inlined quicksort should be about the best-performing sort available (most of the enhancement coming from staying within the I-cache).

2. Instead of sorting the array and using memmove in-place, we could copy all the tuples to a temporary buffer in arbitrary order, and finally copy the temporary copy back to the buffer. That requires two memory copies per tuple, instead of one memmove, but memcpy() is pretty darn fast. It would be a loss when there are only a few large tuples on the page, so that avoiding the sort doesn't help, or when the tuples are mostly already in the correct places, so that most of the memmove()s are no-ops. But with a lot of small tuples, it would be a win, and it would be simple.

Memmove *should* be no slower than memcpy if both are actually translated by the compiler to use intrinsics, as opposed to calling the functions --- as it seems to be done here (cfr. __memmove_ssse3_back). A simple if to eliminate the no-op memmoves might as well do it, too. Just my two (euro) cents, though.

The second option would change behaviour slightly, as the tuples would be placed on the page in a different physical order than before. It wouldn't be visible to users, though.

I spent some time hacking approach 1, and replaced the qsort() call with a bucket sort. I'm not sure if a bucket sort is optimal, or better than a specialized quicksort implementation, but it seemed simple. With the testcase I've been using - replaying about 2GB of WAL generated by pgbench - this reduces the replay time from about 55 s to 45 s.

Not bad at all... though I suspect most of it might come from staying within the I-cache as opposed to regular qsort. The smaller itemIdSortData structure surely helps a bit, too :)

Thoughts? Attached is the patch I put together. It's actually two patches: the first is just refactoring, putting the common code between PageRepairFragmentation, PageIndexMultiDelete, and PageIndexDeleteNoCompact into a common function. The second replaces the qsort().

Definitely worthwhile, even if just for the refactoring. The speed-up sounds very good, too.

Thanks, / J.L.

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] pg_locks doesn't check for interrupts?
Hackers,

Since querying pg_locks can be intrusive due to needing to lock the lock partitions, when I'm collecting data about locks I generally put a statement_timeout on it. However, I'm noticing that this statement_timeout appears to be completely ignored; I've seen this query run for up to 10 minutes* when the database is heavily loaded. Thus it seems likely to me that the functions under pg_locks aren't checking for interrupts. Has anybody checked this already?

(* yes, when a 64,000-item lock table is mostly full of locks, queries against pg_locks *can* take 10 minutes)

-- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_locks doesn't check for interrupts?
Josh Berkus j...@agliodbs.com writes:

Since querying pg_locks can be intrusive due to needing to lock the lock partitions, when I'm collecting data about locks I generally put a statement_timeout on it. However, I'm noticing that this statement_timeout appears to be completely ignored; I've seen this query run for up to 10 minutes* when the database is heavily loaded. Thus it seems likely to me that the functions under pg_locks aren't checking for interrupts. Has anybody checked this already?

Whether they do or not, I don't think we allow CHECK_FOR_INTERRUPTS to trigger while holding an LWLock. So this would not be a trivial thing to fix.

regards, tom lane

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_locks doesn't check for interrupts?
On 11/18/2014 10:47 AM, Tom Lane wrote:

Josh Berkus j...@agliodbs.com writes:

Since querying pg_locks can be intrusive due to needing to lock the lock partitions, when I'm collecting data about locks I generally put a statement_timeout on it. However, I'm noticing that this statement_timeout appears to be completely ignored; I've seen this query run for up to 10 minutes* when the database is heavily loaded. Thus it seems likely to me that the functions under pg_locks aren't checking for interrupts. Has anybody checked this already?

Whether they do or not, I don't think we allow CHECK_FOR_INTERRUPTS to trigger while holding an LWLock. So this would not be a trivial thing to fix.

Hmm. So the basic problem is that querying pg_locks itself can make an already-bad locking situation worse (I've seen it contribute to a total lockup, which didn't resolve until I terminated the query against pg_locks). I don't see a clear way to make it less dangerous, so I was hoping that at least making it time out would make it safer to use.

-- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] BRIN and PageIndexDeleteNoCompact
Heikki Linnakangas wrote:

BRIN added a new function, PageIndexDeleteNoCompact(), which is like PageIndexMultiDelete but does not remove unused line pointers. However, all the callers pass it just a single offset. So the callers would really be more interested in efficiently squeezing out a single tuple from a page, like PageIndexTupleDelete() does, than in a bulk operation.

Yeah, I noticed this, and yes, I agree there's no harm in changing it to a single-tuple mode rather than a bulk operation.

How about replacing PageIndexDeleteNoCompact() with something more like PageIndexTupleDelete()? It ought to be both faster and simpler.

No objection. Are you working on this, or do you intend me to? How relevant is this given your current PageRepairFragmentation work? I think it will cause hard merge conflicts if done right away; should it be attempted prior to the PRF patches, or after they're done? I assume that if you do it, you can do both things simultaneously ...

PS. The comment above PageIndexDeleteNoCompact says that unused items at the end of the array are removed. But looking at the code, I don't see it doing that.

Hmm, it probably lost the capability during development :-(

-- Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: plpgsql - Assert statement
On 11/18/14, 9:31 AM, Andrew Dunstan wrote: Frankly, I find this whole proposal, and all the suggested alternatives, somewhat ill-conceived. PLPGSQL is a wordy language. If you want something more terse, use something else. Adding these sorts of syntactic sugar warts onto the language doesn't seem like a terribly good way to proceed. Such as? The enormous advantage of plpgsql is how easy it is to run SQL. Every other PL I've looked at makes that WAY harder. And that's assuming you're in an environment where you can install another PL. And honestly, I've never really found plpgsql to be terribly wordy except in a few cases (assert being one of them). My general experience has been that when I'm doing an IF (other than assert), I'm doing multiple things in the IF block, so it's really not that big a deal. As for why not do this in a separate function; yes, you can do that. But then you've needlessly added to your context stack, it's going to be a lot slower, and you can only really replace RAISE's functionality if you're in a version that has format(). If someone has another brain-flash on how to make this better I'm all ears. But I don't think arguments like use another PL or it's just syntax sugar improve things for our users. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] BRIN and PageIndexDeleteNoCompact
On 11/18/2014 09:46 PM, Alvaro Herrera wrote: Heikki Linnakangas wrote: How about replacing PageIndexDeleteNoCompact() with something more like PageIndexTupleDelete()? It ought to be both faster and simpler. No objection. Are you working on this, or do you intend me to? You, please ;-). How relevant is this given your current PageRepairFragmentation work? I think it will cause hard merge conflicts if done right away; should it be attempted prior to the PRF patches, or after it's done? I assume that if you do it, you can do both things simultaneously ... In my PageRepairFragmentation patch, I did refactor the common part of PageIndexDeleteNoCompact to use the shared function. It'll look completely different after it's rewritten to look more like PageIndexTupleDelete, and won't have anything in common with PageRepairFragmentation anymore, and won't really be any easier for me to do as part of the PFR work. So don't wait for that, just go ahead and make the change, whenever suits you. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: plpgsql - Assert statement
On 11/18/2014 02:53 PM, Jim Nasby wrote: On 11/18/14, 9:31 AM, Andrew Dunstan wrote: Frankly, I find this whole proposal, and all the suggested alternatives, somewhat ill-conceived. PLPGSQL is a wordy language. If you want something more terse, use something else. Adding these sorts of syntactic sugar warts onto the language doesn't seem like a terribly good way to proceed. Such as? The enormous advantage of plpgsql is how easy it is to run SQL. Every other PL I've looked at makes that WAY harder. And that's assuming you're in an environment where you can install another PL. And honestly, I've never really found plpgsql to be terribly wordy except in a few cases (assert being one of them). My general experience has been that when I'm doing an IF (other than assert), I'm doing multiple things in the IF block, so it's really not that big a deal. I frequently write one-statement bodies of IF statements. To me that's not a big deal either :-) cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Additional role attributes superuser review
All, 2. Catalog Representation: Condense all attributes in pg_authid to single int64 column and create bitmasks accordingly. I have been working on these changes and I was hoping for some assistance/recommendations. Currently, I am using int32 simply because int64 is causing some issues. The issue is that genbki.pl is not able to associate it with the int8 type as defined in pg_type.h. Therefore Schema_pg_authid in schemapg.h isn't defined correctly. I've been digging and scratching my head on this one but I have reached a point where I think it would be better just to ask. Any thoughts or recommendations on how I should approach this? -Adam -- Adam Brightwell - adam.brightw...@crunchydatasolutions.com Database Engineer - www.crunchydatasolutions.com
Re: [HACKERS] 9.5: Better memory accounting, towards memory-bounded HashAgg
On 11/17/14, 10:50 AM, Tomas Vondra wrote:

Either of these restrictions would prevent a situation where a context has to update accounting for two parent contexts. That'd allow updating a single place (because each context has a single parent context with tracking requested).

Another option might be to be lazy about updating parent contexts. I'm thinking something like keeping a boolean (or making a size of 0 magic) that indicates whether a context has up-to-date info from its children or not. That would allow you to only update the complete size when you need it; but if your children haven't been touched since the last time you calculated, then you don't need to recalc.

-- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] pg_test_fsync file descriptor leak
The open_datasync code opens the output file as a test to make sure the flags are accepted by the OS, and if it succeeds it then immediately opens the file again with the same flags, overwriting and so leaking the descriptor from the previous open. On Windows MinGW, this prevents the final unlink from working, as the file is still open. Trivial fix attached. Thanks, Jeff pg_test_fsync_leak.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: plpgsql - Assert statement
2014-11-18 21:27 GMT+01:00 Andrew Dunstan and...@dunslane.net: On 11/18/2014 02:53 PM, Jim Nasby wrote: On 11/18/14, 9:31 AM, Andrew Dunstan wrote: Frankly, I find this whole proposal, and all the suggested alternatives, somewhat ill-conceived. PLPGSQL is a wordy language. If you want something more terse, use something else. Adding these sorts of syntactic sugar warts onto the language doesn't seem like a terribly good way to proceed. Such as? The enormous advantage of plpgsql is how easy it is to run SQL. Every other PL I've looked at makes that WAY harder. And that's assuming you're in an environment where you can install another PL. And honestly, I've never really found plpgsql to be terribly wordy except in a few cases (assert being one of them). My general experience has been that when I'm doing an IF (other than assert), I'm doing multiple things in the IF block, so it's really not that big a deal. I frequently write one-statement bodies of IF statements. To me that's not a big deal either :-) anybody did it, but it doesn't need so it is perfect :) I understand well to Jim' feeling. I am looking to Ada 2005 language ... a design of RAISE WITH shows so RAISE statement is extensible in Ada too. Sure - we can live without it, but I don't think so we do some wrong with introduction RAISE WHEN and I am sure, so a live with this feature can be more fun for someone, who intensive use this pattern. cheers andrew
Re: [HACKERS] proposal: plpgsql - Assert statement
On 18/11/14 22:11, Pavel Stehule wrote: 2014-11-18 21:27 GMT+01:00 Andrew Dunstan and...@dunslane.net mailto:and...@dunslane.net: I frequently write one-statement bodies of IF statements. To me that's not a big deal either :-) anybody did it, but it doesn't need so it is perfect :) I understand well to Jim' feeling. I am looking to Ada 2005 language ... a design of RAISE WITH shows so RAISE statement is extensible in Ada too. Sure - we can live without it, but I don't think so we do some wrong with introduction RAISE WHEN and I am sure, so a live with this feature can be more fun for someone, who intensive use this pattern. Personally, I see this as natural extension of the conditional block control which we already have for loops with CONTINUE WHEN and EXIT WHEN. This basically extends it to any block and it seems quite natural to have it for me... -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review of Refactoring code for sync node detection
On Tue, Nov 18, 2014 at 6:33 PM, Simon Riggs si...@2ndquadrant.com wrote:

Can we just wait on this patch until we have the whole feature?

Well, this may take some time to even define, and even if the goals are clearly defined, it may take even more time to have a prototype to discuss.

We quite often break larger patches down into smaller ones, but if they arrive in lots of small pieces it's more difficult to understand and correct issues that are happening on the larger scale. Churning the same piece of code multiple times is rather mind-numbing.

Hm. I think that we are losing ourselves in this thread. The primary goal is to remove a code duplication between syncrep.c and walsender.c that has existed since 9.1. Would it be possible to focus only on that for now? That's still the topic of this CF.

-- Michael
Re: [HACKERS] proposal: plpgsql - Assert statement
On 11/18/2014 04:11 PM, Pavel Stehule wrote: 2014-11-18 21:27 GMT+01:00 Andrew Dunstan and...@dunslane.net mailto:and...@dunslane.net: On 11/18/2014 02:53 PM, Jim Nasby wrote: On 11/18/14, 9:31 AM, Andrew Dunstan wrote: Frankly, I find this whole proposal, and all the suggested alternatives, somewhat ill-conceived. PLPGSQL is a wordy language. If you want something more terse, use something else. Adding these sorts of syntactic sugar warts onto the language doesn't seem like a terribly good way to proceed. Such as? The enormous advantage of plpgsql is how easy it is to run SQL. Every other PL I've looked at makes that WAY harder. And that's assuming you're in an environment where you can install another PL. And honestly, I've never really found plpgsql to be terribly wordy except in a few cases (assert being one of them). My general experience has been that when I'm doing an IF (other than assert), I'm doing multiple things in the IF block, so it's really not that big a deal. I frequently write one-statement bodies of IF statements. To me that's not a big deal either :-) anybody did it, but it doesn't need so it is perfect :) I understand well to Jim' feeling. I am looking to Ada 2005 language ... a design of RAISE WITH shows so RAISE statement is extensible in Ada too. Sure - we can live without it, but I don't think so we do some wrong with introduction RAISE WHEN and I am sure, so a live with this feature can be more fun for someone, who intensive use this pattern. (drags out recently purchased copy of Barnes Ada 2012) Ada's RAISE exception_name WITH string; is more or less the equivalent of our RAISE level 'format_string'; So I don't think there's much analogy there. I'm not going to die in a ditch over this, but it does seem to me very largely unnecessary. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: plpgsql - Assert statement
2014-11-18 22:28 GMT+01:00 Andrew Dunstan and...@dunslane.net: On 11/18/2014 04:11 PM, Pavel Stehule wrote: 2014-11-18 21:27 GMT+01:00 Andrew Dunstan and...@dunslane.net mailto: and...@dunslane.net: On 11/18/2014 02:53 PM, Jim Nasby wrote: On 11/18/14, 9:31 AM, Andrew Dunstan wrote: Frankly, I find this whole proposal, and all the suggested alternatives, somewhat ill-conceived. PLPGSQL is a wordy language. If you want something more terse, use something else. Adding these sorts of syntactic sugar warts onto the language doesn't seem like a terribly good way to proceed. Such as? The enormous advantage of plpgsql is how easy it is to run SQL. Every other PL I've looked at makes that WAY harder. And that's assuming you're in an environment where you can install another PL. And honestly, I've never really found plpgsql to be terribly wordy except in a few cases (assert being one of them). My general experience has been that when I'm doing an IF (other than assert), I'm doing multiple things in the IF block, so it's really not that big a deal. I frequently write one-statement bodies of IF statements. To me that's not a big deal either :-) anybody did it, but it doesn't need so it is perfect :) I understand well to Jim' feeling. I am looking to Ada 2005 language ... a design of RAISE WITH shows so RAISE statement is extensible in Ada too. Sure - we can live without it, but I don't think so we do some wrong with introduction RAISE WHEN and I am sure, so a live with this feature can be more fun for someone, who intensive use this pattern. (drags out recently purchased copy of Barnes Ada 2012) Ada's RAISE exception_name WITH string; is more or less the equivalent of our RAISE level 'format_string'; So I don't think there's much analogy there. I used it as analogy of immutability of this statement in Ada, I'm not going to die in a ditch over this, but it does seem to me very largely unnecessary. cheers andrew
[HACKERS] Use of recent Russian TZ changes in regression tests
[ moving discussion from -packagers to -hackers ] Summary for onlookers: Bjorn Munch complained that the timestamptz regression tests added in commit b2cbced9e failed on his Solaris machine; it emerged that he uses --with-system-tzdata and the tzdata files on his machine predate 2014h when the recent Russian Federation changes appeared. The new tests intentionally rely on the Oct 2014 Russian zone change, so the failure is unsurprising; the question is whether we want to keep that behavior. I wrote: Christoph Berg c...@df7cb.de writes: Re: Tom Lane 2014-11-18 1327.1416329...@sss.pgh.pa.us Bjorn Munch bjorn.mu...@oracle.com writes: I now discovered that the regression test timestamptz is failing. It has always failed on Solaris 10 due to a 2038 bug that has been fixed in Solaris 11. But now I also get failures on Solaris 11. Hmm ... do you build with --with-system-tzdata, and if so, is the system timezone database up to date? I ran into the same problem. A current tzdata version fixes the problem, but it's not ideal that this is now a dependency. Last time I complained about that problem, the America/Metlaka (sp?) reference was changed to something older that is also included in older tzdata versions. Possibly the regression tests could use some older dates and not depend on Europe/Moscow from Oct 2014. Perhaps. I'm not aware offhand of other cases where a zone abbreviation changed meaning twice; the point of the tests is to make sure the abbreviation decoding code can deal with such cases, so we'd need to find an older similar instance before I'd be happy about changing the tests. 
I poked around in the IANA files and found some other cases where a zone went forward and back without any DST involved, for instance Kosrae:

Zone Pacific/Kosrae	10:51:56 -	LMT	1901
			11:00	 -	KOST	1969 Oct	# Kosrae Time
			12:00	 -	KOST	1999
			11:00	 -	KOST

That entry has been unchanged for quite some time, but so has this comment about it:

# From Paul Eggert (1999-10-29):
# The Federated States of Micronesia Visitors Board writes in
# The Federated States of Micronesia - Visitor Information (1999-01-26)
# http://www.fsmgov.org/info/clocks.html
# that Truk and Yap are UTC+10, and Ponape and Kosrae are UTC+11.
# We don't know when Kosrae switched from UTC+12; assume January 1 for now.

So the problem with depending on this entry for testing is that if somebody pops up and tells the IANA maintainers just when the time switch occurred, they'll change the entry and break our test. The one or two other cases where this happened are about as underdocumented as Kosrae; America/Guyana is an example.

The good thing about testing with the MSK changes is that those are quite well-documented, so we don't have to fear getting blindsided by future updates to the IANA database.

So basically we are trading off known short-term pain (for people on machines with old TZ files) against a risk of unexpected future pain. My inclination is to leave the test as-is, but I'm willing to make the changes if people would rather bet the other way.

regards, tom lane

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Additional role attributes superuser review
All,

Currently, I am using int32 simply because int64 is causing some issues. The issue is that genbki.pl is not able to associate it with the int8 type as defined in pg_type.h. Therefore Schema_pg_authid in schemapg.h isn't defined correctly. I've been digging and scratching my head on this one but I have reached a point where I think it would be better just to ask.

Attached is a quite trivial patch that addresses the int64 (C) to int8 (SQL) mapping issue. Further digging revealed that Catalog.pm wasn't accounting for int64 (thanks Stephen). Would it be better to include this change as a separate patch (as attached) or would it be preferable to include with a larger role attribute bitmask patch?

Thanks, Adam

-- Adam Brightwell - adam.brightw...@crunchydatasolutions.com Database Engineer - www.crunchydatasolutions.com

diff --git a/src/backend/catalog/Catalog.pm b/src/backend/catalog/Catalog.pm
new file mode 100644
index eb91c53..523b379
*** a/src/backend/catalog/Catalog.pm
--- b/src/backend/catalog/Catalog.pm
*** sub Catalogs
*** 33,38 ****
--- 33,39 ----
  my %RENAME_ATTTYPE = (
  	'int16' => 'int2',
  	'int32' => 'int4',
+ 	'int64' => 'int8',
  	'Oid' => 'oid',
  	'NameData' => 'name',
  	'TransactionId' => 'xid');

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GIN pageinspect functions
On Tue, Nov 4, 2014 at 7:26 AM, Amit Kapila amit.kapil...@gmail.com wrote: I think these functions will be quite useful for debugging purpose and we already have similar function's for other index (btree). This patch has bitrotted. I attach rebased revision, for the convenience of others - V1.3 of pageinspect will now incorporate both GIN stuff, and BRIN stuff. Seems like this patch was affected by the recent problems with header includes - that's fixed. Do you intend to fix this up? + /* TODO: array of decoded item pointers */ + nulls[2] = true; -- Peter Geoghegan From 5d10bfe4f6db5e37dcf087d62f42cbc6c9423c26 Mon Sep 17 00:00:00 2001 From: Peter Geoghegan p...@heroku.com Date: Tue, 18 Nov 2014 13:38:12 -0800 Subject: [PATCH] Add pageinspect functions for inspecting GIN indexes. --- contrib/pageinspect/Makefile | 2 +- contrib/pageinspect/ginfuncs.c| 265 ++ contrib/pageinspect/pageinspect--1.2--1.3.sql | 37 contrib/pageinspect/pageinspect--1.3.sql | 42 4 files changed, 345 insertions(+), 1 deletion(-) create mode 100644 contrib/pageinspect/ginfuncs.c diff --git a/contrib/pageinspect/Makefile b/contrib/pageinspect/Makefile index a59de8a..e651543 100644 --- a/contrib/pageinspect/Makefile +++ b/contrib/pageinspect/Makefile @@ -1,7 +1,7 @@ # contrib/pageinspect/Makefile MODULE_big = pageinspect -OBJS = rawpage.o heapfuncs.o btreefuncs.o fsmfuncs.o brinfuncs.o $(WIN32RES) +OBJS = rawpage.o heapfuncs.o btreefuncs.o fsmfuncs.o brinfuncs.o ginfuncs.o $(WIN32RES) EXTENSION = pageinspect DATA = pageinspect--1.3.sql pageinspect--1.0--1.1.sql \ diff --git a/contrib/pageinspect/ginfuncs.c b/contrib/pageinspect/ginfuncs.c new file mode 100644 index 000..0602873 --- /dev/null +++ b/contrib/pageinspect/ginfuncs.c @@ -0,0 +1,265 @@ +/* + * contrib/pageinspect/ginfuncs.c + */ + +#include postgres.h + +#include access/gin.h +#include access/gin_private.h +#include access/htup_details.h +#include catalog/namespace.h +#include catalog/pg_type.h +#include funcapi.h +#include miscadmin.h 
+#include utils/array.h +#include utils/builtins.h +#include utils/rel.h + +#define DatumGetItemPointer(X) ((ItemPointer) DatumGetPointer(X)) +#define ItemPointerGetDatum(X) PointerGetDatum(X) + +PG_FUNCTION_INFO_V1(gin_metapage); +PG_FUNCTION_INFO_V1(gin_pageopaq); +PG_FUNCTION_INFO_V1(gin_dataleafpage); + +Datum +gin_metapage(PG_FUNCTION_ARGS) +{ + bytea *raw_page = PG_GETARG_BYTEA_P(0); + int raw_page_size; + TupleDesc tupdesc; + Page page; + GinPageOpaque opaq; + GinMetaPageData *metadata; + HeapTuple resultTuple; + Datum values[10]; + bool nulls[10]; + + if (!superuser()) + ereport(ERROR, +(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), + (errmsg(must be superuser to use raw page functions; + + raw_page_size = VARSIZE(raw_page) - VARHDRSZ; + if (raw_page_size BLCKSZ) + ereport(ERROR, +(errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg(input page too small (%d bytes), raw_page_size))); + page = VARDATA(raw_page); + + opaq = (GinPageOpaque) PageGetSpecialPointer(page); + if (opaq-flags != GIN_META) + ereport(ERROR, +(errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg(input page is not a GIN metapage), + errdetail(Flags %04X, expected %04X, + opaq-flags, GIN_META))); + + /* Build a tuple descriptor for our result type */ + if (get_call_result_type(fcinfo, NULL, tupdesc) != TYPEFUNC_COMPOSITE) + elog(ERROR, return type must be a row type); + + metadata = GinPageGetMeta(page); + + memset(nulls, 0, sizeof(nulls)); + + values[0] = Int64GetDatum(metadata-head); + values[1] = Int64GetDatum(metadata-tail); + values[2] = Int32GetDatum(metadata-tailFreeSize); + values[3] = Int64GetDatum(metadata-nPendingPages); + values[4] = Int64GetDatum(metadata-nPendingHeapTuples); + + /* statistics, updated by VACUUM */ + values[5] = Int64GetDatum(metadata-nTotalPages); + values[6] = Int64GetDatum(metadata-nEntryPages); + values[7] = Int64GetDatum(metadata-nDataPages); + values[8] = Int64GetDatum(metadata-nEntries); + + values[9] = Int32GetDatum(metadata-ginVersion); + + /* Build and 
return the result tuple. */ + resultTuple = heap_form_tuple(tupdesc, values, nulls); + + return HeapTupleGetDatum(resultTuple); +} + + +Datum +gin_pageopaq(PG_FUNCTION_ARGS) +{ + bytea *raw_page = PG_GETARG_BYTEA_P(0); + int raw_page_size; + TupleDesc tupdesc; + Page page; + GinPageOpaque opaq; + HeapTuple resultTuple; + Datum values[3]; + bool nulls[10]; + Datum flags[16]; + int nflags = 0; + uint16 flagbits; + + if (!superuser()) + ereport(ERROR, + (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), + (errmsg("must be superuser to use raw page functions")))); + + raw_page_size = VARSIZE(raw_page) - VARHDRSZ; + if (raw_page_size < BLCKSZ) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("input page too small (%d bytes)", raw_page_size))); + page = VARDATA(raw_page); + + opaq = (GinPageOpaque)
Re: [HACKERS] Add shutdown_at_recovery_target option to recovery.conf
On 18/11/14 12:57, Simon Riggs wrote: On 31 October 2014 15:18, Petr Jelinek p...@2ndquadrant.com wrote: Attached is the v2 of the patch with the review comments addressed (see below). ... Done, there is now action_at_recovery_target which can be set to either pause, continue or shutdown, defaulting to pause (which is same as old behavior of pause_at_recovery_target defaulting to true). One comment only: I think the actions should be called: pause, promote and shutdown, since continue leads immediately to promotion of the server. I'm good with this patch otherwise. Barring objections I will commit tomorrow. OK, promote works for me as well, I attached patch that changes continue to promote so you don't have to find and replace everything yourself. The changed doc wording probably needs to be checked. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services diff --git a/doc/src/sgml/recovery-config.sgml b/doc/src/sgml/recovery-config.sgml index 0f1ff34..fe42394 100644 --- a/doc/src/sgml/recovery-config.sgml +++ b/doc/src/sgml/recovery-config.sgml @@ -289,12 +289,39 @@ restore_command = 'copy C:\\server\\archivedir\\%f %p' # Windows /term listitem para -Specifies whether recovery should pause when the recovery target -is reached. The default is true. -This is intended to allow queries to be executed against the -database to check if this recovery target is the most desirable -point for recovery. The paused state can be resumed by using -functionpg_xlog_replay_resume()/ (See +Alias for action_at_recovery_target, literaltrue/ is same as +action_at_recovery_target = literalpause/ and literalfalse/ +is same as action_at_recovery_target = literalpromote/. + /para + para +This setting has no effect if xref linkend=guc-hot-standby is not +enabled, or if no recovery target is set. 
+ /para + /listitem + /varlistentry + + /variablelist + + varlistentry id=action-at-recovery-target + xreflabel=action_at_recovery_target + termvarnameaction_at_recovery_target/varname (typeenum/type) + indexterm +primaryvarnameaction_at_recovery_target/ recovery parameter/primary + /indexterm + /term + listitem + para +Specifies what action PostgreSQL should take once the recovery target is +reached. The default is literalpause/, which means recovery will +be paused. literalpromote/ means the recovery process will finish and +the server will start to accept connections. +Finally, literalshutdown/ will stop the PostgreSQL instance at the +recovery target. + /para + para +The intended use of the literalpause/ setting is to allow queries to be +executed against the database to check if this recovery target is the +most desirable point for recovery. The paused state can be resumed by +using functionpg_xlog_replay_resume()/ (See xref linkend=functions-recovery-control-table), which then causes recovery to end. If this recovery target is not the desired stopping point, then shutdown the server, change the @@ -302,6 +329,20 @@ restore_command = 'copy C:\\server\\archivedir\\%f %p' # Windows continue recovery. /para para +The literalshutdown/ setting is useful for having the instance ready at +the exact replay point desired. +The instance will still be able to replay more WAL records (and in fact +will have to replay the WAL records written since the last checkpoint the +next time it is started). + /para + para +Note that because filenamerecovery.conf/ will not be renamed when +varnameaction_at_recovery_target/ is set to literalshutdown/, +any subsequent start will end with immediate shutdown unless the +configuration is changed or the filenamerecovery.conf/ file is removed +manually. + /para + para This setting has no effect if xref linkend=guc-hot-standby is not enabled, or if no recovery target is set.
/para diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 3c9aeae..974e6c1 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -227,7 +227,7 @@ static char *recoveryEndCommand = NULL; static char *archiveCleanupCommand = NULL; static RecoveryTargetType recoveryTarget = RECOVERY_TARGET_UNSET; static bool recoveryTargetInclusive = true; -static bool recoveryPauseAtTarget = true; +static RecoveryTargetAction actionAtRecoveryTarget = RECOVERY_TARGET_ACTION_PAUSE; static TransactionId recoveryTargetXid; static TimestampTz recoveryTargetTime; static char *recoveryTargetName; @@ -5068,6 +5068,9 @@ readRecoveryCommandFile(void) ConfigVariable *item, *head = NULL, *tail
Re: [HACKERS] New Event Trigger: table_rewrite
Hi, Michael Paquier michael.paqu...@gmail.com writes: 1) This patch is authorizing VACUUM and CLUSTER to use the event triggers ddl_command_start and ddl_command_end, but aren't those commands actually not DDLs but control commands? Reverted in the attached version 3 of the patch. 6) in_table_rewrite seems unnecessary. Removed in the attached version 3 of the patch. On Sun, Nov 16, 2014 at 5:51 AM, Simon Riggs si...@2ndquadrant.com wrote: 4) pg_event_trigger_table_rewrite_oid is able to return only one OID, which is the one of the table being rewritten, and it is limited to one OID because VACUUM, CLUSTER and ALTER TABLE can only run on one object at the same time in a single transaction. What about thinking that we may have in the future multiple objects rewritten in a single transaction, hence multiple OIDs could be fetched? Why would this API support something which the normal trigger API doesn't, just in case we support a feature that hadn't ever been proposed or discussed? Why can't such a change wait until that feature arrives? Agreed, unchanged in the attached. Robert Haas robertmh...@gmail.com writes: It seems pretty weird, also, that the event trigger will fire after we've taken AccessExclusiveLock when you cluster a particular relation, and before we've taken AccessExclusiveLock when you cluster database-wide. That's more or less an implementation artifact of the current code that we're exposing to the user for, really, no good reason. In the CLUSTER implementation we have only one call site for invoking the Event Trigger, in cluster_rel().
While it's true that in the single relation case, the relation is opened in cluster() and then cluster_rel() is called, the opening is done with NoLock in cluster(): rel = heap_open(tableOid, NoLock); My understanding is that the relation locking only happens in cluster_rel() at this line: OldHeap = try_relation_open(tableOid, AccessExclusiveLock); Please help me through the cluster locking strategy here, I feel like I'm missing something obvious, as my conclusion from re-reading the code in light of your comment is that your comment is not accurate with respect to the current state of the code. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support diff --git a/doc/src/sgml/event-trigger.sgml b/doc/src/sgml/event-trigger.sgml index 6f71a27..704a377 100644 --- a/doc/src/sgml/event-trigger.sgml +++ b/doc/src/sgml/event-trigger.sgml @@ -65,6 +65,12 @@ /para para +The literaltable_rewrite/ event occurs just before a table is going to +get rewritten by the commands literalALTER TABLE/literal, +literalCLUSTER/literal or literalVACUUM/literal. + /para + + para Event triggers (like other functions) cannot be executed in an aborted transaction. Thus, if a DDL command fails with an error, any associated literalddl_command_end/ triggers will not be executed.
Conversely, @@ -120,518 +126,625 @@ entryliteralddl_command_start/literal/entry entryliteralddl_command_end/literal/entry entryliteralsql_drop/literal/entry +entryliteraltable_rewrite/literal/entry /row /thead tbody row +entry align=leftliteralANALYZE/literal/entry +entry align=centerliteralX/literal/entry +entry align=centerliteralX/literal/entry +entry align=centerliteral-/literal/entry +entry align=centerliteral-/literal/entry + /row + row entry align=leftliteralALTER AGGREGATE/literal/entry entry align=centerliteralX/literal/entry entry align=centerliteralX/literal/entry entry align=centerliteral-/literal/entry +entry align=centerliteral-/literal/entry /row row entry align=leftliteralALTER COLLATION/literal/entry entry align=centerliteralX/literal/entry entry align=centerliteralX/literal/entry entry align=centerliteral-/literal/entry +entry align=centerliteral-/literal/entry /row row entry align=leftliteralALTER CONVERSION/literal/entry entry align=centerliteralX/literal/entry entry align=centerliteralX/literal/entry entry align=centerliteral-/literal/entry +entry align=centerliteral-/literal/entry /row row entry align=leftliteralALTER DOMAIN/literal/entry entry align=centerliteralX/literal/entry entry align=centerliteralX/literal/entry entry align=centerliteral-/literal/entry +entry align=centerliteral-/literal/entry /row row entry align=leftliteralALTER EXTENSION/literal/entry entry align=centerliteralX/literal/entry entry align=centerliteralX/literal/entry entry align=centerliteral-/literal/entry +entry align=centerliteral-/literal/entry /row row
Re: [HACKERS] Additional role attributes superuser review
Adam Brightwell adam.brightw...@crunchydatasolutions.com writes: Currently, I am using int32 simply because int64 is causing some issues. The issue is that genbki.pl is not able to associate it with the int8 type as defined in pg_type.h. Therefore Schema_pg_authid in schemapg.h isn't defined correctly. I've been digging and scratching my head on this one but I have reached a point where I think it would be better just to ask. Any thoughts or recommendations on how I should approach this? Teach src/backend/catalog/Catalog.pm about it. There once was a policy against using int64 in the system catalogs, which is why it wasn't there already; but the reason for that policy is long gone. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Use of recent Russian TZ changes in regression tests
Tom Lane-2 wrote The good thing about testing with the MSK changes is that those are quite well-documented and so we don't have to fear getting blindsided by future updates to the IANA database. So basically we are trading off known short term pain (for people on machines with old TZ files) against a risk of unexpected future pain. My inclination is to leave the test as-is, but I'm willing to make the changes if people would rather bet the other way. The tests are intended to exercise behavior against a known set of input - in this case known TZ files. Ideally I would simply prefer to skip over (or give people a configure option for) running these tests if --with-system-tzdata has been specified. The reason for the configure option is that packagers enabling this option are basically giving up some level of control over, or caring about, whether the TZ files play well with the files deployed on their users' machines. However, someone compiling and then deploying from source would like to be able to see whether their current environment passes all of the tests even if they are going to be using system timezone files instead of PostgreSQL-supplied ones. I am tending to like the argument for saying date/time handling is integral to a database and thus tests should be allowed to exercise any data contained in the TZ files provided by PostgreSQL itself. I would even consider simply testing if the system timezone file is older than the PostgreSQL file and failing a test if that is the case. If that seems too harsh, then testing against the earliest time we should have seen and accommodated the behavior seems like the most even-handed approach. Basically pretend we caught these nuances at the moment they came to be and wrote our test just the way we would have had to write it back then. If it changes in the future we would have had to adapt anyway so no harm done.
In the end having compilation or testing fail if older TZ files are present on the system AND --with-system-tzdata having been configured seems like a reasonable policy but can be separated from the issue here which is that we are fixing code to handle older data but choosing to use newer data to test said code. So I'd say we time-travel to fix the back branches and decide whether to change our policy going forward and also optionally add a configure option to control the level of validation when PostgreSQL is asked to use system timezone information instead of its own. David J.
Re: [HACKERS] Additional role attributes superuser review
On 11/18/2014 04:58 PM, Adam Brightwell wrote: All, Currently, I am using int32 simply because int64 is causing some issues. The issue is that genbki.pl is not able to associate it with the int8 type as defined in pg_type.h. Therefore Schema_pg_authid in schemapg.h isn't defined correctly. I've been digging and scratching my head on this one but I have reached a point where I think it would be better just to ask. Attached is a quite trivial patch that addresses the int64 (C) to int8 (SQL) mapping issue. Further digging revealed that Catalog.pm wasn't accounting for int64 (thanks Stephen). Would it be better to include this change as a separate patch (as attached) or would it be preferable to include with a larger role attribute bitmask patch? I think we should just apply this now. As Tom said the reason for not doing it is long gone. cheers andrew
Re: [HACKERS] New Event Trigger: table_rewrite
Dimitri Fontaine wrote: In the CLUSTER implementation we have only one call site for invoking the Event Trigger, in cluster_rel(). While it's true that in the single relation case, the relation is opened in cluster() then cluster_rel() is called, the opening is done with NoLock in cluster(): rel = heap_open(tableOid, NoLock); My understanding is that the relation locking only happens in cluster_rel() at this line: OldHeap = try_relation_open(tableOid, AccessExclusiveLock); Please help me through the cluster locking strategy here, I feel like I'm missing something obvious, as my conclusion from re-reading the code in lights of your comment is that your comment is not accurate with respect to the current state of the code. Almost the whole of that function is conditions to bail out clustering the relation if things have changed since the relation list was collected. It seems wrong to invoke the event trigger in all those cases; it's going to fire spuriously. I think you should move the invocation of the event trigger at the end, just before rebuild_relation is called. Not sure where relative to the predicate lock stuff therein; probably before, so that we avoid doing that dance if the event trigger function decides to jump ship. In ATRewriteTables, it seems wrong to call it after make_new_heap. If the event trigger function aborts, we end up with useless work done there; so I think it should be called before that. Also, why do you have the evt_table_rewrite_fired stuff? I think you should fire one event per table, no? The second ATRewriteTable call in ATRewriteTables does not actually rewrite the table; it only scans it to verify constraints. So I'm thinking you shouldn't call this event trigger there. Or, if we decide we want this, we probably also need something for the table scans in ALTER DOMAIN too. You still have the ANALYZE thing in docs, which now should be removed. 
-- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Use of recent Russian TZ changes in regression tests
David G Johnston david.g.johns...@gmail.com writes: Tom Lane-2 wrote The good thing about testing with the MSK changes is that those are quite well-documented and so we don't have to fear getting blindsided by future updates to the IANA database. So basically we are trading off known short term pain (for people on machines with old TZ files) against a risk of unexpected future pain. My inclination is to leave the test as-is, but I'm willing to make the changes if people would rather bet the other way. I am tending to like the argument for saying date/time handling is integral to a database and thus tests should be allowed to exercise any data contained in the TZ files provided by PostgreSQL itself. I would even consider simply testing if the system timezone file is older than the PostgreSQL file and failing a test if that is the case. That doesn't make any sense as it stands, but I take your point, and indeed that's more or less the reasoning that led me to write the test as it is: we *should* complain if you've got TZ data that's more than 6 months old. However, the contrary opinion is that whether the user's TZ data is too old for his purposes is not ours to decide; and that's surely a tenable position as well. I'm not particularly excited about allowing the regression tests to pass if the test cases fail; that would obscure actual failures in this area. I'd sooner just remove the problematic cases altogether. Another thought that just occurred to me is that we need to test both advance and retreat of a zone's notion of standard time, but that doesn't mean that both cases have to be tested in the same zone. The 2011 Russian advance is probably reasonable to depend on by now, but maybe we could find some other well-documented case where a zone's standard time offset decreased relative to UTC. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Use of recent Russian TZ changes in regression tests
I wrote: Another thought that just occurred to me is that we need to test both advance and retreat of a zone's notion of standard time, but that doesn't mean that both cases have to be tested in the same zone. The 2011 Russian advance is probably reasonable to depend on by now, but maybe we could find some other well-documented case where a zone's standard time offset decreased relative to UTC. Ah, here we go: # Venezuela # # From John Stainforth (2007-11-28): # ... the change for Venezuela originally expected for 2007-12-31 has # been brought forward to 2007-12-09. The official announcement was # published today in the Gaceta Oficial de la República Bolivariana # de Venezuela, número 38.819 (official document for all laws or # resolution publication) # http://www.globovision.com/news.php?nid=72208 # Zone NAMEGMTOFF RULES FORMAT [UNTIL] ZoneAmerica/Caracas -4:27:44 - LMT 1890 -4:27:40 - CMT 1912 Feb 12 # Caracas Mean Time? -4:30 - VET 1965# Venezuela Time -4:00 - VET 2007 Dec 9 3:00 -4:30 - VET That 2007 change has the right sign (becoming more negative relative to UTC), and it seems pretty solidly documented so it's unlikely to change on us in future. Being in the other direction from Greenwich shouldn't be an issue, maybe it's even better for coverage purposes. Hence, proposal: leave the MSK 2011 cases as-is but replace the 2014 cases with comparable testing around the VET 2007 change. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GIN pageinspect functions
On Tue, Nov 18, 2014 at 2:01 PM, Peter Geoghegan p...@heroku.com wrote: Do you intend to fix this up? BTW, how do you feel about the B-Tree check extension [1]? It's very much related to pageinspect -- it's more or less a derivative. I don't think I'm going to have time (or that there is sufficient review bandwidth available) to get it into 9.5, but I should post a revision soon, so it's at least something that's available for use by an expert. I did some clean-up work on it that is unpublished. It'll become a more generic extension - amcheck, per Robert's suggestion. One unpublished additional feature (that I have to fix a bug in) that isn't included in [1] is the idea of checking invariants across B-Tree pages. So, a scankey should indicate that the greatest (non-highkey) item on a non-rightmost page comports with the page that it has a right link to. Without race conditions. I don't have that swapped into my head at the moment, and so I don't have a good sense of how hard it'll be to fix the problem I found... [1] http://www.postgresql.org/message-id/cam3swzrtv+xmrwlwq6c-x7czvwavfdwfi4st1zz4ddgfh4y...@mail.gmail.com -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Doing better at HINTing an appropriate column within errorMissingColumn()
On Mon, Nov 17, 2014 at 3:04 PM, Peter Geoghegan p...@heroku.com wrote: postgres=# select qty from orderlines ; ERROR: 42703: column qty does not exist LINE 1: select qty from orderlines ; ^ HINT: Perhaps you meant to reference the column orderlines.quantity. I don't buy this example, because it would give you the same hint if you told it you wanted to access a column called ant, or uay, or tit. And that's clearly ridiculous. The reason why quantity looks like a reasonable suggestion for qty is because it's a conventional abbreviation, but an extremely high percentage of comparable cases won't be. +/* + * Charge extra (for inexact matches only) when an alias was + * specified that differs from what might have been used to + * correctly qualify this RTE's closest column + */ +if (wrongalias) +rtestate.distance += 3; I don't understand what situation this is catering to. Can you explain? It seems to account for a good deal of complexity. Two cases: 1. Distinguishing between the case where there was an exact match to a column that isn't visible (i.e. the existing reason for errorMissingColumn() to call here), and the case where there is a visible column, but our alias was the wrong one. I guess that could live in errorMissingColumn(), but overall it's more convenient to do it here, so that errorMissingColumn() handles things almost uniformly and doesn't really have to care. 2. For non-exact (fuzzy) matches, it seems more useful to give one match rather than two when the user gave an alias that matches one particular RTE. Consider this: postgres=# select ordersid from orders o join orderlines ol on o.orderid = ol.orderid; ERROR: 42703: column ordersid does not exist LINE 1: select ordersid from orders o join orderlines ol on o.orderi... ^ HINT: Perhaps you meant to reference the column o.orderid or the column ol.orderid. 
LOCATION: errorMissingColumn, parse_relation.c:3166 postgres=# select ol.ordersid from orders o join orderlines ol on o.orderid = ol.orderid; ERROR: 42703: column ol.ordersid does not exist LINE 1: select ol.ordersid from orders o join orderlines ol on o.ord... ^ HINT: Perhaps you meant to reference the column ol.orderid. LOCATION: errorMissingColumn, parse_relation.c:3147 I guess I'm confused at a broader level. If the alias is wrong, why are we considering names in this RTE *at all*? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Doing better at HINTing an appropriate column within errorMissingColumn()
On Tue, Nov 18, 2014 at 3:29 PM, Robert Haas robertmh...@gmail.com wrote: On Mon, Nov 17, 2014 at 3:04 PM, Peter Geoghegan p...@heroku.com wrote: postgres=# select qty from orderlines ; ERROR: 42703: column qty does not exist LINE 1: select qty from orderlines ; ^ HINT: Perhaps you meant to reference the column orderlines.quantity. I don't buy this example, because it would give you the same hint if you told it you wanted to access a column called ant, or uay, or tit. And that's clearly ridiculous. The reason why quantity looks like a reasonable suggestion for qty is because it's a conventional abbreviation, but an extremely high percentage of comparable cases won't be. Is that so terrible? Yes, if those *exact* strings are tried, that'll happen. But the vast majority of 3 letter strings will not do that (including many 3 letter strings that include one of the letters 'q', 't' and 'y', such as qqq, ttt, and yyy). Why, in practice, would someone even attempt those strings? I'm worried about Murphy, not Machiavelli. That seems like a pretty important distinction here. I maintain that omission of part of the correct spelling should be weighed less. I am optimizing for the case where the user has a rough idea of the structure and spelling of things - if they're typing in random strings, or totally distinct synonyms, there is little we can do about that. As I said, there will always be the most marginal case that still gets a suggestion. I see no reason to hurt the common case where we help in order to save the user from seeing a ridiculous suggestion. I have a final test for the absolute quality of a suggestion, but I think we could easily be too conservative about that. At worst, our ridiculous suggestion makes apparent that the user's incorrect spelling was itself ridiculous. With larger strings, we can afford to be more conservative, and we are, because we have more information to go on. Terse column names are not uncommon, though. 
+/* + * Charge extra (for inexact matches only) when an alias was + * specified that differs from what might have been used to + * correctly qualify this RTE's closest column + */ +if (wrongalias) +rtestate.distance += 3; I don't understand what situation this is catering to. Can you explain? It seems to account for a good deal of complexity. I guess I'm confused at a broader level. If the alias is wrong, why are we considering names in this RTE *at all*? Because it's a common mistake when writing ad-hoc queries. People may forget which exact table their column comes from. We certainly want to weigh the fact that an alias was specified, but it shouldn't totally limit our guess to that RTE. If nothing else, the fact that there was a much closer match from another RTE ought to result in forcing there to be no suggestion (due to there being too many equally good suggestions). That's because, as I said, an *absolute* test for the quality of a match is problematic (which, again, is why I err on the side of letting the final, absolute quality test not limit suggestions, particularly with short strings). -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Doing better at HINTing an appropriate column within errorMissingColumn()
Peter Geoghegan p...@heroku.com writes: On Tue, Nov 18, 2014 at 3:29 PM, Robert Haas robertmh...@gmail.com wrote: On Mon, Nov 17, 2014 at 3:04 PM, Peter Geoghegan p...@heroku.com wrote: postgres=# select qty from orderlines ; ERROR: 42703: column qty does not exist HINT: Perhaps you meant to reference the column orderlines.quantity. I don't buy this example, because it would give you the same hint if you told it you wanted to access a column called ant, or uay, or tit. And that's clearly ridiculous. The reason why quantity looks like a reasonable suggestion for qty is because it's a conventional abbreviation, but an extremely high percentage of comparable cases won't be. I maintain that omission of part of the correct spelling should be weighed less. I would say that omission of the first letter should completely disqualify suggestions based on this heuristic; but it might make sense to weight omissions less after the first letter. regards, tom lane
Re: [HACKERS] pg_receivexlog --status-interval add fsync feedback
On Tue, Nov 18, 2014 at 2:36 AM, Fujii Masao masao.fu...@gmail.com wrote: On Thu, Nov 13, 2014 at 12:38 PM, Fujii Masao masao.fu...@gmail.com wrote: Thanks for reviewing the patch! On Thu, Nov 13, 2014 at 4:05 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Fujii Masao wrote: --- 127,152 When this option is used, applicationpg_receivexlog/ will report a flush position to the server, indicating when each segment has been synchronized to disk so that the server can remove that segment if it ! is not otherwise needed. literal--synchronous/literal option must ! be specified when making applicationpg_receivexlog/ run as ! synchronous standby by using replication slot. Otherwise WAL data ! cannot be flushed frequently enough for this to work correctly. /para /listitem /varlistentry Whitespace damage here. Fixed. + printf(_("--synchronous flush transaction log in real time\n")); "in real time" sounds odd. How about "flush transaction log immediately after writing", or maybe "have transaction log writes be synchronous". The former sounds better to me. So I chose it. --- 781,791 now = feGetCurrentTimestamp(); /* ! * Issue sync command as soon as there are WAL data which ! * has not been flushed yet if synchronous option is true. */ if (lastFlushPosition < blockpos && ! walfile != -1 && synchronous) I'd put the synchronous condition first in the if(), and start the comment with it rather than putting it at the end. Both seem weird. Fixed, i.e., moved the synchronous condition first in the if()'s test and also moved "if synchronous option is true" first in the comment. progname, current_walfile_name, strerror(errno)); goto error; } lastFlushPosition = blockpos; ! ! /* ! * Send feedback so that the server sees the latest WAL locations ! * immediately if synchronous option is true. ! */ ! if (!sendFeedback(conn, blockpos, now, false)) ! goto error; ! last_status = now; I'm not clear about this comment ..
why does it say if synchronous option is true when it's not checking the condition? I added that comment because the code exists with the if() block checking synchronous condition. But it seems confusing. Just removed that part from the comment. Attached is the updated version of the patch. I've just pushed this. Marked this patch as committed on the commit fest app. -- Michael
Re: [HACKERS] tracking commit timestamps
On 15/11/14 13:36, Simon Riggs wrote: On 15 November 2014 04:32, Steve Singer st...@ssinger.info wrote: The use cases I'm talking about aren't really replication related. Often I have come across systems that want to do something such as 'select * from orders where X the_last_row_I_saw order by X' and then do further processing on the order. Yes, existing facilities provide mechanisms for different types of application change queues. If you want to write a processing queue in SQL, that isn't the best way. You'll need some way to keep track of whether or not its been successfully processed. That's either a column in the table, or a column in a queue table maintained by triggers, with the row write locked on read. You can then have multiple readers from this queue using the new SKIP LOCKED feature, which was specifically designed to facilitate that. Logical decoding was intended for much more than just replication. It provides commit order access to changed data in a form that is both usable and efficient for high volume applicatiion needs. I don't see any reason to add LSN into a SLRU updated at commit to support those application needs. I am still on the fence about the LSN issue, I don't mind it from code perspective, it's already written anyway, but I am not sure if we really want it in the SLRU as Simon says. Mainly because of three things: One, this patch is not really feature patch, as you can do most of what it does via tables already, but more a performance improvement and we should try to make it perform as good as possible then, adding more things does not really improve performance (according to my benchmarks the performance difference with/without LSN is under 1% so it's not terrible, but it's there), not to mention additional disk space. Two, the LSN use-cases seem to still be only theoretical to me, while the timestamp use-case has been production problem for at least a decade. 
Three, even if we add LSN, I am still not convinced that the use-cases presented here wouldn't be better served by putting that info into an actual table instead of the SLRU - as people want to use it as a filter in a WHERE clause, somebody mentioned exporting to a different db, etc. Maybe we need a better explanation of the LSN use-case(s) to understand why it should be stored here and why the other solutions are significantly worse. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
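The SQL queue pattern Simon sketches above (a processed flag, rows write-locked on read, multiple readers using SKIP LOCKED) could look roughly like this. The order_queue table and its columns are hypothetical, invented here purely for illustration:

```sql
-- Hypothetical queue table with a "processed" flag.
CREATE TABLE order_queue (
    id        bigserial PRIMARY KEY,
    payload   text NOT NULL,
    processed boolean NOT NULL DEFAULT false
);

-- Each worker claims one unprocessed row. FOR UPDATE write-locks it,
-- and SKIP LOCKED makes concurrent workers skip rows already claimed
-- by another worker instead of blocking on them.
BEGIN;
SELECT id, payload
  FROM order_queue
 WHERE NOT processed
 ORDER BY id
 LIMIT 1
   FOR UPDATE SKIP LOCKED;

-- ... process the claimed row, then mark it done and commit ...
UPDATE order_queue SET processed = true WHERE id = 42;  -- id claimed above
COMMIT;
```

This is the kind of application change queue the message argues SKIP LOCKED was designed for, without needing an LSN stored in the commit-timestamp SLRU.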
Re: [HACKERS] pg_basebackup vs. Windows and tablespaces
On Tue, Nov 18, 2014 at 7:49 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Amit Kapila wrote: On Sun, Nov 16, 2014 at 6:15 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: One use case mentioned upthread is having the clone be created in the same machine as the source server. Does your proposal help with it? That use case got addressed with the -T option, with which the user can relocate the tablespace directory (Commit: fb05f3c; pg_basebackup: Add support for relocating tablespaces) Okay. As far as I know, -T only works for plain mode, right? Yes. I wonder if we should make -T modify the tablespace catalog, so that the resulting file in tar output uses names mangled per the map; that would make -T work in tar mode too. Does that make sense? The tablespace catalog (I assume it is the new file you are talking about) is formed on the server, whereas the handling for -T is completely in pg_basebackup. We might be able to make it work, but I am not sure it is worth it, because the main use case for -T is the plain format. I think even if there is some use case for -T to work with the tar format, it is a separate project. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] PostgreSQL doesn't stop properly when --slot option is specified with pg_receivexlog.
On Mon, Nov 17, 2014 at 12:30 PM, Michael Paquier michael.paqu...@gmail.com wrote: On Mon, Nov 17, 2014 at 10:02 AM, Fujii Masao masao.fu...@gmail.com wrote: On Sat, Nov 15, 2014 at 9:10 PM, Michael Paquier michael.paqu...@gmail.com wrote: Yep, sounds like a good thing to do if the master requested an answer from the client in the keepalive message. Something like the attached patch would make the deal. Isn't it better to do this only when a replication slot is used? Makes sense. What about a check using reportFlushPosition then? Sounds reasonable. Thanks for updating the patch! But the patch no longer applies cleanly to master because c4f99d2 heavily changed the code that the patch also touches... I rewrote the patch and pushed it to both master and REL9_4_STABLE. Anyway, thanks! Regards, -- Fujii Masao
Re: [HACKERS] pg_receivexlog --status-interval add fsync feedback
On Wed, Nov 19, 2014 at 10:20 AM, Michael Paquier michael.paqu...@gmail.com wrote: On Tue, Nov 18, 2014 at 2:36 AM, Fujii Masao masao.fu...@gmail.com wrote: On Thu, Nov 13, 2014 at 12:38 PM, Fujii Masao masao.fu...@gmail.com wrote: Thanks for reviewing the patch! On Thu, Nov 13, 2014 at 4:05 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Fujii Masao wrote:

--- 127,152 ----
        When this option is used, <application>pg_receivexlog</application> will
        report a flush position to the server, indicating when each segment has
        been synchronized to disk so that the server can remove that segment if it
!       is not otherwise needed. The <literal>--synchronous</literal> option must
!       be specified when making <application>pg_receivexlog</application> run as
!       synchronous standby by using replication slot. Otherwise WAL data
!       cannot be flushed frequently enough for this to work correctly.
       </para>
      </listitem>
     </varlistentry>

Whitespace damage here. Fixed.

+       printf(_("  --synchronous          flush transaction log in real time\n"));

"in real time" sounds odd. How about "flush transaction log immediately after writing", or maybe "have transaction log writes be synchronous". The former sounds better to me. So I chose it.

--- 781,791 ----
                        now = feGetCurrentTimestamp();

                        /*
!                        * Issue sync command as soon as there are WAL data which
!                        * has not been flushed yet if synchronous option is true.
                         */
                        if (lastFlushPosition < blockpos &&
!                           walfile != -1 && synchronous)

I'd put the synchronous condition first in the if(), and start the comment with it rather than putting it at the end. Both seem weird. Fixed, i.e., moved the synchronous condition first in the if()'s test and also moved "if synchronous option is true" first in the comment.

                                        progname, current_walfile_name, strerror(errno));
                                goto error;
                        }
                        lastFlushPosition = blockpos;
!
!                       /*
!                        * Send feedback so that the server sees the latest WAL locations
!                        * immediately if synchronous option is true.
!                        */
!                       if (!sendFeedback(conn, blockpos, now, false))
!                               goto error;
!
                        last_status = now;

I'm not clear about this comment... why does it say "if synchronous option is true" when it's not checking the condition? I added that comment because the code exists within the if() block checking the synchronous condition. But it seems confusing. Just removed that part from the comment. Attached is the updated version of the patch. I've just pushed this. Marked this patch as committed on the commit fest app. Thanks! Regards, -- Fujii Masao
[HACKERS] psql \watch always ignores \pset null
Hi,

    /*
     * Set up rendering options, in particular, disable the pager, because
     * nobody wants to be prompted while watching the output of 'watch'.
     */
    myopt.nullPrint = NULL;
    myopt.topt.pager = 0;

I found psql's \watch command always ignores the \pset null setting. The above source code comment explains why it does, but I'd like to see the specified string for a null value even in \watch's output, in order to distinguish null from an empty value. Thoughts? Is there any reason why \watch must ignore the \pset null setting? Regards, -- Fujii Masao
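To illustrate the complaint, here is a hypothetical psql session (the query and settings are mine, not from the post):

```sql
-- Ask psql to render NULL distinctly:
\pset null '(null)'

-- A plain SELECT honors the setting: the NULL column shows "(null)"
-- while the empty-string column shows nothing.
SELECT ''::text AS empty, NULL::text AS missing;

-- But re-running the same query under \watch resets nullPrint,
-- so both columns render as empty and become indistinguishable.
\watch 2
```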
[HACKERS] Multiple call of GetTransactionSnapshot in single flow
I have observed that in some places, like exec_bind_message and exec_simple_query, a snapshot has already been taken, but before PortalStart is called the current snapshot gets popped off and InvalidSnapshot is passed as the parameter, because of which PortalStart takes a snapshot again. In cases where many transactions are running, taking a snapshot multiple times may be very costly. What is the reason for taking a snapshot multiple times: 1. Is this implementation to make sure the snapshot is at a more granular level? 2. Is it something to do with the current command id of the snapshot? 3. Or is there any other specific reason for this which I am not able to visualize? 4. Or am I missing something else? If it is just reason 1, then maybe we can try to pass the same snapshot to PortalStart as taken in the caller; it could enhance performance in many cases. With this change, I did one small performance test on pgbench with prepared queries for pgbench select with 16 users and observed a performance benefit of 10%. Please provide your opinion. Thanks and Regards, Kumar Rajeev Rastogi
Re: [HACKERS] inherit support for foreign tables
(2014/11/18 18:09), Ashutosh Bapat wrote: I looked at the patch. It looks good now. Once we have the inh patch ready, we can mark this item as ready for committer. Thanks for the review! Best regards, Etsuro Fujita
Re: [HACKERS] postgres_fdw behaves oddly
(2014/11/18 18:27), Ashutosh Bapat wrote: On Tue, Nov 18, 2014 at 1:50 PM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote: (2014/11/17 19:36), Ashutosh Bapat wrote: Here are my comments about the patch fscan_reltargetlist.patch 2. Instead of using rel->reltargetlist, we should use the tlist passed by the caller. This is the tlist expected from the Plan node. For foreign scans it will be the same as rel->reltargetlist. Using tlist would make the function consistent with the create_*scan_plan functions. I disagree with that for the reasons mentioned below: * For a foreign scan, tlist is *not* necessarily the same as rel->reltargetlist (ie, there is a case where tlist contains all user attributes while rel->reltargetlist contains only attributes actually needed by the query). In such a case it'd be inefficient to use tlist rather than rel->reltargetlist. create_foreignscan_plan() is called from create_scan_plan(), which passes the tlist. The latter uses the function use_physical_tlist() to decide which targetlist should be used (physical or actual), as per the code below in this function:

485     /*
486      * We can do this for real relation scans, subquery scans, function scans,
487      * values scans, and CTE scans (but not for, eg, joins).
488      */
489     if (rel->rtekind != RTE_RELATION &&
490         rel->rtekind != RTE_SUBQUERY &&
491         rel->rtekind != RTE_FUNCTION &&
492         rel->rtekind != RTE_VALUES &&
493         rel->rtekind != RTE_CTE)
494         return false;
495
496     /*
497      * Can't do it with inheritance cases either (mainly because Append
498      * doesn't project).
499      */
500     if (rel->reloptkind != RELOPT_BASEREL)
501         return false;

For foreign tables as well as tables under an inheritance hierarchy it uses the actual targetlist, which contains only the required columns, IOW rel->reltargetlist (see build_path_tlist()) with nested-loop parameters substituted. So, it never includes unnecessary columns in the targetlist. 
Maybe I'm missing something, but I think you are overlooking the case for foreign tables that are *not* under an inheritance hierarchy. (Note that the rtekind for foreign tables is RTE_RELATION.) Thanks, Best regards, Etsuro Fujita
Re: [HACKERS] Multiple call of GetTransactionSnapshot in single flow
Rajeev Rastogi rajeev.rast...@huawei.com writes: What is the reason for taking a snapshot multiple times: To get the right answer. I believe Robert Haas has been down this rabbit hole most recently before you: see commits d573e239f and 532994299. The execution snapshot has to be (at minimum) later than the acquisition of all table locks for the query, and that means it'd better be different from the snapshot used for parse/plan activities. The right solution here is not so much to reduce the number of snapshots as to make them cheaper. Heikki was working on something for that, but I'm not sure where he is with that patch. regards, tom lane
Re: [HACKERS] REINDEX CONCURRENTLY 2.0
On Thu, Nov 13, 2014 at 10:25 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Andres Freund wrote: On 2014-11-12 18:23:38 -0500, Robert Haas wrote: The problem is that it's very hard to avoid the wrong index's relfilenode being used when swapping the relfilenodes between two indexes. How about storing both the old and new relfilenodes in the same pg_class entry? That's quite a cool idea [think a bit] But I think it won't work realistically. We have a *lot* of infrastructure that refers to indexes using its primary key. Hmm, can we make the relmapper do this job instead of having another pg_class column? Essentially the same sketch Robert proposed, but instead we would initially set relfilenode=0 and have all onlookers use the relmapper to obtain the correct relfilenode; switching to the new relfilenode can be done atomically, and we un-relmap the index once the process is complete. The difference from what Robert proposes is that the transient state is known to cause failures for anyone not prepared to deal with it, so it should be easy to spot which places need adjustment. How would the failure handling actually work? Would we need some extra process to remove the extra relfilenodes? Note that in the current patch the temporary concurrent entry is kept INVALID all the time, giving the user a path to remove it with DROP INDEX even in the case of invalid toast indexes in catalog pg_toast. Note that I am on the side of using the exclusive lock when swapping relfilenodes for now in any case; that's what pg_repack does by renaming the indexes, and people use it. -- Michael
Re: [HACKERS] psql \watch always ignores \pset null
Fujii Masao masao.fu...@gmail.com writes: Hi,

    /*
     * Set up rendering options, in particular, disable the pager, because
     * nobody wants to be prompted while watching the output of 'watch'.
     */
    myopt.nullPrint = NULL;
    myopt.topt.pager = 0;

I found psql's \watch command always ignores the \pset null setting. The above source code comment explains why it does, but I'd like to see the specified string for a null value even in \watch's output, in order to distinguish null from an empty value. Thoughts? Is there any reason why \watch must ignore the \pset null setting? Hmmm ... the comment offers a reasonable argument for forcing pager = 0, but I agree the nullPrint change is not adequately explained. Will, do you remember why you did that? regards, tom lane
Re: [HACKERS] postgres_fdw behaves oddly
(2014/11/18 18:37), Ashutosh Bapat wrote: On Tue, Nov 18, 2014 at 1:55 PM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote: (2014/11/17 19:54), Ashutosh Bapat wrote: Here are comments for the postgres_fdw-syscol patch. Code --- 1. Instead of a single-line comment "System columns can't be sent to remote.", it might be better to explain why system columns can't be sent to the remote. Done. I would add "and foreign values do not make sense locally (except maybe ctid clubbed with foreign server_id)" to make it more clear. But I will leave that for the committer to decide. OK. 3. Since there is already a testcase which triggered this particular change, it would be good if we add that to the regression tests in postgres_fdw. Done. I think a better case would be SELECT * FROM ft1 t1, pg_class t2 WHERE t1.tableoid = t2.oid. The condition makes sure that the tableoid in the row is the same as the OID of the foreign table recorded in pg_class locally. And the EXPLAIN of the query clearly shows that the tableoid column is not fetched from the foreign server. I thought about that test, but I didn't use it because I think we can't see (at least from the EXPLAIN) why the qual is not pushed down: the qual isn't pushed down possibly because the qual is considered a *join* qual, not because the qual just contains tableoid. (Having said that, I think we can see if the qual isn't pushed down as a join qual for a parameterized plan, but ISTM it's worth complicating regression tests.) Once we resolve the other patch on this thread, I think this item can be marked as ready for committer from my side. OK. Thank you for the review. Best regards, Etsuro Fujita
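For reference, Ashutosh's suggested test written out as SQL (ft1 is the postgres_fdw regression suite's foreign table; the EXPLAIN options are my own choice for inspecting pushdown, not part of the proposal):

```sql
-- tableoid of a row fetched from the foreign table must equal the OID
-- recorded locally in pg_class, so this join acts as a self-check:
SELECT * FROM ft1 t1, pg_class t2 WHERE t1.tableoid = t2.oid;

-- The plan would show that tableoid is not fetched from the remote
-- server. Fujita's objection: the qual stays local anyway because it
-- is treated as a *join* qual, not because it contains tableoid.
EXPLAIN (VERBOSE, COSTS OFF)
SELECT * FROM ft1 t1, pg_class t2 WHERE t1.tableoid = t2.oid;
```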
Re: [HACKERS] postgres_fdw behaves oddly
On Wed, Nov 19, 2014 at 11:18 AM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote: (2014/11/18 18:27), Ashutosh Bapat wrote: On Tue, Nov 18, 2014 at 1:50 PM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote: (2014/11/17 19:36), Ashutosh Bapat wrote: Here are my comments about the patch fscan_reltargetlist.patch 2. Instead of using rel->reltargetlist, we should use the tlist passed by the caller. This is the tlist expected from the Plan node. For foreign scans it will be the same as rel->reltargetlist. Using tlist would make the function consistent with the create_*scan_plan functions. I disagree with that for the reasons mentioned below: * For a foreign scan, tlist is *not* necessarily the same as rel->reltargetlist (ie, there is a case where tlist contains all user attributes while rel->reltargetlist contains only attributes actually needed by the query). In such a case it'd be inefficient to use tlist rather than rel->reltargetlist. create_foreignscan_plan() is called from create_scan_plan(), which passes the tlist. The latter uses the function use_physical_tlist() to decide which targetlist should be used (physical or actual), as per the code below in this function:

485     /*
486      * We can do this for real relation scans, subquery scans, function scans,
487      * values scans, and CTE scans (but not for, eg, joins).
488      */
489     if (rel->rtekind != RTE_RELATION &&
490         rel->rtekind != RTE_SUBQUERY &&
491         rel->rtekind != RTE_FUNCTION &&
492         rel->rtekind != RTE_VALUES &&
493         rel->rtekind != RTE_CTE)
494         return false;
495
496     /*
497      * Can't do it with inheritance cases either (mainly because Append
498      * doesn't project).
499      */
500     if (rel->reloptkind != RELOPT_BASEREL)
501         return false;

For foreign tables as well as tables under an inheritance hierarchy it uses the actual targetlist, which contains only the required columns, IOW rel->reltargetlist (see build_path_tlist()) with nested-loop parameters substituted. So, it never includes unnecessary columns in the targetlist. 
Maybe I'm missing something, but I think you are overlooking the case for foreign tables that are *not* under an inheritance hierarchy. (Note that the rtekind for foreign tables is RTE_RELATION.) Ah! You are right. I confused rtekind with relkind. Sorry for that. Maybe we should modify use_physical_tlist() to return false in the case of RELKIND_FOREIGN_TABLE, so that we can use tlist in create_foreignscan_plan(). I do not see any create_*_plan() function using reltargetlist directly. Thanks, Best regards, Etsuro Fujita -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: [HACKERS] psql \watch always ignores \pset null
On Tue, Nov 18, 2014 at 9:54 PM, Tom Lane t...@sss.pgh.pa.us wrote: Fujii Masao masao.fu...@gmail.com writes: Hi, /* * Set up rendering options, in particular, disable the pager, because * nobody wants to be prompted while watching the output of 'watch'. */ myopt.nullPrint = NULL; myopt.topt.pager = 0; I found psql's \watch command always ignores \pset null setting. The above source code comment explains why it does, but I'd like to see the specified string for null value even in \watch's output, in order to distinguish null and an empty value. Thought? Is there any reason why \watch must ignore \pset null setting? Hmmm ... the comment offers a reasonable argument for forcing pager = 0, but I agree the nullPrint change is not adequately explained. Will, do you remember why you did that? regards, tom lane I tracked down the individual commit[1] from my history where I added that. What I added there is very similar to sections in src/bin/psql/describe.c. I can't remember specifically my reasoning then, but it's likely I copied the patterns there while getting things working. I do still think it's important to remove the pager, but the nullPrint is probably a mistake. [1]: https://github.com/will/postgres/commit/c42d29fece16ec9cb13c159b3307ab9fca892eb2
Re: [HACKERS] psql \watch always ignores \pset null
Will Leinweber w...@heroku.com writes: On Tue, Nov 18, 2014 at 9:54 PM, Tom Lane t...@sss.pgh.pa.us wrote: Fujii Masao masao.fu...@gmail.com writes: Is there any reason why \watch must ignore \pset null setting? Hmmm ... the comment offers a reasonable argument for forcing pager = 0, but I agree the nullPrint change is not adequately explained. Will, do you remember why you did that? I tracked down the individual commit[1] from my history where I added that. What I added there is very similar to sections in src/bin/psql/describe.c. I can't remember specifically my reasoning then, but it's likely I copied the patterns there while getting things working. I do still think it's important to remove the pager, but the nullPrint is probably a mistake. I took a quick look and noted that the other places where nullPrint is summarily forced to null are for \d and similar queries. For those, the code can reasonably have an opinion about what the presentation should be like, since it knows what SQL query it's issuing. That argument surely doesn't apply to \watch, so I'm in agreement with Fujii that it'd be better to respect the user's \pset setting. regards, tom lane
Re: [HACKERS] postgres_fdw behaves oddly
(2014/11/19 14:58), Ashutosh Bapat wrote: On Wed, Nov 19, 2014 at 11:18 AM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote: (2014/11/18 18:27), Ashutosh Bapat wrote: On Tue, Nov 18, 2014 at 1:50 PM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp (2014/11/17 19:36), Ashutosh Bapat wrote: Here are my comments about the patch fscan_reltargetlist.patch 2. Instead of using rel->reltargetlist, we should use the tlist passed by the caller. This is the tlist expected from the Plan node. For foreign scans it will be the same as rel->reltargetlist. Using tlist would make the function consistent with the create_*scan_plan functions. I disagree with that for the reasons mentioned below: * For a foreign scan, tlist is *not* necessarily the same as rel->reltargetlist (ie, there is a case where tlist contains all user attributes while rel->reltargetlist contains only attributes actually needed by the query). In such a case it'd be inefficient to use tlist rather than rel->reltargetlist. create_foreignscan_plan() is called from create_scan_plan(), which passes the tlist. The latter uses the function use_physical_tlist() to decide which targetlist should be used (physical or actual), as per the code below in this function:

485     /*
486      * We can do this for real relation scans, subquery scans, function scans,
487      * values scans, and CTE scans (but not for, eg, joins).
488      */
489     if (rel->rtekind != RTE_RELATION &&
490         rel->rtekind != RTE_SUBQUERY &&
491         rel->rtekind != RTE_FUNCTION &&
492         rel->rtekind != RTE_VALUES &&
493         rel->rtekind != RTE_CTE)
494         return false;
495
496     /*
497      * Can't do it with inheritance cases either (mainly because Append
498      * doesn't project).
499      */
500     if (rel->reloptkind != RELOPT_BASEREL)
501         return false;

For foreign tables as well as tables under an inheritance hierarchy it uses the actual targetlist, which contains only the required columns, IOW rel->reltargetlist (see build_path_tlist()) with nested-loop parameters substituted. 
So, it never includes unnecessary columns in the targetlist. Maybe I'm missing something, but I think you are overlooking the case for foreign tables that are *not* under an inheritance hierarchy. (Note that the rtekind for foreign tables is RTE_RELATION.) Ah! You are right. I confused rtekind with relkind. Sorry for that. Maybe we should modify use_physical_tlist() to return false in the case of RELKIND_FOREIGN_TABLE, so that we can use tlist in create_foreignscan_plan(). I do not see any create_*_plan() function using reltargetlist directly. Yeah, I think we can do that, but I'm not sure that we should use tlist in create_foreignscan_plan(), not rel->reltargetlist. How about leaving this for the committers to decide? Thanks, Best regards, Etsuro Fujita
Re: [HACKERS] postgres_fdw behaves oddly
On Wed, Nov 19, 2014 at 12:14 PM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote: (2014/11/19 14:58), Ashutosh Bapat wrote: On Wed, Nov 19, 2014 at 11:18 AM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote: (2014/11/18 18:27), Ashutosh Bapat wrote: On Tue, Nov 18, 2014 at 1:50 PM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp (2014/11/17 19:36), Ashutosh Bapat wrote: Here are my comments about the patch fscan_reltargetlist.patch 2. Instead of using rel->reltargetlist, we should use the tlist passed by the caller. This is the tlist expected from the Plan node. For foreign scans it will be the same as rel->reltargetlist. Using tlist would make the function consistent with the create_*scan_plan functions. I disagree with that for the reasons mentioned below: * For a foreign scan, tlist is *not* necessarily the same as rel->reltargetlist (ie, there is a case where tlist contains all user attributes while rel->reltargetlist contains only attributes actually needed by the query). In such a case it'd be inefficient to use tlist rather than rel->reltargetlist. create_foreignscan_plan() is called from create_scan_plan(), which passes the tlist. The latter uses the function use_physical_tlist() to decide which targetlist should be used (physical or actual), as per the code below in this function:

485     /*
486      * We can do this for real relation scans, subquery scans, function scans,
487      * values scans, and CTE scans (but not for, eg, joins).
488      */
489     if (rel->rtekind != RTE_RELATION &&
490         rel->rtekind != RTE_SUBQUERY &&
491         rel->rtekind != RTE_FUNCTION &&
492         rel->rtekind != RTE_VALUES &&
493         rel->rtekind != RTE_CTE)
494         return false;
495
496     /*
497      * Can't do it with inheritance cases either (mainly because Append
498      * doesn't project).
499      */
500     if (rel->reloptkind != RELOPT_BASEREL)
501         return false;

For foreign tables as well as tables under an inheritance hierarchy it uses the actual targetlist, which contains only the required columns, IOW rel->reltargetlist (see build_path_tlist()) with nested-loop parameters substituted. So, it never includes unnecessary columns in the targetlist. Maybe I'm missing something, but I think you are overlooking the case for foreign tables that are *not* under an inheritance hierarchy. (Note that the rtekind for foreign tables is RTE_RELATION.) Ah! You are right. I confused rtekind with relkind. Sorry for that. Maybe we should modify use_physical_tlist() to return false in the case of RELKIND_FOREIGN_TABLE, so that we can use tlist in create_foreignscan_plan(). I do not see any create_*_plan() function using reltargetlist directly. Yeah, I think we can do that, but I'm not sure that we should use tlist in create_foreignscan_plan(), not rel->reltargetlist. How about leaving this for the committers to decide? I am fine with that. Maybe you want to add an XXX comment there to bring it to the committer's notice. Thanks, Best regards, Etsuro Fujita -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: [HACKERS] postgres_fdw behaves oddly
(2014/11/19 14:55), Etsuro Fujita wrote: (2014/11/18 18:37), Ashutosh Bapat wrote: On Tue, Nov 18, 2014 at 1:55 PM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote: (2014/11/17 19:54), Ashutosh Bapat wrote: Here are comments for the postgres_fdw-syscol patch. 3. Since there is already a testcase which triggered this particular change, it would be good if we add that to the regression tests in postgres_fdw. Done. I think a better case would be SELECT * FROM ft1 t1, pg_class t2 WHERE t1.tableoid = t2.oid. The condition makes sure that the tableoid in the row is the same as the OID of the foreign table recorded in pg_class locally. And the EXPLAIN of the query clearly shows that the tableoid column is not fetched from the foreign server. I thought about that test, but I didn't use it because I think we can't see (at least from the EXPLAIN) why the qual is not pushed down: the qual isn't pushed down possibly because the qual is considered a *join* qual, not because the qual just contains tableoid. (Having said that, I think we can see if the qual isn't pushed down as a join qual for a parameterized plan, but ISTM it's worth complicating regression tests.) Sorry, I incorrectly wrote the last sentence. What I tried to write is: ISTM it's *not* worth complicating the regression tests. Best regards, Etsuro Fujita
Re: [HACKERS] postgres_fdw behaves oddly
(2014/11/19 15:56), Ashutosh Bapat wrote: On Wed, Nov 19, 2014 at 12:14 PM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote: (2014/11/19 14:58), Ashutosh Bapat wrote: Maybe we should modify use_physical_tlist() to return false in the case of RELKIND_FOREIGN_TABLE, so that we can use tlist in create_foreignscan_plan(). I do not see any create_*_plan() function using reltargetlist directly. Yeah, I think we can do that, but I'm not sure that we should use tlist in create_foreignscan_plan(), not rel->reltargetlist. How about leaving this for the committers to decide? I am fine with that. Maybe you want to add an XXX comment there to bring it to the committer's notice. It's OK, but I'm not convinced by your idea, so I think the comment can be adequately described by you, not by me. My proposal is for you to add the comment to the CF app. Could you do that? Thanks, Best regards, Etsuro Fujita