Re: [HACKERS] psql: show only failed queries
On 26 June 2014 11:53, Samrat Revagade wrote: I am sending an updated patch - the buggy statement is printed via the more logical psql_error function instead of printf. Thank you for updating the patch, I really appreciate your efforts. Now, everything is good from my side. * it applies cleanly to the current git master * includes necessary docs * I think it is a very good and necessary feature. If Kumar Rajeev Rastogi does not have any extra comments, then I think the patch is ready for committer. I have reviewed this patch. Please find my review comments below: 1. A command start-up option (e.g. -a/--echo-all for ECHO=all) is not provided for the new functionality. 2. The new command start-up option should be added to psql --help as well as to the documentation. Also, as I understand it, this new option is a kind of sub-set of an existing option (ECHO=query), so shouldn't we display the query string in the same format as it was getting printed earlier? Though I also feel that prefixing the query with the STATEMENT word will be helpful to grep, at the same time I am worried about inconsistency with the existing option. Thanks and Regards, Kumar Rajeev Rastogi
Re: [HACKERS] pg_xlogdump --stats
At 2014-06-30 05:19:10 +, dilip.ku...@huawei.com wrote: I have started reviewing the patch. Thanks. 1. Patch applies to git head cleanly. 2. Compiles on Linux -- some warnings, same as mentioned by furuyao. I've attached an updated version of the patch which should fix the warnings by using %zu. 3. Some compilation errors on Windows:

.\contrib\pg_xlogdump\pg_xlogdump.c(1002) : error C2065: 'optional_argument' : undeclared identifier
.\contrib\pg_xlogdump\pg_xlogdump.c(1002) : error C2099: initializer is not a constant

optional_argument should be added to the getopt_long.h file for Windows. Hmm. I have no idea what to do about this. I did notice when I wrote the code that nothing else used optional_argument, but I didn't realise that it wouldn't work on Windows. It may be that the best thing to do would be to avoid using optional_argument altogether, and have separate --stats and --stats-per-record options. Thoughts? Please fix these issues and send the updated patch. I've also attached a separate patch to factor out the identify_record function into rm_identify functions in the individual rmgr files, so that the code can live next to the respective _desc functions. This allows us to remove a number of #includes from pg_xlogdump, and makes the code a bit more straightforward to read.
-- Abhijit

diff --git a/contrib/pg_xlogdump/pg_xlogdump.c b/contrib/pg_xlogdump/pg_xlogdump.c
index 824b8c3..1d3e664 100644
--- a/contrib/pg_xlogdump/pg_xlogdump.c
+++ b/contrib/pg_xlogdump/pg_xlogdump.c
@@ -15,12 +15,28 @@
 #include <dirent.h>
 #include <unistd.h>
 
+#include "access/clog.h"
+#include "access/gin_private.h"
+#include "access/gist_private.h"
+#include "access/heapam_xlog.h"
+#include "access/heapam_xlog.h"
+#include "access/multixact.h"
+#include "access/nbtree.h"
+#include "access/spgist_private.h"
+#include "access/xact.h"
 #include "access/xlog.h"
 #include "access/xlogreader.h"
 #include "access/transam.h"
+#include "catalog/pg_control.h"
+#include "catalog/storage_xlog.h"
+#include "commands/dbcommands.h"
+#include "commands/sequence.h"
+#include "commands/tablespace.h"
 #include "common/fe_memutils.h"
 #include "getopt_long.h"
 #include "rmgrdesc.h"
+#include "storage/standby.h"
+#include "utils/relmapper.h"
 
 static const char *progname;
@@ -41,6 +57,8 @@ typedef struct XLogDumpConfig
 	int			stop_after_records;
 	int			already_displayed_records;
 	bool		follow;
+	bool		stats;
+	bool		stats_per_record;
 
 	/* filter options */
 	int			filter_by_rmgr;
@@ -48,6 +66,20 @@ typedef struct XLogDumpConfig
 	bool		filter_by_xid_enabled;
 } XLogDumpConfig;
 
+typedef struct Stats
+{
+	uint64		count;
+	uint64		rec_len;
+	uint64		fpi_len;
+} Stats;
+
+typedef struct XLogDumpStats
+{
+	uint64		count;
+	Stats		rmgr_stats[RM_NEXT_ID];
+	Stats		record_stats[RM_NEXT_ID][16];
+} XLogDumpStats;
+
 static void
 fatal_error(const char *fmt,...)
 __attribute__((format(PG_PRINTF_ATTRIBUTE, 1, 2)));
@@ -322,6 +354,39 @@ XLogDumpReadPage(XLogReaderState *state, XLogRecPtr targetPagePtr, int reqLen,
 }
 
 /*
+ * Store per-rmgr and per-record statistics for a given record.
+ */
+static void
+XLogDumpCountRecord(XLogDumpConfig *config, XLogDumpStats *stats,
+					XLogRecPtr ReadRecPtr, XLogRecord *record)
+{
+	RmgrId		rmid;
+	uint8		recid;
+
+	if (config->filter_by_rmgr != -1 &&
+		config->filter_by_rmgr != record->xl_rmid)
+		return;
+
+	if (config->filter_by_xid_enabled &&
+		config->filter_by_xid != record->xl_xid)
+		return;
+
+	rmid = record->xl_rmid;
+	recid = (record->xl_info & ~XLR_INFO_MASK) >> 4;
+
+	stats->count++;
+
+	stats->rmgr_stats[rmid].count++;
+	stats->rmgr_stats[rmid].rec_len += record->xl_len;
+	stats->rmgr_stats[rmid].fpi_len +=
+		(record->xl_tot_len - record->xl_len - SizeOfXLogRecord);
+
+	stats->record_stats[rmid][recid].count++;
+	stats->record_stats[rmid][recid].rec_len += record->xl_len;
+	stats->record_stats[rmid][recid].fpi_len +=
+		(record->xl_tot_len - record->xl_len - SizeOfXLogRecord);
+}
+
+/*
  * Print a record to stdout
  */
 static void
@@ -380,6 +445,476 @@ XLogDumpDisplayRecord(XLogDumpConfig *config, XLogRecPtr ReadRecPtr, XLogRecord
 	}
 }
 
+/*
+ * Given an xl_rmid and the high bits of xl_info, returns a string
+ * describing the record type.
+ */
+static const char *
+identify_record(RmgrId rmid, uint8 info)
+{
+	const RmgrDescData *desc = &RmgrDescTable[rmid];
+	const char *rec;
+
+	rec = psprintf("0x%x", info);
+
+	switch (rmid)
+	{
+		case RM_XLOG_ID:
+			switch (info)
+			{
+				case XLOG_CHECKPOINT_SHUTDOWN:
+					rec = "CHECKPOINT_SHUTDOWN";
+					break;
+				case XLOG_CHECKPOINT_ONLINE:
+					rec = "CHECKPOINT_ONLINE";
+					break;
+				case XLOG_NOOP:
+					rec = "NOOP";
+					break;
+				case XLOG_NEXTOID:
+					rec = "NEXTOID";
+					break;
+				case XLOG_SWITCH:
+					rec = "SWITCH";
+					break;
+				case XLOG_BACKUP_END:
+					rec = "BACKUP_END";
+					break;
+				case XLOG_PARAMETER_CHANGE:
+					rec = "PARAMETER_CHANGE";
+					break;
+				case XLOG_RESTORE_POINT:
+					rec = "RESTORE_POINT";
+					break;
+				case XLOG_FPW_CHANGE:
+
Re: [HACKERS] pg_xlogdump --stats
At 2014-06-30 12:05:06 +0530, a...@2ndquadrant.com wrote: It may be that the best thing to do would be to avoid using optional_argument altogether, and have separate --stats and --stats-per-record options. Thoughts? That's what I've done in the attached patch, except I've called the new option --record-stats. Both options now use no_argument. This should apply on top of the diff I posted a little while ago. -- Abhijit

commit cc9422aa71ef0b507c634282272be3fd15c39c0b
Author: Abhijit Menon-Sen <a...@2ndquadrant.com>
Date:   Mon Jun 30 12:15:54 2014 +0530

    Introduce --record-stats to avoid use of optional_argument

diff --git a/contrib/pg_xlogdump/pg_xlogdump.c b/contrib/pg_xlogdump/pg_xlogdump.c
index 47838d4..1853b47 100644
--- a/contrib/pg_xlogdump/pg_xlogdump.c
+++ b/contrib/pg_xlogdump/pg_xlogdump.c
@@ -609,7 +609,8 @@ main(int argc, char **argv)
 		{"timeline", required_argument, NULL, 't'},
 		{"xid", required_argument, NULL, 'x'},
 		{"version", no_argument, NULL, 'V'},
-		{"stats", optional_argument, NULL, 'z'},
+		{"stats", no_argument, NULL, 'z'},
+		{"record-stats", no_argument, NULL, 'Z'},
 		{NULL, 0, NULL, 0}
 	};
 
@@ -643,7 +644,7 @@ main(int argc, char **argv)
 		goto bad_argument;
 	}
 
-	while ((option = getopt_long(argc, argv, "be:?fn:p:r:s:t:Vx:z::",
+	while ((option = getopt_long(argc, argv, "be:?fn:p:r:s:t:Vx:zZ",
 								 long_options, &optindex)) != -1)
 	{
 		switch (option)
@@ -738,18 +739,10 @@ main(int argc, char **argv)
 				break;
 			case 'z':
 				config.stats = true;
-				config.stats_per_record = false;
-				if (optarg)
-				{
-					if (strcmp(optarg, "record") == 0)
-						config.stats_per_record = true;
-					else if (strcmp(optarg, "rmgr") != 0)
-					{
-						fprintf(stderr, "%s: unrecognised argument to --stats: %s\n",
-								progname, optarg);
-						goto bad_argument;
-					}
-				}
+				break;
+			case 'Z':
+				config.stats = true;
+				config.stats_per_record = true;
 				break;
 			default:
 				goto bad_argument;
diff --git a/doc/src/sgml/pg_xlogdump.sgml b/doc/src/sgml/pg_xlogdump.sgml
index d9f4a6a..bfd9eb9 100644
--- a/doc/src/sgml/pg_xlogdump.sgml
+++ 
b/doc/src/sgml/pg_xlogdump.sgml
@@ -181,12 +181,22 @@ PostgreSQL documentation
 
    <varlistentry>
     <term><option>-z</option></term>
-    <term><option>--stats[=record]</option></term>
+    <term><option>--stats</option></term>
     <listitem>
      <para>
       Display summary statistics (number and size of records and
-      full-page images) instead of individual records. Optionally
-      generate statistics per-record instead of per-rmgr.
+      full-page images per rmgr) instead of individual records.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><option>-Z</option></term>
+    <term><option>--record-stats</option></term>
+    <listitem>
+     <para>
+      Display summary statistics (number and size of records and
+      full-page images) instead of individual records.
      </para>
     </listitem>
    </varlistentry>

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] psql: show only failed queries
2014-06-30 8:17 GMT+02:00 Rajeev rastogi rajeev.rast...@huawei.com: On 26 June 2014 11:53, Samrat Revagade wrote: I am sending an updated patch - the buggy statement is printed via the more logical psql_error function instead of printf. Thank you for updating the patch, I really appreciate your efforts. Now, everything is good from my side. * it applies cleanly to the current git master * includes necessary docs * I think it is a very good and necessary feature. If Kumar Rajeev Rastogi does not have any extra comments, then I think the patch is ready for committer. I have reviewed this patch. Please find my review comments below: 1. A command start-up option (e.g. -a/--echo-all for ECHO=all) is not provided for the new functionality. Not all options entered via psql variables have a corresponding psql command-line option. I plan to add a new description to the --help-variables list. If it is necessary I can add a long option --echo-errors; I didn't find a good char for a short option. Any idea? 2. The new command start-up option should be added to psql --help as well as to the documentation. Depends on the previous point. Also, as I understand it, this new option is a kind of sub-set of an existing option (ECHO=query), so shouldn't we display the query string in the same format as it was getting printed earlier? Though I also feel that prefixing the query with the STATEMENT word will be helpful to grep, at the same time I am worried about inconsistency with the existing option. This is a question, and I don't have a strong feeling about what should be preferred. But I am still inclined to prefer the variant with the STATEMENT prefix. The mode with -a is used for a different purpose than the show-errors-only mode, and output with the prefix is much more consistent with a log entry displaying an error. So I agree there is a potential inconsistency (though the output is defined nowhere), but this output is more practical when you are concentrating on error processing. Regards Pavel *Thanks and Regards,* *Kumar Rajeev Rastogi *
Re: [HACKERS] Providing catalog view to pg_hba.conf file - Patch submission
At 2014-06-29 22:25:54 +0530, a...@2ndquadrant.com wrote: I think the really right thing to do would be to have two separate columns, one with all, sameuser, samerole, replication, or empty; and the other an array of database names. After sleeping on it, I realised that the code would return '{all}' for the keyword all in pg_hba.conf, but '{"all"}' for a database actually named "all". So it's not exactly ambiguous, but I don't think it's especially useful for callers. I think having two columns would work. The columns could be called database and database_list, and user and user_list, respectively. The database column may contain one of all, sameuser, samegroup, replication; but if it's empty, database_list will contain an array of database names. Then (all, {}) and ('', {all}) are easily separated. Likewise for user and user_list. I've marked this patch returned with feedback and moved it to the August CF after discussion with Vaishnavi. -- Abhijit
Re: [HACKERS] SQL access to database attributes
2014-06-29 21:09 GMT+02:00 Tom Lane t...@sss.pgh.pa.us: Vik Fearing vik.fear...@dalibo.com writes: On 06/21/2014 10:11 PM, Pavel Stehule wrote: Is there any reason for, or is it acceptable as, an incompatible change to use CONNECTION_LIMIT instead of CONNECTION LIMIT? Is decreasing parser size by about 1% good enough for breaking compatibility? How is compatibility broken? The grammar still accepts the old way; I just changed the documentation to promote the new way. While I agree that this patch wouldn't break backwards compatibility, I don't really see what the argument is for changing the recommended spelling of the command. The difficulty with doing what you've done here is that it creates unnecessary cross-version incompatibilities; for example, a 9.5 psql being used against a 9.4 server would tab-complete the wrong spelling of the option. Back-patching would change the set of versions for which the problem exists, but it wouldn't remove the problem altogether. And in fact it'd add new problems, e.g. pg_dumpall output from a 9.3.5 pg_dumpall failing to load into a 9.3.4 server. This is not the kind of change we customarily back-patch anyway. So personally I'd have just made connection_limit be an undocumented internal equivalent of CONNECTION LIMIT, and kept the latter as the preferred spelling, with no client-side changes. +1 There is no important reason to make hard changes at this moment. Pavel regards, tom lane
[HACKERS] Escaping from blocked send() reprised.
Hello, I have received inquiries related to blocked communication several times over these past weeks, with different symptoms. Then I found this message in the archive, http://postgresql.1045698.n5.nabble.com/Escaping-a-blocked-sendto-syscall-without-causing-a-restart-td5740855.html Subject: Escaping a blocked sendto() syscall without causing a restart Tom Lane commented in reply: Offhand it looks to me like most signals would kick the backend off the send() call ... but it would loop right back and try again. See internal_flush() in pqcomm.c. (If you're using SSL, this diagnosis may or may not apply.) We can't do anything except repeat the send attempt if the client connection is to be kept in a sane state. (snipped) And I'm not at all sure if we could get it to work in SSL mode... That's true for timeouts that should continue the connection, say, statement_timeout; but focusing on intentional backend termination, I think it does no harm to break it up abruptly, even if it was on SSL. On the other hand, it still seems preferable to keep the connection when not blocked. The following expression detects such a blocking state just before the next send(2), after the previous try exited on a signal: (ProcDiePending && select(1, NULL, &fd, NULL, '1 sec') == 0) Finally, pg_terminate_backend() works even when send is blocked, for both SSL and non-SSL connections, after a 1-second delay with this patch (break_socket_blocking_on_termination_v1.patch). Nevertheless, statement_timeout of course cannot become effective by this method, since that would break the consistency of the client protocol. It would need a change in the client protocol to have an out-of-band mechanism or something, maybe. Any suggestions? Attached patches are: - break_socket_blocking_on_termination_v1.patch : the patch to break the blocked state of send(2) for pg_terminate_backend(). - socket_block_test.patch : debug printing and changing the send buffer of libpq for reproducing the blocked situation.
Some points of discussion follow. Discussion about the appropriateness of looking into ProcDiePending there and calling CHECK_FOR_INTERRUPTS() on seeing it: I have some uneasiness about these things, but the most we could do seems to be replacing ProcDiePending here with some other variable, say ImmediatelyExitFromBlockedState, and somehow going back up through the normal return path. An additional try-catch seems able to do that, but it looks like no benefit for the added complexity. Discussion on breaking up the connection, especially for SSL: Breaking an SSL connection up in my_sock_write() causes the following message on the client side if it still lives and resumes receiving from the connection; this seems to show that the client handles the event properly. | SSL SYSCALL error: EOF detected | The connection to the server was lost. Attempting reset: Succeeded. Discussion on the reliability of select(2): This method is not a perfect solution, since select(2) sometimes gives a wrong oracle about whether the following send(2) will block. Even so, as far as I can see, select(2) just after exiting a blocked send(2) on a signal always seems to say 'write will be blocked', so what this patch does seems to cover most cases, except when some socket buffer space is vacated just before the following select. The second chance to exit from a blocked send(2) won't come after this (before one more pg_terminate_backend()?). Removing the select(2) from the condition (that is, calling CHECK_FOR_INTERRUPTS() whenever ProcDiePending is true) prevents such a possibility, in exchange for killing 'really live' connections, but IMHO that is no problem on intentional server termination. A more reliable measure would be non-blocking I/O, but that changes more of the code. Reproducing the situation: Another possible question would be about the possibility of such blocking, or how to reproduce the situation.
I found that send(2) on CentOS 6.5 somehow sends the remaining data successfully, in most cases, on the retry after exiting by signal while blocked with a full buffer, in spite of no change in the environment. So reproducing the stuck situation is rather difficult on the server as-is. But the situation can be reproduced quite easily with some cheating, that is, enlarging the PQ send buffer, say by ten times. Applying the attached testing patch (socket_block_test.patch), the following steps will produce the stuck situation.

1. Do a select which returns a big result, and enter Ctrl-Z just after invoking it.

cl $ psql -h localhost postgres
cl postgres=# select 1 from generate_series(0, 999);
cl ^Z
cl [4]+ Stopped psql -h localhost postgres

2. Watch the server get stuck. The server starts to print lines like the following to the console after a while, then stops. The number enclosed in square brackets is the PID of the server.

sv [8809] (bare) rest = 0 / 81920 bytes, ProcDiePending = 0

3. Do
Re: [HACKERS] How about a proper TEMPORARY TABLESPACE?
On 06/29/2014 02:19 AM, Matheus de Oliveira wrote: Hi Hackers, I have worked on that patch a little more. So now I have a functional patch (although still WIP) attached. The feature works as follows:

- Added a boolean parameter only_temp_files to pg_tablespace.spcoptions;
- This parameter can be set to true only during CREATE TABLESPACE, not on ALTER TABLESPACE (I have thought of ways of implementing the latter, and I'd like to discuss it more later);
- On the creation of relations, it is checked whether the target is a temporary tablespace, and an error occurs when it is and the relation is not temporary (a temp table or an index on a temp table);
- When a temporary file (either a relation file or a sort/agg file) is created inside a temporary tablespace, the entire directory structure is created on demand (e.g. if pg_tblspc/oid/TABLESPACE_VERSION_DIRECTORY is missing, it is created on demand); it is done in OpenTemporaryFileInTablespace, at fd.c (I wonder if we shouldn't do that for any tablespace) and in TablespaceCreateDbspace, at tablespace.c.

Right now PostgreSQL appears to rely on the absence of the tablespace directory as a flag to tell it "don't start up, something's badly wrong here". If the user creates the tablespace directory, it figures they at least know what they're doing and carries on starting up with the empty tablespace. It's treated as an admin statement - "I know it's gone, it's not coming back, just carry on as best you can." If Pg were to auto-create the directory, then if (say) a mount of a tablespace dir failed at boot, Pg would still happily start up and might create files in the tablespace, despite there being inaccessible tables/indexes/etc on the not-mounted volume that's *supposed* to be the tablespace storage. That'd create plenty of mess. So no, I don't think Pg should auto-create the tablespace directory if it's missing for non-temporary tablespaces.
Not unless the current (less than friendly) behaviour around lost tablespaces is replaced with something like an `ignore_missing_tablespaces` or `drop_missing_tablespaces` GUC - akin to our existing `zero_damaged_pages` recovery GUC. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] inherit support for foreign tables
I checked that it's reporting the right tableoid now. BTW, why aren't you using the tlist passed to this function? I guess create_scan_plan() passes the tlist after processing it, so that should be used rather than rel->reltargetlist. On Mon, Jun 30, 2014 at 12:22 PM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote: (2014/06/24 16:30), Etsuro Fujita wrote: (2014/06/23 18:35), Ashutosh Bapat wrote: Selecting tableoid on the parent causes an error, ERROR: cannot extract system attribute from virtual tuple. The foreign table has an OID which can be reported as tableoid for the rows coming from that foreign table. Do we want to do that? No. I think it's a bug. I'll fix it. Done. I think this is because create_foreignscan_plan() makes reference to attr_needed, which isn't computed for inheritance children. To avoid this, I've modified create_foreignscan_plan() to look at reltargetlist and baserestrictinfo instead of attr_needed. Please find attached an updated version of the patch. Sorry for the delay. Best regards, Etsuro Fujita -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: [HACKERS] replicating DROP commands across servers
Hi. I thought I'd review this patch, since pgaudit uses the pg_event_trigger_dropped_objects function. I went through the patch line by line, and I don't really have anything to say about it. I notice that there are some XXX/FIXME comments in the code, but it's not clear if those need to (or can be) fixed before the code is committed. Everything else looks good. I think this is ready for committer. -- Abhijit
Re: [HACKERS] IMPORT FOREIGN SCHEMA statement
On Monday 30 June 2014 16:43:38, Michael Paquier wrote: On Thu, Jun 19, 2014 at 11:00 PM, Michael Paquier michael.paqu...@gmail.com wrote: This seems to be related to re-using the table name between invocations. The attached patch should fix point 2. As for point 1, I don't know the cause of it. Do you have a reproducible test case? Sure. I'll try harder when looking in more detail at the patch for postgres_fdw :) With v2, I have tried randomly some of the scenarios of v1 plus some extras, but was not able to reproduce it. I'll look into the patch for postgres_fdw later. And here are some comments about it, when applied on top of the other 2 patches.

1) Code compiles without warnings; regression tests pass.
2) In fetch_remote_tables, the palloc for the parameters should be done after the number of parameters is determined. In the case of IMPORT_ALL, some memory is wasted for nothing.
3) The current code is not able to import default expressions for a table. A little bit of processing with pg_get_expr would be necessary: select pg_catalog.pg_get_expr(adbin, adrelid) from pg_attrdef; There are of course bonus cases, like SERIAL columns, coming immediately to my mind, but it would be possible to determine if a given column is serial with pg_get_serial_sequence. This would be a good addition for the FDW IMO.
4) The same applies of course to the constraint name: CREATE FOREIGN TABLE foobar (a int CONSTRAINT toto NOT NULL), for example.
5) A bonus idea: enable retrieval of default expressions and the null/not null switch by default, but add options to disable each of them.
6) Defining PGFDW_IMPORTRESULT_NUMCOLS once at the top of postgres_fdw.c without undefining it would be perfectly fine.
7) In postgresImportForeignSchema, the palloc calls and the definitions of the variables used to save the results should be done within the for loop.
8) At a quick glance, the logic of postgresImportForeignSchema looks awkward...
I'll have a second look with a fresher mind later on this one. While having a second look at the core patch, I found myself re-hacking it, fixing a couple of bugs and adding things that were missing from the former implementation:

- Deletion of unnecessary structures to simplify the code and make it cleaner.
- Fixed a bug related to the management of the local schema name. An FDW was free to set the schema name where local tables are created; this should not be the case.
- Improved documentation, examples and other things; fixed doc padding, for example.
- Added some missing stuff in equalfuncs.c and copyfuncs.c.
- Some other things.

With that, the core patch looks pretty nice actually, and I think that we should let a committer have a look at this part at least for this CF. Also, the postgres_fdw portion has been updated based on the previous core patch as modified, using a version that Ronan sent me, which has addressed the remarks I sent before. This patch is still lacking documentation and some cleanup, and the regression tests are broken, but it can be used to test the core feature. I unfortunately don't have much time today, but I am sending this patch either way to let people play with IMPORT SCHEMA if I don't come back to it before. The regression tests fail because of a typo in pg_type.h: BOOLARRAYOID should be defined as 1000, not 1003 (which clashes with NAMEARRAYOID). What do you think should be documented, and where?
Regards,
-- 
Ronan Dunklau
http://dalibo.com - http://dalibo.org

From 1cc2922e9087f2d852d2b1196314d7e35dce42a3 Mon Sep 17 00:00:00 2001
From: Michael Paquier <mich...@otacoo.com>
Date: Mon, 30 Jun 2014 16:36:47 +0900
Subject: [PATCH 2/2] Add support of IMPORT SCHEMA for postgres_fdw

---
 contrib/postgres_fdw/expected/postgres_fdw.out |  62 ++
 contrib/postgres_fdw/postgres_fdw.c            | 259 +
 contrib/postgres_fdw/sql/postgres_fdw.sql      |  30 +++
 src/include/catalog/pg_type.h                  |   2 +
 4 files changed, 353 insertions(+)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 2e49ee3..07e6c11 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -2834,3 +2834,65 @@ NOTICE:  NEW: (13,"test triggered !")
  (0,27)
 (1 row)
 
+-- Test IMPORT FOREIGN SCHEMA statement
+CREATE schema import_destination;
+CREATE schema import_source;
+CREATE TABLE import_source.t1 (c1 int, c2 varchar NOT NULL);
+CREATE TABLE import_source.t2 (c1 int, c2 varchar);
+CREATE TABLE import_source.t3 (c1 int, c2 varchar);
+IMPORT FOREIGN SCHEMA import_source FROM SERVER loopback INTO import_destination;
+\det+ import_destination;
+                      List of foreign tables
+ Schema | Table | Server | FDW Options | Description
Re: [HACKERS] psql: show only failed queries
On 30 June 2014 12:24, Pavel Stehule wrote: I have reviewed this patch. Please find my review comments below: 1. A command start-up option (e.g. -a/--echo-all for ECHO=all) is not provided for the new functionality. Not all options entered via psql variables have a corresponding psql command-line option. I plan to add a new description to the --help-variables list. If it is necessary I can add a long option --echo-errors; I didn't find a good char for a short option. Any idea? But the new option we are adding is along the track of an existing option, so it would be better to add a start-up option for this also. Yeah, the long option --echo-errors seems fine to me as well. For the short option, I think we can use "-b", standing for blunder. This is the closest one I could think of. 2. The new command start-up option should be added to psql --help as well as to the documentation. Depends on the previous point. Right. Also, as I understand it, this new option is a kind of sub-set of an existing option (ECHO=query), so shouldn't we display the query string in the same format as it was getting printed earlier? Though I also feel that prefixing the query with the STATEMENT word will be helpful to grep, at the same time I am worried about inconsistency with the existing option. This is a question, and I don't have a strong feeling about what should be preferred. But I am still inclined to prefer the variant with the STATEMENT prefix. The mode with -a is used for a different purpose than the show-errors-only mode, and output with the prefix is much more consistent with a log entry displaying an error. So I agree there is a potential inconsistency (though the output is defined nowhere), but this output is more practical when you are concentrating on error processing. Yeah right, I just wanted to raise the point to provoke other thoughts and see if anyone has a different opinion. If there is no objection from others, we can go ahead with the current prefixing approach. Thanks and Regards, Kumar Rajeev Rastogi
Re: [HACKERS] better atomics - v0.5
On 2014-06-30 11:04:53 +0530, Amit Kapila wrote: On Sun, Jun 29, 2014 at 2:54 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-06-29 11:53:28 +0530, Amit Kapila wrote: On Sat, Jun 28, 2014 at 1:48 PM, Andres Freund and...@2ndquadrant.com I think it is better to have some discussion about it. Apart from this, today I noticed that atomic read/write doesn't use any barrier semantics and just relies on volatile. Yes, intentionally so. It's often important to get/set the current value without the cost of emitting a memory barrier. It just has to be a recent value and it actually has to come from memory. I agree with you that we don't want to incur the cost of a memory barrier for get/set; however, it should not be at the cost of correctness. I really can't follow here. A volatile read/store forces it to go to memory without barriers. The ABIs of all platforms we work with guarantee that 4-byte stores/reads are atomic - we've been relying on that for a long while. And that's actually enforced by the current definition - I think? Yeah, this is the only point which I am trying to ensure; that's why I sent you a few links in a previous mail where I had some suspicion that just doing get/set with volatile might lead to correctness issues in some cases. But this isn't something this patch started doing - we've been doing that for a long while? Another note I found today while reading more on it, which suggests that my previous observation is right: "Within a thread of execution, accesses (reads and writes) to all volatile objects are guaranteed to not be reordered relative to each other, but this order is not guaranteed to be observed by another thread, since volatile access does not establish inter-thread synchronization." http://en.cppreference.com/w/c/atomic/memory_order That's just saying there are no ordering guarantees with volatile stores/reads. I don't see a contradiction? b) It's only supported from Vista onwards. Afaik we still support XP.
#ifndef pg_memory_barrier_impl
#define pg_memory_barrier_impl() MemoryBarrier()
#endif

The MemoryBarrier() macro used is also supported only from Vista onwards, so I think reasons similar to those for using MemoryBarrier() can apply to this as well. I think that's odd because barrier.h has been using MemoryBarrier() without a version test for a long time now. I guess everyone uses a new enough Visual Studio. Yeah, or maybe the people who are not using a new enough version don't have a use case that can hit a problem due to MemoryBarrier(). Well, with barrier.h as it stands they'd get a compiler error if it indeed is unsupported? But I think there's something else going on - msft might just be removing XP from its documentation. Postgres supports building with Visual Studio 2005, and MemoryBarrier() seems to be supported by it. I think we'd otherwise have seen problems since 9.1, where barrier.h was introduced. In this case, I have a question for you. The un-patched usage in barrier.h is as follows:

..
#elif defined(__ia64__) || defined(__ia64)
#define pg_compiler_barrier() _Asm_sched_fence()
#define pg_memory_barrier() _Asm_mf()
#elif defined(WIN32_ONLY_COMPILER)
/* Should work on both MSVC and Borland. */
#include <intrin.h>
#pragma intrinsic(_ReadWriteBarrier)
#define pg_compiler_barrier() _ReadWriteBarrier()
#define pg_memory_barrier() MemoryBarrier()
#endif

If I understand the current define mechanism in barrier.h correctly, it will have a different definition for Itanium processors even for Windows. Either no one has ever tested Postgres on Itanium Windows (quite possible), because afaik _Asm_sched_fence() doesn't work on windows/msvc, or Windows doesn't define __ia64__/__ia64. Those macros are for HP's acc on HPUX.
However the patch defines as below: #if defined(HAVE_GCC__ATOMIC_INT32_CAS) || defined(HAVE_GCC__SYNC_INT32_CAS) # include storage/atomics-generic-gcc.h #elif defined(WIN32_ONLY_COMPILER) # include storage/atomics-generic-msvc.h What I can understand from above is that defines in storage/atomics-generic-msvc.h, will override any previous defines for compiler/memory barriers and _ReadWriteBarrier()/MemoryBarrier() will be considered for Windows always. Well, the memory barrier is surrounded by #ifndef pg_memory_barrier_impl. The compiler barrier can't reasonably be defined earlier since it's a compiler not an architecture thing. 6. pg_atomic_compare_exchange_u32() It is better to have comments above this and all other related functions. Okay, generally code has such comments on top of function definition rather than with declaration. I don't want to have it on the _impl functions because they're duplicated for the individual platforms and will just become out of date... Sure, I was also not asking for _impl functions. What I was asking in this point was to have comments on top of definition of
[HACKERS] uninitialized values in revised prepared xact code
Hi, I've just rerun valgrind for the first time in a while and saw the following splat. My guess is it exists since bb38fb0d43c, but that's blindly guessing: ==2049== Use of uninitialised value of size 8 ==2049==at 0x4FE66D: EndPrepare (twophase.c:1063) ==2049==by 0x4F231B: PrepareTransaction (xact.c:2217) ==2049==by 0x4F2A38: CommitTransactionCommand (xact.c:2676) ==2049==by 0x79013E: finish_xact_command (postgres.c:2408) ==2049==by 0x78DE97: exec_simple_query (postgres.c:1062) ==2049==by 0x791FDD: PostgresMain (postgres.c:4010) ==2049==by 0x71B13B: BackendRun (postmaster.c:4113) ==2049==by 0x71A86D: BackendStartup (postmaster.c:3787) ==2049==by 0x71714C: ServerLoop (postmaster.c:1566) ==2049==by 0x716804: PostmasterMain (postmaster.c:1219) ==2049==by 0x679405: main (main.c:219) ==2049== Uninitialised value was created by a stack allocation ==2049==at 0x4FE16C: StartPrepare (twophase.c:942) ==2049== ==2049== Syscall param write(buf) points to uninitialised byte(s) ==2049==at 0x5C69640: __write_nocancel (syscall-template.S:81) ==2049==by 0x4FE6AE: EndPrepare (twophase.c:1064) ==2049==by 0x4F231B: PrepareTransaction (xact.c:2217) ==2049==by 0x4F2A38: CommitTransactionCommand (xact.c:2676) ==2049==by 0x79013E: finish_xact_command (postgres.c:2408) ==2049==by 0x78DE97: exec_simple_query (postgres.c:1062) ==2049==by 0x791FDD: PostgresMain (postgres.c:4010) ==2049==by 0x71B13B: BackendRun (postmaster.c:4113) ==2049==by 0x71A86D: BackendStartup (postmaster.c:3787) ==2049==by 0x71714C: ServerLoop (postmaster.c:1566) ==2049==by 0x716804: PostmasterMain (postmaster.c:1219) ==2049==by 0x679405: main (main.c:219) ==2049== Address 0x64694ed is 1,389 bytes inside a block of size 8,192 alloc'd ==2049==at 0x4C27B8F: malloc (vg_replace_malloc.c:298) ==2049==by 0x8E766E: AllocSetAlloc (aset.c:853) ==2049==by 0x8E8E04: MemoryContextAllocZero (mcxt.c:627) ==2049==by 0x8A54D3: AtStart_Inval (inval.c:704) ==2049==by 0x4F1DFC: StartTransaction (xact.c:1841) ==2049==by 0x4F28D1: 
StartTransactionCommand (xact.c:2529) ==2049==by 0x7900A7: start_xact_command (postgres.c:2383) ==2049==by 0x78DAF4: exec_simple_query (postgres.c:860) ==2049==by 0x791FDD: PostgresMain (postgres.c:4010) ==2049==by 0x71B13B: BackendRun (postmaster.c:4113) ==2049==by 0x71A86D: BackendStartup (postmaster.c:3787) ==2049==by 0x71714C: ServerLoop (postmaster.c:1566) ==2049== Uninitialised value was created by a stack allocation ==2049==at 0x4FE16C: StartPrepare (twophase.c:942) It's probably just padding - twophase.c:1063 is the CRC32 computation of the record data. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Extending MSVC scripts to support --with-extra-version
Thank you for sharing updated patch. I have compared it with MSVC and configure generated build i.e. *MacOSX (*--with-extra-version -30JUN*)* pc1dotnetpk:inst asif$ ./bin/psql -d postgres psql (9.5devel-30JUN) Type help for help. postgres=# select version(); version PostgreSQL 9.5devel-30JUN on x86_64-apple-darwin13.2.0, compiled by Apple LLVM version 5.1 (clang-503.0.40) (based on LLVM 3.4svn), 64-bit (1 row) pc1dotnetpk:inst asif$ ./bin/initdb -V initdb (PostgreSQL) 9.5devel-30JUN *Windows7 64bit (*extraver = '-30JUN'*)* C:\PG\postgresql\inst_withpatch_v2_extra-versionbin\psql -d postgres psql (9.5devel-30JUN) WARNING: Console code page (437) differs from Windows code page (1252) 8-bit characters might not work correctly. See psql reference page Notes for Windows users for details. Type help for help. postgres=# select version(); version -- PostgreSQL 9.5devel-30JUN, compiled by Visual C++ build 1600, 64-bit (1 row) C:\PG\postgresql\inst_withpatch_v2_extra-versionbin\initdb.exe -V initdb (PostgreSQL) 9.5devel-30JUN Patch looks good to me. I think it is ready for committer. Thanks. Regards, Muhammad Asif Naeem On Fri, Jun 27, 2014 at 5:00 AM, Michael Paquier michael.paqu...@gmail.com wrote: On Fri, Jun 27, 2014 at 8:26 AM, Asif Naeem anaeem...@gmail.com wrote: I have spent some time reviewing the code. It applies well and PG master branch build fine with setting extraver or keep it undefined. Thanks for reviewing that. I have observed the following output applying the patch i.e. It seems that extraver information only appears when version function is being used. If we use -V (--version) with pg utilities/binaries, it does not include additional provided information. You are right. The first version of this patch updates PG_VERSION_STR but not PG_VERSION, which is the string used for all the binaries to report the version. 
Can you please guide how can I perform similar functionality via configure script (that can be used on Unix like OS/MinGW) or It is intended for Window specific requirement ?. Thanks. Sure, you can do the equivalent with plain configure like that: ./configure --with-extra-version=-foo --prefix=/to/path/ And here is the output that I get with such options on OSX for example: $ psql -c 'select substring(version(), 1, 52)' substring -- PostgreSQL 9.5devel-foo on x86_64-apple-darwin13.2.0 (1 row) $ initdb --version initdb (PostgreSQL) 9.5devel-foo With the new patch attached, the following output is generated for an MSVC build: $ psql -c 'select version()' version PostgreSQL 9.5devel-foo, compiled by Visual C++ build 1600, 64-bit (1 row) $ initdb --version initdb (PostgreSQL) 9.5devel-foo Regards, -- Michael
Re: [HACKERS] pg_receivexlog add synchronous mode
Thanks for the review! +if (secs <= 0) +secs = 1;/* Always sleep at least 1 sec */ + +sleeptime = secs * 1000 + usecs / 1000; The above is the code which caused that problem. 'usecs' should have been reset to zero when 'secs' was rounded up to 1 second. But it was not. Attached is the updated version of the patch. Thank you for the refactoring v2 patch. I did a review of the patch. 1. applied cleanly and compiled without warnings or errors 2. all regression tests passed ok 3. sleeptime is ok when --status-interval is set to 1 Regards, -- Furuya Osamu
Re: [HACKERS] Autonomous Transaction (WIP)
If I understand correctly, the design of this patch has already been considered earlier and rejected. So I guess the patch should also be marked rejected? -- Abhijit
Re: [HACKERS] inherit support for foreign tables
(2014/06/30 17:47), Ashutosh Bapat wrote: I checked that it's reporting the right tableoid now. Thank you for the check. BTW, why aren't you using the tlist passed to this function? I guess create_scan_plan() passes tlist after processing it, so that should be used rather than rel->reltargetlist. I think that that would be maybe OK, but I think that it would not be efficient to use the list to compute attrs_used, because the tlist would have more information than rel->reltargetlist in cases where the tlist is built through build_physical_tlist(). Thanks, Best regards, Etsuro Fujita On Mon, Jun 30, 2014 at 12:22 PM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp mailto:fujita.ets...@lab.ntt.co.jp wrote: (2014/06/24 16:30), Etsuro Fujita wrote: (2014/06/23 18:35), Ashutosh Bapat wrote: Selecting tableoid on parent causes an error, ERROR: cannot extract system attribute from virtual tuple. The foreign table has an OID which can be reported as tableoid for the rows coming from that foreign table. Do we want to do that? No. I think it's a bug. I'll fix it. Done. I think this is because create_foreignscan_plan() makes reference to attr_needed, which isn't computed for inheritance children. To avoid this, I've modified create_foreignscan_plan() to see reltargetlist and baserestrictinfo, instead of attr_needed. Please find attached an updated version of the patch. Sorry for the delay. Best regards, Etsuro Fujita -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: [HACKERS] psql: show only failed queries
2014-06-30 11:20 GMT+02:00 Rajeev rastogi rajeev.rast...@huawei.com: On 30 June 2014 12:24, Pavel Stehule Wrote: I have reviewed this patch. Please find my review comments below: 1. Command start-up option (e.g. -a/--echo-all for --ECHO=all), for new functionality is not provided. Not every option entered via psql variables has a corresponding psql command-line option. I plan to add a new description to the --help-variables list. If it is necessary I can add a long option --echo-errors; I didn't find a good char for a short option. Any idea? But the new option we are adding is on the track of an existing option, so better we add a start-up option for this also. Yeah, the long option --echo-errors seems fine to me also. For the short option, I think we can use "-b", which stands for blunder. This is the closest one I could think of. Fixed, see the attachment please. 2. New command start-up option should be added in psql --help as well as in documentation. Depends on the previous. Right. Also, as I understand it, this new option is kind of a sub-set of the existing option (ECHO=query), so should we not display the query string in the same format as it was getting printed earlier? Though I also feel that prefixing the query with the STATEMENT word will be helpful to grep, at the same time I am worried about inconsistency with the existing option. This is a question, and I am not strong in feeling what should be preferred. But still I am inclined to prefer the variant with the STATEMENT prefix. The -a mode is used for a different purpose than the show-errors-only mode - and output with a prefix is much more consistent with a log entry - and displaying the error. So I agree there is potential inconsistency (but the output format is defined nowhere), and this output is more practical when you are concentrating on error processing. Yeah right, I just wanted to raise the point to provoke other thoughts and see if anyone has a different opinion. If no objection from others, we can go ahead with the current prefixing approach.
ok, we can wait two days Regards Pavel *Thanks and Regards,* *Kumar Rajeev Rastogi* commit e05f23c69db53cefa0c3438c5eaeec11e5fa05b0 Author: Pavel Stehule pavel.steh...@gooddata.com Date: Mon Jun 30 12:39:42 2014 +0200 new -b, --echo-errors option diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml index ee6ec3a..ccaf1c3 100644 --- a/doc/src/sgml/ref/psql-ref.sgml +++ b/doc/src/sgml/ref/psql-ref.sgml @@ -73,6 +73,18 @@ PostgreSQL documentation /varlistentry varlistentry + termoption-b//term + termoption--echo-errors//term + listitem + para + Print a failed SQL commands to standard error output. This is + equivalent to setting the variable varnameECHO/varname to + literalerrors/literal. + /para + /listitem +/varlistentry + +varlistentry termoption-c replaceable class=parametercommand/replaceable//term termoption--command=replaceable class=parametercommand/replaceable//term listitem @@ -2812,7 +2824,8 @@ bar literalqueries/literal, applicationpsql/application merely prints all queries as they are sent to the server. The switch for this is -option-e/option. +option-e/option. If set to literalerror/literal then only +failed queries are displayed. 
/para /listitem /varlistentry diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c index 60169a2..6551b15 100644 --- a/src/bin/psql/common.c +++ b/src/bin/psql/common.c @@ -995,6 +995,9 @@ SendQuery(const char *query) results = NULL; /* PQclear(NULL) does nothing */ } + if (!OK pset.echo == PSQL_ECHO_ERRORS) + psql_error(STATEMENT: %s\n, query); + /* If we made a temporary savepoint, possibly release/rollback */ if (on_error_rollback_savepoint) { diff --git a/src/bin/psql/help.c b/src/bin/psql/help.c index 3aa3c16..0128060 100644 --- a/src/bin/psql/help.c +++ b/src/bin/psql/help.c @@ -87,6 +87,7 @@ usage(void) printf(_(\nInput and output options:\n)); printf(_( -a, --echo-all echo all input from script\n)); + printf(_( -b --echo-errorsecho failed commands sent to server\n)); printf(_( -e, --echo-queries echo commands sent to server\n)); printf(_( -E, --echo-hiddendisplay queries that internal commands generate\n)); printf(_( -L, --log-file=FILENAME send session log to file\n)); diff --git a/src/bin/psql/settings.h b/src/bin/psql/settings.h index 0a60e68..453d6c8 100644 --- a/src/bin/psql/settings.h +++ b/src/bin/psql/settings.h @@ -31,6 +31,7 @@ typedef enum { PSQL_ECHO_NONE, PSQL_ECHO_QUERIES, + PSQL_ECHO_ERRORS, PSQL_ECHO_ALL } PSQL_ECHO; diff --git a/src/bin/psql/startup.c b/src/bin/psql/startup.c index 45653a1..3ca3f1e 100644 --- a/src/bin/psql/startup.c +++ b/src/bin/psql/startup.c @@ -354,6 +354,7 @@ parse_psql_options(int argc, char *argv[], struct adhoc_opts * options) {command,
Re: [HACKERS] Set new system identifier using pg_resetxlog
On Sun, Jun 29, 2014 at 11:18 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-06-29 19:44:21 +0530, Abhijit Menon-Sen wrote: At 2014-06-27 00:51:02 +0200, p...@2ndquadrant.com wrote: Also based on Alvaro's comment, I replaced the scanf parsing code with the strtoul(l) function. As far as I can tell (from the thread and a quick look at the patch), your latest version of the patch addresses all the review comments. Should I mark this ready for committer now? Well, we haven't really agreed on a way forward yet. Robert disagrees that the feature is independently useful and thinks it might be dangerous for some users. I don't agree with either of those points, but that doesn't absolve the patch from the objection. I think tentatively there have been more people in favor, but it's not clear cut imo. So what's the use case of this feature? If it's for normal operation, using pg_resetxlog for that is a bit dangerous because it can corrupt the database easily. Regards, -- Fujii Masao
Re: [HACKERS] psql: show only failed queries
At 2014-06-30 12:48:30 +0200, pavel.steh...@gmail.com wrote: + para + Print a failed SQL commands to standard error output. This is + equivalent to setting the variable varnameECHO/varname to + literalerrors/literal. No a, just Print failed SQL commands …. -option-e/option. +option-e/option. If set to literalerror/literal then only +failed queries are displayed. Should be errors here, not error. printf(_( -a, --echo-all echo all input from script\n)); + printf(_( -b --echo-errorsecho failed commands sent to server\n)); printf(_( -e, --echo-queries echo commands sent to server\n)); Should have a comma after -b to match other options. Also I would remove sent to server from the description: echo failed commands is fine. Otherwise looks fine. I see no reason to wait for further feedback, so I'll mark this ready for committer if you make the above corrections. At some point, you should probably also update your --help-variables patch to add this new value to the description of ECHO. -- Abhijit
Re: [HACKERS] How about a proper TEMPORARY TABLESPACE?
On Mon, Jun 30, 2014 at 5:14 AM, Craig Ringer cr...@2ndquadrant.com wrote: Right now PostgreSQL appears to rely on the absence of the tablespace directory as a flag to tell it don't start up, something's badly wrong here. Well, in fact the behavior is just to give a LOG message: LOG: could not open tablespace directory pg_tblspc/100607/PG_9.3_201306121: No such file or directory But it starts fine. No? I ask because I'm relying on that. My patch does not create the directory on startup, so in the absence of the tablespace directory, the above LOG is shown even for a temporary tablespace, and it tries to create the directory when any process tries to use it. As we don't have catalog access on startup, it would require me to use some kind of _init fork for the tablespace if we want it to be re-created during startup. Should we bother about this? If the user creates the tablespace directory, it figures they at least know what they're doing and carries on starting up with the empty tablespace. It's treated as an admin statement - I know it's gone, it's not coming back, just carry on as best you can. Well, I think for a tablespace like that it is okay to create it on-demand (note that I have done this only for temporary tablespaces, not ordinary ones). If Pg were to auto-create the directory, then if (say) a mount of a tablespace dir failed at boot, Pg would still happily start up and might create files in the tablespace, despite there being inaccessible tables/indexes/etc on the not-mounted volume that's *supposed* to be the tablespace storage. That'd create plenty of mess. So no, I don't think Pg should auto-create the tablespace directory if it's missing for non-temporary tablespaces. Ok. I agree. Let's create only for temporary ones (as done by the patch now). Not unless the current (less than friendly) behaviour around lost tablespaces is replaced with something like an `ignore_missing_tablespaces` or `drop_missing_tablespaces` GUC - akin to our existing `zero_missing_pages` recovery GUC.
You meant `zero_damaged_pages`, I think. I don't think that is really necessary in this case. Anyway, I was thinking about some way for this tablespace to also allow creation of non-temporary indexes, and after a crash we could mark such indexes as `NOT VALID` (perhaps `zero_damaged_pages` could be of help here?!). That should be part of another patch though, and would require more thought. But I think that would be a great improvement for some environments, like those on AWS with their ephemeral SSDs. Regards, -- Matheus de Oliveira Analista de Banco de Dados Dextra Sistemas - MPS.Br nível F! www.dextra.com.br/postgres
Re: [HACKERS] Set new system identifier using pg_resetxlog
On 2014-06-30 19:57:58 +0900, Fujii Masao wrote: On Sun, Jun 29, 2014 at 11:18 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-06-29 19:44:21 +0530, Abhijit Menon-Sen wrote: At 2014-06-27 00:51:02 +0200, p...@2ndquadrant.com wrote: Also based on Alvaro's comment, I replaced the scanf parsing code with the strtoul(l) function. As far as I can tell (from the thread and a quick look at the patch), your latest version of the patch addresses all the review comments. Should I mark this ready for committer now? Well, we haven't really agreed on a way forward yet. Robert disagrees that the feature is independently useful and thinks it might be dangerous for some users. I don't agree with either of those points, but that doesn't absolve the patch from the objection. I think tentatively there have been more people in favor, but it's not clear cut imo. So what's the use case of this feature? If it's for normal operation, using pg_resetxlog for that is a bit dangerous because it can corrupt the database easily. a) Mark a database as not being the same. Currently if you promote two databases, e.g. to shard out, they'll continue to have the same system identifier. Which really sucks, especially because timeline ids will often increase synchronously. b) For data recovery it's sometimes useful to create a new database (with the same catalog state) and replay all WAL. For that you need to reset the system identifier. I've done so by hacking up resetxlog before. c) We already allow setting pretty much all aspects of the control file via resetxlog - there seems little point in not having the ability to change the system identifier. d) In a logical replication scenario one way to identify individual nodes is via the system identifier. If you want to convert a base backup into a logical standby, one sensible way to do so is to create logical replication slots *before* promoting a physical backup, to guarantee that the slot is able to stream out all changes.
If the slot names contain the consumer's system identifier you need to know the new system identifier beforehand. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] inherit support for foreign tables
On Mon, Jun 30, 2014 at 4:17 PM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote: (2014/06/30 17:47), Ashutosh Bapat wrote: I checked that it's reporting the right tableoid now. Thank you for the check. BTW, why aren't you using the tlist passed to this function? I guess create_scan_plan() passes tlist after processing it, so that should be used rather than rel->reltargetlist. I think that that would be maybe OK, but I think that it would not be efficient to use the list to compute attrs_used, because the tlist would have more information than rel->reltargetlist in cases where the tlist is built through build_physical_tlist(). In that case, we can call build_relation_tlist() for foreign tables. Thanks, Best regards, Etsuro Fujita On Mon, Jun 30, 2014 at 12:22 PM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp mailto:fujita.ets...@lab.ntt.co.jp wrote: (2014/06/24 16:30), Etsuro Fujita wrote: (2014/06/23 18:35), Ashutosh Bapat wrote: Selecting tableoid on parent causes an error, ERROR: cannot extract system attribute from virtual tuple. The foreign table has an OID which can be reported as tableoid for the rows coming from that foreign table. Do we want to do that? No. I think it's a bug. I'll fix it. Done. I think this is because create_foreignscan_plan() makes reference to attr_needed, which isn't computed for inheritance children. To avoid this, I've modified create_foreignscan_plan() to see reltargetlist and baserestrictinfo, instead of attr_needed. Please find attached an updated version of the patch. Sorry for the delay. Best regards, Etsuro Fujita -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: [HACKERS] psql: show only failed queries
2014-06-30 13:01 GMT+02:00 Abhijit Menon-Sen a...@2ndquadrant.com: At 2014-06-30 12:48:30 +0200, pavel.steh...@gmail.com wrote: + para + Print a failed SQL commands to standard error output. This is + equivalent to setting the variable varnameECHO/varname to + literalerrors/literal. No a, just Print failed SQL commands …. -option-e/option. +option-e/option. If set to literalerror/literal then only +failed queries are displayed. Should be errors here, not error. printf(_( -a, --echo-all echo all input from script\n)); + printf(_( -b --echo-errorsecho failed commands sent to server\n)); printf(_( -e, --echo-queries echo commands sent to server\n)); Should have a comma after -b to match other options. Also I would remove sent to server from the description: echo failed commands is fine. fixed Otherwise looks fine. I see no reason to wait for further feedback, so I'll mark this ready for committer if you make the above corrections. At some point, you should probably also update your --help-variables patch to add this new value to the description of ECHO. I have it in TODO. But I don't would to introduce a dependency between these patches - so when first patch will be committed, than I update second patch Regards Pavel -- Abhijit commit 998203a2e35643430059c4d3490a176a830cfee2 Author: Pavel Stehule pavel.steh...@gooddata.com Date: Mon Jun 30 12:39:42 2014 +0200 new -b, --echo-errors option diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml index ee6ec3a..2c4cc96 100644 --- a/doc/src/sgml/ref/psql-ref.sgml +++ b/doc/src/sgml/ref/psql-ref.sgml @@ -73,6 +73,18 @@ PostgreSQL documentation /varlistentry varlistentry + termoption-b//term + termoption--echo-errors//term + listitem + para + Print failed SQL commands to standard error output. This is + equivalent to setting the variable varnameECHO/varname to + literalerrors/literal. 
+ /para + /listitem +/varlistentry + +varlistentry termoption-c replaceable class=parametercommand/replaceable//term termoption--command=replaceable class=parametercommand/replaceable//term listitem @@ -2812,7 +2824,8 @@ bar literalqueries/literal, applicationpsql/application merely prints all queries as they are sent to the server. The switch for this is -option-e/option. +option-e/option. If set to literalerrors/literal then only +failed queries are displayed. /para /listitem /varlistentry diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c index 60169a2..6551b15 100644 --- a/src/bin/psql/common.c +++ b/src/bin/psql/common.c @@ -995,6 +995,9 @@ SendQuery(const char *query) results = NULL; /* PQclear(NULL) does nothing */ } + if (!OK pset.echo == PSQL_ECHO_ERRORS) + psql_error(STATEMENT: %s\n, query); + /* If we made a temporary savepoint, possibly release/rollback */ if (on_error_rollback_savepoint) { diff --git a/src/bin/psql/help.c b/src/bin/psql/help.c index 3aa3c16..f8f000f 100644 --- a/src/bin/psql/help.c +++ b/src/bin/psql/help.c @@ -87,6 +87,7 @@ usage(void) printf(_(\nInput and output options:\n)); printf(_( -a, --echo-all echo all input from script\n)); + printf(_( -b, --echo-errorsecho failed commands\n)); printf(_( -e, --echo-queries echo commands sent to server\n)); printf(_( -E, --echo-hiddendisplay queries that internal commands generate\n)); printf(_( -L, --log-file=FILENAME send session log to file\n)); diff --git a/src/bin/psql/settings.h b/src/bin/psql/settings.h index 0a60e68..453d6c8 100644 --- a/src/bin/psql/settings.h +++ b/src/bin/psql/settings.h @@ -31,6 +31,7 @@ typedef enum { PSQL_ECHO_NONE, PSQL_ECHO_QUERIES, + PSQL_ECHO_ERRORS, PSQL_ECHO_ALL } PSQL_ECHO; diff --git a/src/bin/psql/startup.c b/src/bin/psql/startup.c index 45653a1..3ca3f1e 100644 --- a/src/bin/psql/startup.c +++ b/src/bin/psql/startup.c @@ -354,6 +354,7 @@ parse_psql_options(int argc, char *argv[], struct adhoc_opts * options) {command, required_argument, NULL, 
'c'}, {dbname, required_argument, NULL, 'd'}, {echo-queries, no_argument, NULL, 'e'}, + {echo-errors, no_argument, NULL, 'b'}, {echo-hidden, no_argument, NULL, 'E'}, {file, required_argument, NULL, 'f'}, {field-separator, required_argument, NULL, 'F'}, @@ -402,6 +403,9 @@ parse_psql_options(int argc, char *argv[], struct adhoc_opts * options) case 'A': pset.popt.topt.format = PRINT_UNALIGNED; break; + case 'b': +SetVariable(pset.vars, ECHO, errors); +break; case 'c': options-action_string = pg_strdup(optarg); if (optarg[0] == '\\') @@ -720,6 +724,8 @@ echo_hook(const char *newval) pset.echo = PSQL_ECHO_NONE; else if (strcmp(newval, queries) == 0) pset.echo =
Re: [HACKERS] [BUGS] BUG #9652: inet types don't support min/max
Hi Haribabu, I am not able to apply the latest patch on REL9_4_STABLE or the master branch i.e. pc1dotnetpk:postgresql asif$ git apply ~/core/min_max_support_for_inet_datatypes/inet_agg_v4.patch fatal: unrecognized input pc1dotnetpk:postgresql asif$ patch -p0 ~/core/min_max_support_for_inet_datatypes/inet_agg_v4.patch can't find file to patch at input line 3 Perhaps you used the wrong -p or --strip option? The text leading up to this was: -- |*** a/src/backend/utils/adt/network.c |--- b/src/backend/utils/adt/network.c -- File to patch: Is there any other utility required to apply the patch? Can you please guide? Thanks. Regards, Muhammad Asif Naeem On Thu, Jun 5, 2014 at 6:28 AM, Haribabu Kommi kommi.harib...@gmail.com wrote: On Thu, Jun 5, 2014 at 9:12 AM, Andres Freund and...@2ndquadrant.com wrote: Hi, On 2014-06-04 10:37:48 +1000, Haribabu Kommi wrote: Thanks for the test. Patch is re-based to the latest head. Did you look at the source of the conflict? Did you intentionally mark the functions as leakproof and review that that truly is the case? Or was that caused by copy-paste? Yes, it is a copy-paste mistake. I didn't know much about that parameter. Thanks for the information. I changed it. Regards, Hari Babu Fujitsu Australia
Re: [HACKERS] Extending MSVC scripts to support --with-extra-version
On Mon, Jun 30, 2014 at 7:05 PM, Asif Naeem anaeem...@gmail.com wrote: Patch looks good to me. I think it is ready for committer. Thanks. Thanks for the extra checks and for taking the time to review it. -- Michael
Re: [HACKERS] [BUGS] BUG #9652: inet types don't support min/max
At 2014-06-30 16:35:45 +0500, anaeem...@gmail.com wrote:

pc1dotnetpk:postgresql asif$ patch -p0 < ~/core/min_max_support_for_inet_datatypes/inet_agg_v4.patch
can't find file to patch at input line 3
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--
|*** a/src/backend/utils/adt/network.c
|--- b/src/backend/utils/adt/network.c
--

You need to use patch -p1, to strip off the a/ and b/ prefixes. -- Abhijit
Re: [HACKERS] Spinlocks and compiler/memory barriers
On Sat, Jun 28, 2014 at 9:41 AM, Andres Freund and...@2ndquadrant.com wrote: No, I think it's going to be *much* simpler than that. How about I take a crack at this next week and then either (a) I'll see why it's a bad idea and we can go from there or (b) you can review what I come up with and tell me why it sucks? Ok. I think that's going in the wrong direction (duplication of nontrivial knowledge), but maybe I'm taking a too 'purist' approach here. Prove me wrong :) I'm not sure what you'll think of this, but see attached. I think newer releases of Sun Studio support that GCC way of doing a barrier, but I don't know how to test for that, so at the moment that uses the fallback "put it in a function" approach, along with HPPA non-GCC and Univel CC. Those things are obscure enough that I don't care about making them less performant. I think AIX is actually OK as-is; if _check_lock() is a compiler barrier but _clear_lock() is not, that would be pretty odd. How are you suggesting we deal with the generic S_UNLOCK case without having a huge ifdef? Why do we build an abstraction layer (barrier.h) and then not use it? Because (1) the abstraction doesn't fit very well unless we do a lot of additional work to build acquire and release barriers for every platform we support and Meh. Something like the (untested):

#if !defined(pg_release_barrier) && defined(pg_read_barrier) && defined(pg_write_barrier)
#define pg_release_barrier() do { pg_read_barrier(); pg_write_barrier(); } while (0)
#elif !defined(pg_release_barrier)
#define pg_release_barrier() pg_memory_barrier()
#endif

before the fallback definition of pg_read/write_barrier should suffice? That doesn't sound like a good idea. For example, on PPC, a read barrier is lwsync, and so is a write barrier.
That would also suck on any architecture where either a read or write barrier gets implemented as a full memory barrier, though I guess it might be smart enough to optimize away most of the cost of the second of two barriers in immediate succession. I think if we want to use barrier.h as the foundation for this, we're going to need to define a new set of things in there that have acquire and release semantics, or just use full barriers for everything - which would be wasteful on, e.g., x86. And I don't particularly see the point in going to a lot of work to invent those new abstractions everywhere when we can just tweak s_lock.h in place a bit and be done with it. Making those files interdependent doesn't strike me as a win. (2) I don't have much confidence that we can depend on the spinlock fallback for barriers without completely breaking obscure platforms, and I'd rather make a more minimal set of changes. Well, it's the beginning of the cycle. And we're already depending on barriers for correctness (and it's not getting less), so I don't really see what avoiding barrier usage buys us except harder to find breakage? If we use barriers to implement any facility other than spinlocks, I have reasonable confidence that we won't break things more than they already are broken today, because I think the fallback to acquire-and-release a spinlock probably works, even though it's likely terrible for performance. I have significantly less confidence that trying to implement spinlocks in terms of barriers is going to be reliable. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

diff --git a/src/backend/storage/lmgr/s_lock.c b/src/backend/storage/lmgr/s_lock.c
index efe1b43..38dc34d 100644
--- a/src/backend/storage/lmgr/s_lock.c
+++ b/src/backend/storage/lmgr/s_lock.c
@@ -154,6 +154,17 @@ s_lock(volatile slock_t *lock, const char *file, int line)
 	return delays;
 }
 
+#ifdef USE_DEFAULT_S_UNLOCK
+void
+s_unlock(slock_t *lock)
+{
+#ifdef TAS_ACTIVE_WORD
+	*TAS_ACTIVE_WORD(lock) = -1;
+#else
+	*lock = 0;
+#endif
+}
+#endif
 
 /*
  * Set local copy of spins_per_delay during backend startup.
diff --git a/src/include/storage/s_lock.h b/src/include/storage/s_lock.h
index 895abe6..f1a89dc 100644
--- a/src/include/storage/s_lock.h
+++ b/src/include/storage/s_lock.h
@@ -55,14 +55,16 @@
  * on Alpha TAS() will fail if interrupted.  Therefore a retry loop must
  * always be used, even if you are certain the lock is free.
  *
- * Another caution for users of these macros is that it is the caller's
- * responsibility to ensure that the compiler doesn't re-order accesses
- * to shared memory to precede the actual lock acquisition, or follow the
- * lock release.  Typically we handle this by using volatile-qualified
- * pointers to refer to both the spinlock itself and the shared data
- * structure being accessed within the spinlocked critical section.
- * That fixes it because compilers are not allowed to re-order accesses
- * to volatile objects relative to other such accesses.
+ * It is the responsibility of these macros to make sure that the compiler
+ * does not re-order accesses to shared
Re: [HACKERS] IMPORT FOREIGN SCHEMA statement
On Mon, Jun 30, 2014 at 6:01 PM, Ronan Dunklau ronan.dunk...@dalibo.com wrote: BOOLARRAYOID should be defined to 1000, not 1003 (which clashes against NAMEARRAYOID). Oops, a bad copy-paste. Thanks for catching that. I had a more detailed look at the postgres_fdw patch: 1) Having an example of options usable with postgres_fdw would be a good thing to test the FDW API more extensively. What about simply a disable_nulls option to disable NULL/NOT NULL creation on the imported tables? 2) (Should have noticed that earlier, sorry) I don't see any advantages but only complications in using PQexecParams to launch the queries checking schema existence remotely and fetching back table definitions, as we use them only once per import. Why not remove the parameter part and build plain query strings using StringInfo? This would have the merit of really simplifying fetch_remote_tables. 3) If you add an option, let's get rid of PGFDW_IMPORTRESULT_NUMCOLS as the number of result columns would change. 4) Instead of using a double layer of arrays when processing results, I think that it would make the whole process more readable to have one array for each result column (attname, atttype, atttypmod, attnotnull), initialized each time a new row is processed, and then iterate through them to get the array items. I imagine that a small function doing the array parsing, called on each array result, would bring more readability as well to the process flow. 5) This could be costly with large sets of tables in LIMIT TO... I imagine that we can live with this O(n log(n)) process as LIMIT TO should not get big...

foreach(lc1, table_names)
{
    found = false;
    looked_up = ((RangeVar *) lfirst(lc1))->relname;
    foreach(lc2, tables)

What do you think should be documented, and where? Two things come to my mind: - If postgres_fdw can use options with IMPORT, they should be explicitly listed in the section "FDW Options" of postgres_fdw. - Documenting that postgres_fdw supports IMPORT FOREIGN SCHEMA.
A simple paragraph in the main section of postgres_fdw containing descriptions would be enough. Once we get that done, I'll do another round of review on this patch and I think that we will be able to mark it as ready for committer. Regards, -- Michael
Re: [HACKERS] Priority table or Cache table
On Tue, Jun 3, 2014 at 9:50 AM, Haribabu Kommi kommi.harib...@gmail.com wrote: Sorry for the late reply. Thanks for the test. Please find the re-based patch with a temp fix for correcting the problem. I will submit a proper patch fix later. Please note that the new patch still gives an assertion error:

TRAP: FailedAssertion(!(buf->freeNext != (-2)), File: "freelist.c", Line: 178)
psql:load_test.sql:5: connection to server was lost

Hence, the patch was installed with assertions off. I also ran the test script after making the same configuration changes that you have specified. I found that I was not able to get the same performance difference that you have reported. The following table lists the tps in each scenario and the % increase in performance:

Threads   Head   Patched   Diff
      1   1669      1718     3%
      2   2844      3195    12%
      4   3909      4915    26%
      8   7332      8329    14%

Kindly let me know if I am missing something. -- Beena Emerson
Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL
On Fri, Jun 27, 2014 at 2:23 PM, Stephen Frost sfr...@snowman.net wrote: I disagree with Stephen's proposal that this should be in core, or that it needs its own dedicated syntax. I think contrib is a great place for extensions like this. That makes it a whole lot easier for people to develop customized versions that meet particular needs they may have, and it avoids bloating core with a bunch of stuff that is only of interest to a subset of users. We spent years getting sepgsql into a form that could be shipped in contrib without substantially burdening core, and I don't see why this feature should be handled any differently. I don't exactly see how an extension or contrib module is going to be able to have a reasonable catalog structure which avoids the risk that renaming an object (role, schema, table, column, policy) will break the configuration without being part of core. At some level, I guess that's true, but a custom GUC would get you per-role and per-database, and a custom reloption could take it further. A security label provider specifically for auditing can already associate custom metadata with just about any SQL-visible object in the system. Add on to that the desire for audit control to be a grant'able right along the lines of:

GRANT AUDIT ON TABLE x TO audit_role;

which allows a user who is not the table owner or a superuser to manage the auditing. As Tom would say, I think you just moved the goalposts into the next county. I don't know off-hand what that syntax is supposed to allow, and I don't think it's necessary for pgaudit to implement that (or anything else that you may want) in order to be a viable facility. In fact, considering how much variation there is likely to be between auditing requirements at different sites, I'm not sure this should even be in contrib at this point.
It's clearly useful, but there is no requirement that every useful extension must be bundled with the core server, and in fact doing so severely limits our ability to make further changes. For my part, I do view auditing as being a requirement we should be addressing through core- and not just for the reasons mentioned above. We might be able to manage all of the above through an extension, however, as we continue to add capabilities and features, new types and methods, ways to extend the system, and more, auditing needs to keep pace and be considered a core feature as much as the GRANT system is. I think the fact that pgaudit does X and you think it should do Y is a perfect example of why we're nowhere close to being ready to push anything into core. We may very well want to do that someday, but not yet. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] RLS Design
On Sun, Jun 29, 2014 at 3:42 PM, Stephen Frost sfr...@snowman.net wrote: An interesting question we haven't much considered is: who can set up policies and add them to users? Maybe we should flip this around, and instead of adding users to policies, we should exempt users from policies. CREATE POLICY p1; And then, if they own p1 and t1, they can do: ALTER TABLE t1 SET POLICY p1 TO t1_p1_quals; (or maybe we should associate it to the policy instead of the table: ALTER POLICY p1 SET TABLE t1 TO t1_p1_quals) And then the policy applies to everyone who doesn't have the grantable EXEMPT privilege on the policy. The policy owner and superuser have that privilege by default and it can be handed out to others like this: GRANT EXEMPT ON POLICY p1 TO snowden; Then users who have row_level_security=on will bypass RLS if possible, and otherwise it will be applied. Users who have row_level_security=off will bypass RLS if possible, and otherwise error. And users who have row_level_security=force will apply RLS even if they are entitled to bypass it. That's interesting. I need to think some more about what that means. I'm not a fan of the EXEMPT approach.. Just out of curiosity, why not? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Spinlocks and compiler/memory barriers
Hi, On 2014-06-30 08:03:40 -0400, Robert Haas wrote: On Sat, Jun 28, 2014 at 9:41 AM, Andres Freund and...@2ndquadrant.com wrote: No, I think it's going to be *much* simpler than that. How about I take a crack at this next week and then either (a) I'll see why it's a bad idea and we can go from there or (b) you can review what I come up with and tell me why it sucks? Ok. I think that's going in the wrong direction (duplication of nontrivial knowledge), but maybe I'm taking a too 'purist' approach here. Prove me wrong :) I'm not sure what you'll think of this, but see attached. I think it continues in the tradition of making s_lock.h ever harder to follow. But it's still better than what we have now from a correctness perspective. I think newer releases of Sun Studio support that GCC way of doing a barrier, but I don't know how to test for that, so at the moment that uses the fallback "put it in a function" approach, Sun Studio's support for the gcc way is new (some update to Sun Studio 12), so that's probably not sufficient. #include <mbarrier.h> and __compiler_barrier()/__machine_rel_barrier() should do the trick for spinlocks. That seems to have existed for longer. Solaris seems to run with TSO enabled for SPARC (whereas linux uses RMO, relaxed memory ordering), so it's probably fine to just use the compiler barrier. along with HPPA non-GCC and Univel CC. Those things are obscure enough that I don't care about making them less performant. Fine with me. I think AIX is actually OK as-is; if _check_lock() is a compiler barrier but _clear_lock() is not, that would be pretty odd. Agreed. How are you suggesting we deal with the generic S_UNLOCK case without having a huge ifdef? Why do we build an abstraction layer (barrier.h) and then not use it? Because (1) the abstraction doesn't fit very well unless we do a lot of additional work to build acquire and release barriers for every platform we support and Meh.
Something like the (untested):

#if !defined(pg_release_barrier) && defined(pg_read_barrier) && defined(pg_write_barrier)
#define pg_release_barrier() do { pg_read_barrier(); pg_write_barrier(); } while (0)
#elif !defined(pg_release_barrier)
#define pg_release_barrier() pg_memory_barrier()
#endif

before the fallback definition of pg_read/write_barrier should suffice? That doesn't sound like a good idea. For example, on PPC, a read barrier is lwsync, and so is a write barrier. That would also suck on any architecture where either a read or write barrier gets implemented as a full memory barrier, though I guess it might be smart enough to optimize away most of the cost of the second of two barriers in immediate succession. Well, that's why I suggested only doing it if we haven't got a pg_release_barrier() defined. And fall back to memory_barrier() directly if read/write barriers are implemented using it, so we don't have two memory barriers in a row. I think if we want to use barrier.h as the foundation for this, we're going to need to define a new set of things in there that have acquire and release semantics, or just use full barriers for everything - which would be wasteful on, e.g., x86. And I don't particularly see the point in going to a lot of work to invent those new abstractions everywhere when we can just tweak s_lock.h in place a bit and be done with it. Making those files interdependent doesn't strike me as a win. We'll need release semantics in more places than just s_lock.h. Anything that acts like a lock without using spinlocks actually needs acquire/release semantics. The lwlock patch only gets away with it because the atomic operations implementation happens to act as acquire or full memory barriers. (2) I don't have much confidence that we can depend on the spinlock fallback for barriers without completely breaking obscure platforms, and I'd rather make a more minimal set of changes. Well, it's the beginning of the cycle.
And we're already depending on barriers for correctness (and it's not getting less), so I don't really see what avoiding barrier usage buys us except harder-to-find breakage? If we use barriers to implement any facility other than spinlocks, I have reasonable confidence that we won't break things more than they already are broken today, because I think the fallback to acquire-and-release a spinlock probably works, even though it's likely terrible for performance. I have significantly less confidence that trying to implement spinlocks in terms of barriers is going to be reliable. So you believe we have a reliable barrier implementation - but you don't believe it's reliable enough for spinlocks? If we'd *not* use the barrier fallback for spinlock release if, and only if, it's implemented via spinlocks, I don't see why we'd be worse off?

+#if !defined(S_UNLOCK)
+#if defined(__INTEL_COMPILER)
+#define S_UNLOCK(lock)	\
+	do {
Re: [HACKERS] RLS Design
* Robert Haas (robertmh...@gmail.com) wrote: On Sun, Jun 29, 2014 at 3:42 PM, Stephen Frost sfr...@snowman.net wrote: An interesting question we haven't much considered is: who can set up policies and add them to users? Maybe we should flip this around, and instead of adding users to policies, we should exempt users from policies. CREATE POLICY p1; And then, if they own p1 and t1, they can do: ALTER TABLE t1 SET POLICY p1 TO t1_p1_quals; (or maybe we should associate it to the policy instead of the table: ALTER POLICY p1 SET TABLE t1 TO t1_p1_quals) And then the policy applies to everyone who doesn't have the grantable EXEMPT privilege on the policy. The policy owner and superuser have that privilege by default and it can be handed out to others like this: GRANT EXEMPT ON POLICY p1 TO snowden; Then users who have row_level_security=on will bypass RLS if possible, and otherwise it will be applied. Users who have row_level_security=off will bypass RLS if possible, and otherwise error. And users who have row_level_security=force will apply RLS even if they are entitled to bypass it. That's interesting. I need to think some more about what that means. I'm not a fan of the EXEMPT approach.. Just out of curiosity, why not? I don't see it as really solving the flexibility need, and it feels quite a bit more complicated to reason about. Would someone who is EXEMPT from one policy on a given table still have other policies on that table applied to them? Would a user be able to be EXEMPT from multiple policies? I feel like that's what you're suggesting with this approach, otherwise I don't see it as really different from the 'DIRECT SELECT' privilege discussed previously.. Thanks, Stephen
Re: [HACKERS] inherit support for foreign tables
Etsuro Fujita fujita.ets...@lab.ntt.co.jp writes: Done. I think this is because create_foreignscan_plan() makes reference to attr_needed, which isn't computed for inheritance children. I wonder whether it isn't time to change that. It was coded like that originally only because calculating the values would've been a waste of cycles at the time. But this is at least the third place where it'd be useful to have attr_needed for child rels. regards, tom lane
Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL
Robert, * Robert Haas (robertmh...@gmail.com) wrote: On Fri, Jun 27, 2014 at 2:23 PM, Stephen Frost sfr...@snowman.net wrote: I don't exactly see how an extension or contrib module is going to be able to have a reasonable catalog structure which avoids the risk that renaming an object (role, schema, table, column, policy) will break the configuration without being part of core. At some level, I guess that's true, but a custom GUC would get you per-role and per-database and a custom reloption could take it further. A security label provider specifically for auditing can already associate custom metadata with just about any SQL-visible object in the system. Ugh. Yes, we could implement a lot of things using per-object GUCs and reloptions. We could do the same for columns with custom attoptions. For my part, that design looks absolutely horrible to me for more-or-less everything. I certainly don't feel like it's the solution which extension authors are looking for and will be happy with, nor do I feel it's the right answer for our users. Add on to that the desire for audit control to be a grant'able right along the lines of: GRANT AUDIT ON TABLE x TO audit_role; which allows a user who is not the table owner or a superuser to manage the auditing. As Tom would say, I think you just moved the goalposts into the next county. I don't know off-hand what that syntax is supposed to allow, and I don't think it's necessary for pgaudit to implement that (or anything else that you may want) in order to be a viable facility. Perhaps I should have used the same argument on the RLS thread.. Instead, I realized that you and others were right- we don't want to implement and commit something which would make adding the features being asked for very difficult to do in the future. The point of the command above is to allow a role *other* than the table owner to control the auditing on the table. 
It's important in certain places that the auditor have the ability to control the auditing and *only* the auditing- not drop the table or change its definition. I'm also not asking for this to be implemented today in pgaudit but rather to point out that we should have a design which at least allows us to get to the point where these features can be added. Perhaps there's a path to allowing the control I'm suggesting here with the existing pgaudit design and with pgaudit as an extension- if so, please speak up and outline it sufficiently that we can understand that path and know that we're not putting ourselves into a situation where we won't be able to support that flexibility. For my part, I do view auditing as being a requirement we should be addressing through core- and not just for the reasons mentioned above. We might be able to manage all of the above through an extension, however, as we continue to add capabilities and features, new types and methods, ways to extend the system, and more, auditing needs to keep pace and be considered a core feature as much as the GRANT system is. I think the fact that pgaudit does X and you think it should do Y is a perfect example of why we're nowhere close to being ready to push anything into core. We may very well want to do that someday, but not yet. That's fine- but don't push something in which will make it difficult to add these capabilities later (and, to be clear, I'm not asking out of pipe dreams and wishes but rather after having very specific discussions with users and reviewing documents like NIST 800-53, which is publicly available for anyone to peruse). Thanks, Stephen
Re: [HACKERS] Spinlocks and compiler/memory barriers
On Mon, Jun 30, 2014 at 9:26 AM, Andres Freund and...@2ndquadrant.com wrote: I think it continues in the tradition of making s_lock.h ever harder to follow. But it's still better than what we have now from a correctness perspective. Well, as you and I have discussed before, someday we probably need to get rid of s_lock.h or at least refactor it heavily, but let's do one thing at a time. I think we're eventually going to require every platform that wants to run PostgreSQL to have working barriers and atomics, and then we can rewrite s_lock.h into something much shorter and cleaner, but I am opposed to doing that today, because even if we don't care about people running obscure proprietary compilers on weird platforms, there are still lots of people running older GCC versions. For right now, I think the prudent thing to do is keep s_lock.h on life support. I think newer releases of Sun Studio support that GCC way of doing a barrier, but I don't know how to test for that, so at the moment that uses the fallback "put it in a function" approach, Sun Studio's support for the gcc way is new (some update to Sun Studio 12), so that's probably not sufficient. #include <mbarrier.h> and __compiler_barrier()/__machine_rel_barrier() should do the trick for spinlocks. That seems to have existed for longer. Can you either link to relevant documentation or be more specific about what needs to be done here? Solaris seems to run with TSO enabled for SPARC (whereas linux uses RMO, relaxed memory ordering), so it's probably fine to just use the compiler barrier. If it isn't, that's a change that has nothing to do with making spinlock operations compiler barriers and everything to do with fixing a bug in the existing implementation. So you believe we have a reliable barrier implementation - but you don't believe it's reliable enough for spinlocks? If we'd *not* use the barrier fallback for spinlock release if, and only if, it's implemented via spinlocks, I don't see why we'd be worse off?
I can't parse this sentence because it's not clear to me exactly which part the "not" applies to, and I think we're talking past each other a bit, too. Basically, every platform we support today has a spinlock implementation that is supposed to prevent CPU reordering across the acquire and release operations but might not prevent compiler reordering. IOW, S_LOCK() should be a CPU acquire fence, and S_UNLOCK() should be a CPU release fence. Now, we want to make these operations compiler fences as well, and AIUI your proposal for that is to make NEW_S_UNLOCK(lock) = out of line function call + S_LOCK(dummy) + S_UNLOCK(dummy) + S_UNLOCK(lock). My proposal is to make NEW_S_UNLOCK(lock) = out of line function call + S_UNLOCK(lock), which I think is strictly better. There's zip to guarantee that adding S_LOCK(dummy) + S_UNLOCK(dummy) is doing anything we want in there, and it's definitely going to be worse for performance. The other thing that I don't like about your proposal has to do with the fact that the support matrices for barrier.h and s_lock.h are not identical. If s_lock.h is confusing to you today, making it further intertwined with barrier.h is not going to make things better; at least, that confuses the crap out of me. Perhaps the only good thing to be said about this mess is that, right now, the dependency is in just one direction: barrier.h depends on s_lock.h, and not the other way around. At some future point we may well want to reverse the direction of that dependency, but to do that we need barrier.h to have a working barrier implementation for every platform that s_lock.h supports. Maybe we'll accomplish that by adding to barrier.h, and maybe we'll accomplish it by subtracting from s_lock.h, but the short path to getting this issue fixed is to be agnostic to that question. 1) Afaics ARM will fall back to this for older gccs - and ARM is weakly ordered. I think I'd just also use swpb to release the lock.
Something like:

#define S_INIT_LOCK(lock)	(*(lock) = 0)

#define S_UNLOCK s_unlock
static __inline__ void
s_unlock(volatile slock_t *lock)
{
	register slock_t _res = 0;

	__asm__ __volatile__(
		"	swpb	%0, %0, [%2]	\n"
:		"+r"(_res), "+m"(*lock)
:		"r"(lock)
:		"memory");
	Assert(_res == 1);	/* lock should be held by us */
}

2) Afaics it's also wrong for sparc on linux (and probably the BSDs) where relaxed memory ordering is used. 3) Also looks wrong on mips, which needs a full memory barrier. You're mixing apples and oranges again. If the existing definitions aren't CPU barriers, they're already broken, and we should fix that and back-patch. On the other hand, the API contract change making these into compiler barriers is a master-only change. I think for purposes of this patch we should assume that the existing code is sufficient to prevent CPU reordering and just focus on fixing
Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL
At 2014-06-30 09:39:29 -0400, sfr...@snowman.net wrote: I certainly don't feel like it's the solution which extension authors are looking for and will be happy with I don't know if there are any other extension authors involved in this discussion, but I'm not shedding any tears over the idea. (That may be because I see operational compatibility with 9.[234] as a major plus, not a minor footnote.) As Tom would say, I think you just moved the goalposts into the next county. (And they're not even the same distance apart any more. ;-) That's fine- but don't push something in which will make it difficult to add these capabilities later I've been trying to understand why a pgaudit extension (which already exists) will make it difficult to add a hypothetical GRANT AUDIT ON goalpost TO referee syntax later. About the only thing I've come up with is people complaining about having to learn the new syntax when they were used to the old one. Surely that's not the sort of thing you mean? (You've mentioned pg_upgrade and backwards compatibility too, and I don't really understand those either.) -- Abhijit
Re: [HACKERS] RLS Design
On Mon, Jun 30, 2014 at 9:42 AM, Stephen Frost sfr...@snowman.net wrote: I'm not a fan of the EXEMPT approach.. Just out of curiosity, why not? I don't see it as really solving the flexibility need and it feels quite a bit more complicated to reason about. Would someone who is EXEMPT from one policy on a given table still have other policies on that table applied to them? Yes; otherwise, EXEMPT couldn't be granted by non-superusers, and the whole point of that proposal was to come up with something that would be clearly safe for ordinary users to use. Would a user be able to be EXEMPT from multiple policies? Yes, clearly. It would be a privilege on the policy object, so different objects can have different privileges. I feel like that's what you're suggesting with this approach, otherwise I don't see it as really different from the 'DIRECT SELECT' privilege discussed previously.. Right. If you took that away, it wouldn't be different. The number of possible approaches here has expanded beyond what I can keep in my head; I'm assuming you are planning to think this over and propose something comprehensive, or maybe Dean or someone else will do that. But I'm not sure that all the approaches proposed would make it safe for non-superusers to use RLS, and I think it would be good if they could. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL
Abhijit, * Abhijit Menon-Sen (a...@2ndquadrant.com) wrote: At 2014-06-30 09:39:29 -0400, sfr...@snowman.net wrote: I certainly don't feel like it's the solution which extension authors are looking for and will be happy with I don't know if there are any other extension authors involved in this discussion, but I'm not shedding any tears over the idea. (That may be because I see operational compatibility with 9.[234] as a major plus, not a minor footnote.) We're certainly not going to include this in the -contrib packages for 9.4-or-earlier, so I've not really been thinking about those concerns, no. Indeed, if you want that to be supported by packagers, they'd probably be happier with it being an independent extension distributed through pgxn or similar.. Having an extension for 9.4-and-earlier which isn't in -contrib then be moved to -contrib for 9.5 is at least an annoyance when it comes to packaging. That's fine- but don't push something in which will make it difficult to add these capabilities later I've been trying to understand why a pgaudit extension (which already exists) will make it difficult to add a hypothetical GRANT AUDIT ON goalpost TO referee syntax later. About the only thing I've come up with is people complaining about having to learn the new syntax when they were used to the old one. I do think people will complain a bit about such a change, but I expect that to be on the level of grumblings rather than real issues. Surely that's not the sort of thing you mean? (You've mentioned pg_upgrade and backwards compatibility too, and I don't really understand those either.) Where would you store the information about the GRANTs? In the reloptions too? Or would you have to write pg_upgrade to detect if pgaudit had been installed and had played with the reloptions field to then extract out the values from it into proper columns or an actual table, not just for grants but for any other custom reloptions which were used? 
Were pgaudit to grow any tables, functions, etc, we'd have to address how those would be handled by pg_upgrade also. I don't think that's an issue currently, but we'd have to be darn careful to make sure it doesn't happen or we end up with the same upgrade issues hstore has. We've absolutely had push-back regarding breaking things for people who have installed extensions from -contrib by moving those things into core without a very clearly defined way to *not* break them. There have been proposals in the past to explicitly allocate/reserve OIDs to address this issue but those haven't been successful. I'm not saying that I know it's going to be impossible- I'm asking that we consider how this might move into core properly in a way that will make it possible to pg_upgrade. Part of that does include considering what the design of the core solution will offer and what feature set it will have.. Thanks, Stephen
Re: [HACKERS] RLS Design
* Robert Haas (robertmh...@gmail.com) wrote: On Mon, Jun 30, 2014 at 9:42 AM, Stephen Frost sfr...@snowman.net wrote: I don't see it as really solving the flexibility need and it feels quite a bit more complicated to reason about. Would someone who is EXEMPT from one policy on a given table still have other policies on that table applied to them? Yes; otherwise, EXEMPT couldn't be granted by non-superusers, and the whole point of that proposal was to come up with something that would be clearly safe for ordinary users to use. I'm confused on this part- granting EXEMPT and/or DIRECT SELECT would definitely need to be supported by a non-superuser, though someone who had the appropriate rights on the object involved (either the policy or the table, depending on where we feel that definition should go). Would a user be able to be EXEMPT from multiple policies? Yes, clearly. It would be a privilege on the policy object, so different objects can have different privileges. Ok.. then I'm not entirely sure how this is different from Dean's proposal except that it's a way of defining the inverse, which we don't do anywhere else in the system today.. I feel like that's what you're suggesting with this approach, otherwise I don't see it as really different from the 'DIRECT SELECT' privilege discussed previously.. Right. If you took that away, it wouldn't be different. Ok. The number of possible approaches here has expanded beyond what I can keep in my head; I'm assuming you are planning to think this over and propose something comprehensive, or maybe Dean or someone else will do that. But I'm not sure that all the approaches proposed would make it safe for non-superusers to use RLS, and I think it would be good if they could. I've been thinking about it quite a bit over the past few days (weeks?) and trying to continue to outline the proposals as they've changed. 
I'll try and work up another comprehensive email which covers the options currently under discussion as I understand them. Allowing non-superusers to use RLS is absolutely key to this in any case- it'd be great if you could voice any specific concerns you see there. We've already been working through the GUCs previously discussed, as I feel those will be necessary for any of these approaches (in particular the bypass RLS-or-error GUC which pg_dump will enable by default). Thanks, Stephen
Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL
On Mon, Jun 30, 2014 at 9:39 AM, Stephen Frost sfr...@snowman.net wrote: I think the fact that pgaudit does X and you think it should do Y is a perfect example of why we're nowhere close to being ready to push anything into core. We may very well want to do that someday, but not yet. That's fine- but don't push something in which will make it difficult to add these capabilities later (and, to be clear, I'm not asking out of pipe dreams and wishes but rather after having very specific discussions with users and reviewing documents like NIST 800-53, which is publically available for anyone to peruse). I don't think that's a valid objection. If we someday have auditing in core, and if it subsumes what pgaudit does, then whatever interfaces pgaudit implements can be replaced with wrappers around the core functionality, just as we did for text search. But personally, I think this patch deserves to be reviewed on its own merits, and not the extent to which it satisfies your requirements, or those of NIST 800-53. As I said before, I think auditing is a complicated topic and there's no guarantee that one solution will be right for everyone. As long as we keep those solutions out of core, there's no reason that multiple solutions can't coexist; people can pick the one that best meets their requirements. As soon as we start talking about something putting into core, the bar is a lot higher, because we're not going to put two auditing solutions into core, so if we do put one in, it had better be the right thing for everybody. I don't even think we should be considering that at this point; I think the interesting (and under-discussed) question on this thread is whether it even makes sense to put this into contrib. That means we need some review of the patch for what it is, which there hasn't been much of, yet. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Proposal for CSN based snapshots
On 12 May 2014 17:14, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 05/12/2014 06:26 PM, Andres Freund wrote: With the new commit-in-progress status in clog, we won't need the sub-committed clog status anymore. The commit-in-progress status will achieve the same thing. Wouldn't that cause many spurious waits? Because commit-in-progress needs to be waited on, but a sub-committed xact surely not? Ah, no. Even today, a subxid isn't marked as sub-committed until you commit the top-level transaction. The sub-commit state is a very transient state during the commit process, used to make the commit of the sub-transactions and the top-level transaction appear atomic. The commit-in-progress state would be a similarly short-lived state. You mark the subxids and the top xid as commit-in-progress just before the XLogInsert() of the commit record, and you replace them with the real LSNs right after XLogInsert(). My understanding is that we aim to speed up the rate of Snapshot reads. Which may make it feasible to store autonomous transactions in shared memory, hopefully DSM. The above mechanism sounds like it might slow down transaction commit. Could we prototype that and measure the speed? More generally, do we have any explicit performance goals or acceptance criteria for this patch? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
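The ordering Heikki describes — mark the xids commit-in-progress just before XLogInsert() of the commit record, then overwrite the markers with the real LSN just after — can be illustrated with a toy model. Everything below is invented for illustration only (an array standing in for the clog, a constant standing in for XLogInsert()); it is not the proposed implementation:

```c
/* Toy model of the commit-in-progress sequence described above.
 * All names here are invented for illustration. */
#define COMMIT_IN_PROGRESS (-1L)

static long toy_clog[16];       /* stand-in for per-xid CSN/LSN slots */

static void
set_xids(const int *xids, int nxids, long value)
{
    for (int i = 0; i < nxids; i++)
        toy_clog[xids[i]] = value;
}

/* Commit the top xid and its subxids so they appear atomic to readers. */
static long
toy_commit(const int *xids, int nxids)
{
    long        commit_lsn;

    /* 1. Just before writing the commit record: mark every xid as
     *    commit-in-progress, so a concurrent reader waits rather than
     *    seeing a half-committed transaction tree. */
    set_xids(xids, nxids, COMMIT_IN_PROGRESS);

    /* 2. Stand-in for XLogInsert() of the commit record. */
    commit_lsn = 0x1000;

    /* 3. Right afterwards: replace the markers with the real LSN. */
    set_xids(xids, nxids, commit_lsn);
    return commit_lsn;
}
```

The point of the transient marker is exactly the atomicity property claimed for the existing sub-commit state: a reader never observes some xids committed and others not.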
Re: [HACKERS] delta relations in AFTER triggers
On Sat, Jun 28, 2014 at 10:35 AM, Kevin Grittner kgri...@ymail.com wrote: David Fetter da...@fetter.org wrote: On Sat, Jun 21, 2014 at 11:06:26AM -0700, Kevin Grittner wrote: Here is v2. I've taken the liberty of making an extension that uses this. Preliminary tests indicate a 10x performance improvement over the user-space hack I did that's similar in functionality. Wow, this goes well beyond what I expected for a review! Thanks! As I said in an earlier post, I think that this is best committed as a series of patches, one for the core portion and one for each PL which implements the ability to use the transition (delta) relations in AFTER triggers. Your extension covers the C trigger angle, and it seems to me to be worth committing to contrib as a sample of how to use this feature in C. It is very encouraging that you were able to use this without touching what I did in core, and that it runs 10x faster than the alternatives before the patch. Because this review advances the patch so far, it may be feasible to get it committed in this CF. I'll see what is needed to get there and maybe have a patch toward that end in a few days. The minimum that would require, IMV, is a plpgsql implementation, moving the new pg_trigger columns to the variable portion of the record so they can be null capable, more docs, and regression tests. Not to rain on your parade, but this patch hasn't really had a serious code review yet. Performance testing is good, but it's not the same thing. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Spinlocks and compiler/memory barriers
On 2014-06-30 10:05:44 -0400, Robert Haas wrote: On Mon, Jun 30, 2014 at 9:26 AM, Andres Freund and...@2ndquadrant.com wrote: I think it continues in the tradition of making s_lock.h ever harder to follow. But it's still better than what we have now from a correctness perspective. Well, as you and I have discussed before, someday we probably need to get rid of s_lock.h or at least refactor it heavily, but let's do one thing at a time. Well, that task gets harder by making it more complex... (will answer separately about sun studio) Solaris seems to run with TSO enabled for SPARC (whereas linux uses RMO, relaxed memory ordering), so it's probably fine to just use the compiler barrier. If it isn't, that's a change that has nothing to do with making spinlock operations compiler barriers and everything to do with fixing a bug in the existing implementation. Well, it actually has. If we start to remove volatiles from critical sections the compiler will schedule stores closer to the spinlock release and reads more freely - making externally visible ordering violations more likely. Especially on itanic. So, I agree that we need to fix unlocks that aren't sufficiently strong memory barriers, but it does get more urgent by not relying on volatile inside critical sections anymore. Basically, every platform we support today has a spinlock implementation that is supposed to prevent CPU reordering across the acquire and release operations but might not prevent compiler reordering. IOW, S_LOCK() should be a CPU acquire fence, and S_UNLOCK() should be a CPU release fence. Well, I think s_lock.h comes woefully short of that goal on several platforms. Scary. Now, we want to make these operations compiler fences as well, and AIUI your proposal for that is to make NEW_S_UNLOCK(lock) = out of line function call + S_LOCK(dummy) + S_UNLOCK(dummy) + S_UNLOCK(lock). My proposal was to use barrier.h provided barriers as long as it provides a native implementation. 
If neither a compiler nor a memory barrier is provided my proposal was to fall back to the memory barrier emulation we already have, but move it out of line (to avoid reordering issues). The reason for doing so is that the release *has* to be a release barrier. To avoid issues with recursion the S_UNLOCK() used in the out of line memory barrier implementation used a modified S_UNLOCK that doesn't include a barrier. My proposal is to make NEW_S_UNLOCK(lock) = out of line function call + S_UNLOCK(lock), which I think is strictly better. There's zip to guarantee that adding S_LOCK(dummy) + S_UNLOCK(dummy) is doing anything we want in there, and it's definitely going to be worse for performance. Uh? At the very least doing a S_LOCK() guarantees that we're doing some sort of memory barrier, which yours doesn't at all. That'd actually fix the majority of platforms with borked release semantics because pretty much all tas() implementations are full barriers. The other thing that I don't like about your proposal has to do with the fact that the support matrices for barrier.h and s_lock.h are not identical. If s_lock.h is confusing to you today, making it further intertwined with barrier.h is not going to make things better; at least, that confuses the crap out of me. Uh. It actually *removed* the confusing edge of the dependency. It's rather confusing that memory barriers rely on spinlocks and not the other way round. I think we actually should do that unconditionally, independent of any other changes. The only reasons barrier.h includes s_lock.h are that dummy_spinlock is declared and that the memory barrier is declared inline. 1) Afaics ARM will fall back to this for older gccs - and ARM is weakly ordered. I think I'd just also use swpb to release the lock. 
Something like:

    #define S_INIT_LOCK(lock)   ((*(lock)) = 0)
    #define S_UNLOCK            s_unlock

    static __inline__ void
    s_unlock(volatile slock_t *lock)
    {
        register slock_t _res = 0;

        __asm__ __volatile__(
            "swpb %0, %0, [%2]\n"
            : "+r"(_res), "+m"(*lock)
            : "r"(lock)
            : "memory");
        Assert(_res == 1);  // lock should be held by us
    }

2) Afaics it's also wrong for sparc on linux (and probably the BSDs) where relaxed memory ordering is used. 3) Also looks wrong on mips which needs a full memory barrier. You're mixing apples and oranges again. No, I'm not. If the existing definitions aren't CPU barriers, they're already broken, and we should fix that and back-patch. Yea, and that gets harder if we do it after master has changed incompatibly. And, as explained above, we need to fix S_UNLOCK() to be a memory barrier before we can remove volatiles - which is the goal of your patch, no? On the other hand, the API contract change making these into compiler barriers is a master-only change. I think for purposes of this patch we should assume that the existing code is sufficient to prevent CPU reordering and just focus on fixing compiler ordering problems.
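The acquire/release contract being debated above can be made concrete. Below is a hypothetical sketch — invented names, GCC/Clang __atomic builtins, not PostgreSQL's actual s_lock.h — of a test-and-set spinlock whose lock operation is an acquire fence and whose unlock is a release fence, for both the compiler and the CPU:

```c
/* Hypothetical sketch only -- not PostgreSQL's s_lock.h. It shows the
 * contract under discussion: the lock must be an acquire fence and the
 * unlock a release fence, against both compiler and CPU reordering. */
typedef unsigned char my_slock_t;

/* test-and-set; returns nonzero if the lock was already held */
static inline int
my_tas(my_slock_t *lock)
{
    /* __ATOMIC_ACQUIRE: accesses after this cannot be hoisted above it */
    return __atomic_test_and_set(lock, __ATOMIC_ACQUIRE);
}

static inline void
my_s_lock(my_slock_t *lock)
{
    while (my_tas(lock))
        ;                       /* spin; real code would back off and sleep */
}

static inline void
my_s_unlock(my_slock_t *lock)
{
    /* __ATOMIC_RELEASE: stores inside the critical section cannot be
     * reordered -- by compiler or CPU -- past the unlocking store */
    __atomic_clear(lock, __ATOMIC_RELEASE);
}
```

With primitives like these, the volatile qualifiers inside critical sections become unnecessary — which is exactly why the thread insists the release side must be a real barrier before the volatiles can go.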
Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL
* Robert Haas (robertmh...@gmail.com) wrote: On Mon, Jun 30, 2014 at 9:39 AM, Stephen Frost sfr...@snowman.net wrote: I think the fact that pgaudit does X and you think it should do Y is a perfect example of why we're nowhere close to being ready to push anything into core. We may very well want to do that someday, but not yet. That's fine- but don't push something in which will make it difficult to add these capabilities later (and, to be clear, I'm not asking out of pipe dreams and wishes but rather after having very specific discussions with users and reviewing documents like NIST 800-53, which is publically available for anyone to peruse). I don't think that's a valid objection. If we someday have auditing in core, and if it subsumes what pgaudit does, then whatever interfaces pgaudit implements can be replaced with wrappers around the core functionality, just as we did for text search. I actually doubt that we'd be willing to do what we did for text search again. Perhaps I'm wrong, which would be great, but I doubt it. But personally, I think this patch deserves to be reviewed on its own merits, and not the extent to which it satisfies your requirements, or those of NIST 800-53. eh? The focus of this patch is to add auditing to PG and having very clearly drawn auditing requirements of a very large customer isn't relevant? I don't follow that logic at all. In fact, I thought the point of pgaudit was specifically to help address the issues which are being raised from this section of our user base (perhaps not those who follow NIST, but at least those organizations with similar interests..). As I said before, I think auditing is a complicated topic and there's no guarantee that one solution will be right for everyone. As long as we keep those solutions out of core, there's no reason that multiple solutions can't coexist; people can pick the one that best meets their requirements. 
As soon as we start talking about something putting into core, the bar is a lot higher, because we're not going to put two auditing solutions into core, so if we do put one in, it had better be the right thing for everybody. I agree that auditing is a complicated topic. Given the dearth of existing auditing solutions, I seriously doubt we're going to actually see more than one arise. I'd much rather focus on an in-core solution which allows the mechanism by which auditing logs are produced to be handled through an extension (perhaps through hooks in the right places), but the actual definition of *what* gets audited to be defined through SQL with a permissions system that works with the rest of the system. What I'm getting at is this- sure, different users will want to have their audit logs be in different formats (JSON, XML, CSV, whatever), or sent via different means (logged to files, sent to syslog, forwarded on to a RabbitMQ system, splunk, etc), but the definition of what gets audited and allowing individuals other than the owners to be able to modify the auditing is much more clear-cut to me as there's only so many different objects in the system and only so many operations which can be performed on them.. This strikes me as similar to security labels, as was discussed previously. I don't even think we should be considering that at this point; I think the interesting (and under-discussed) question on this thread is whether it even makes sense to put this into contrib. That means we need some review of the patch for what it is, which there hasn't been much of, yet. Evidently I wasn't very clear on this point but my concerns are about it going into contrib, which is what's been proposed, because I worry about an eventual upgrade path from contrib to an in-core solution. I'm also a bit nervous that a contrib solution might appear 'good enough' to enough committers that the push-back on an in-core solution would lead to us never having one. 
These concerns are, perhaps, not very palatable but I feel they're real enough to bring up. contrib works quite well for stand-alone sets of functions which don't, and never will, have any in-core direct grammar or catalog support. While pgaudit doesn't have those today, I'd like to see *auditing* in PG, eventually, which has both a real grammar and a catalog definition. Thanks, Stephen
Re: [HACKERS] delta relations in AFTER triggers
On Mon, Jun 30, 2014 at 11:03:06AM -0400, Robert Haas wrote: On Sat, Jun 28, 2014 at 10:35 AM, Kevin Grittner kgri...@ymail.com wrote: David Fetter da...@fetter.org wrote: On Sat, Jun 21, 2014 at 11:06:26AM -0700, Kevin Grittner wrote: Here is v2. I've taken the liberty of making an extension that uses this. Preliminary tests indicate a 10x performance improvement over the user-space hack I did that's similar in functionality. Wow, this goes well beyond what I expected for a review! Thanks! As I said in an earlier post, I think that this is best committed as a series of patches, one for the core portion and one for each PL which implements the ability to use the transition (delta) relations in AFTER triggers. Your extension covers the C trigger angle, and it seems to me to be worth committing to contrib as a sample of how to use this feature in C. It is very encouraging that you were able to use this without touching what I did in core, and that it runs 10x faster than the alternatives before the patch. Because this review advances the patch so far, it may be feasible to get it committed in this CF. I'll see what is needed to get there and maybe have a patch toward that end in a few days. The minimum that would require, IMV, is a plpgsql implementation, moving the new pg_trigger columns to the variable portion of the record so they can be null capable, more docs, and regression tests. Not to rain on your parade, but this patch hasn't really had a serious code review yet. Performance testing is good, but it's not the same thing. Happy to help with that, too. What I wanted to start with is whether there was even rudimentary functionality, which I established by writing that extension. I happened to notice, basically as a sanity check, that doing this via tuplestores happened, at least in one case, to be quicker than doing it in user space with temp tables. Cheers, David. 
-- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [HACKERS] Escaping from blocked send() reprised.
On Mon, Jun 30, 2014 at 4:13 AM, Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp wrote: Hello, I have received inquiries related to blocked communication several times over the past few weeks, with different symptoms. Then I found this message in the archive: http://postgresql.1045698.n5.nabble.com/Escaping-a-blocked-sendto-syscall-without-causing-a-restart-td5740855.html Subject: Escaping a blocked sendto() syscall without causing a restart Tom Lane commented in reply: Offhand it looks to me like most signals would kick the backend off the send() call ... but it would loop right back and try again. See internal_flush() in pqcomm.c. (If you're using SSL, this diagnosis may or may not apply.) We can't do anything except repeat the send attempt if the client connection is to be kept in a sane state. (snipped) And I'm not at all sure if we could get it to work in SSL mode... That's true for timeouts that should continue the connection, say, statement_timeout, but focusing on intentional backend termination, I think it does no harm to break it off abruptly, even on SSL. On the other hand it still seems preferable to keep the connection when not blocked. The following expression detects such a blocking state just before the next send(2), after the previous try was interrupted by a signal: (ProcDiePending && select(1, NULL, fd, NULL, '1 sec') == 0) Finally, pg_terminate_backend() works even when send is blocked, for both SSL and non-SSL connections, after a one-second delay with this patch (break_socket_blocking_on_termination_v1.patch). Nevertheless, statement_timeout of course cannot be made effective by this method, since that would break the consistency of the client protocol. It would need a change in the client protocol to provide an out-of-band mechanism, or something like that. Any suggestions? 
You should probably add your patch here, so it doesn't get forgotten about: https://commitfest.postgresql.org/action/commitfest_view/open We're focused on reviewing patches for the current CommitFest, so your patch might not get attention right away. A couple of general thoughts on this topic: 1. I think it's the case that there are platforms around where a signal won't cause send() to return EINTR and I'd be entirely unsurprised if SSL_write() doesn't necessarily return EINTR in that case. I'm not sure what, if anything, we can do about that. 2. I think it would be reasonable to try to kill off the connection without notifying the client if we're unable to send the data to the client in a reasonable period of time. But I'm unsure what a reasonable period of time means. This patch would basically do it after no delay at all, which seems like it might be too aggressive. However, I'm not sure. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Array of composite types returned from python
Just writing to check in. I haven't done anything to look into allowing arrays of composites for input to a PL/Python function. I made the submitted modification for a specific project that I'm working on that involves python code that returns data structures. I also have no idea about a more efficient way to convert composite elements. -Ed -Original Message- From: Tom Lane [mailto:t...@sss.pgh.pa.us] Sent: Sunday, June 29, 2014 4:54 PM To: Abhijit Menon-Sen Cc: Sim Zacks; Behn, Edward (EBEHN); pgsql-hackers@postgresql.org; Peter Eisentraut Subject: Re: [HACKERS] Array of composite types returned from python Abhijit Menon-Sen a...@2ndquadrant.com writes: 3) This is such a simple change with no new infrastructure code (PLyObject_ToComposite already exists). Can you think of a reason why this wasn't done until now? Was it a simple miss or purposefully excluded? This is not an authoritative answer: I think the infrastructure was originally missing, but was later added in #bc411f25 for OUT parameters. Perhaps it was overlooked at the time that the same code would suffice for this earlier-missing case. (I've Cc:ed Peter E. in case he has any comments.) I think the patch is ready for committer. I took a quick look at this; not really a review either, but I have a couple comments. 1. While I think the patch does what it intends to, it's a bit distressing that it will invoke the information lookups in PLyObject_ToComposite over again for *each element* of the array. We probably ought to quantify that overhead to see if it's bad enough that we need to do something about improving caching, as speculated in the comment in PLyObject_ToComposite. 2. I wonder whether the no-composites restriction in plpy.prepare (see lines 133ff in plpy_spi.c) could be removed as easily. regards, tom lane
Re: [HACKERS] Spinlocks and compiler/memory barriers
On Mon, Jun 30, 2014 at 11:17 AM, Andres Freund and...@2ndquadrant.com wrote: Solaris seems to run with TSO enabled for SPARC (whereas linux uses RMO, relaxed memory ordering), so it's probably fine to just use the compiler barrier. If it isn't, that's a change that has nothing to do with making spinlock operations compiler barriers and everything to do with fixing a bug in the existing implementation. Well, it actually has. If we start to remove volatiles from critical sections the compiler will schedule stores closer to the spinlock release and reads more freely - making externally visible ordering violations more likely. Especially on itanic. Granted. Now, we want to make these operations compiler fences as well, and AIUI your proposal for that is to make NEW_S_UNLOCK(lock) = out of line function call + S_LOCK(dummy) + S_UNLOCK(dummy) + S_UNLOCK(lock). My proposal was to use barrier.h provided barriers as long as it provides a native implementation. If neither a compiler nor a memory barrier is provided my proposal was to fall back to the memory barrier emulation we already have, but move it out of line (to avoid reordering issues). The reason for doing so is that the release *has* to be a release barrier. What do you mean by the memory barrier emulation we already have? The only memory barrier emulation we have, to my knowledge, is a spinlock acquire-release cycle. But trying to use a spinlock acquire-release to shore up problems with the spinlock release implementation makes my head explode. On the other hand, the API contract change making these into compiler barriers is a master-only change. I think for purposes of this patch we should assume that the existing code is sufficient to prevent CPU reordering and just focus on fixing compiler ordering problems. Whatever needs to change on the CPU-reordering end of things is a separate commit. I'm not arguing against that it should be a separate commit. Just that the proper memory barrier bit has to come first. 
I feel like you're speaking somewhat imprecisely in an area where very precise speech is needed to avoid confusion. If you're saying that you think we should fix the S_UNLOCK() implementations that fail to prevent CPU reordering before we change all the S_UNLOCK() implementations to prevent compiler reordering, then that is fine with me; otherwise, I don't understand what you're getting at here. Do you want to propose a patch that does *only* that first part before we go further here? Do you want me to try to put together such a patch based on the details mentioned on this thread? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
[HACKERS] 9.4 logical replication - walsender keepalive replies
In 9.4 we've added the below block of code to walsender.c:

    /*
     * We only send regular messages to the client for full decoded
     * transactions, but a synchronous replication and walsender shutdown
     * possibly are waiting for a later location. So we send pings
     * containing the flush location every now and then.
     */
    if (MyWalSnd->flush < sentPtr &&
        !waiting_for_ping_response)
    {
        WalSndKeepalive(true);
        waiting_for_ping_response = true;
    }

I am finding that my logical replication reader is spending a tremendous amount of time sending feedback to the server because a keepalive reply was requested. My flush pointer is smaller than sentPtr, which I see as the normal case (the client hasn't confirmed all the WAL it has been sent). My client queues the records it receives and only confirms when it actually processes the record. So the sequence looks something like:

    Server sends LSN 0/1000
    Server sends LSN 0/2000
    Server sends LSN 0/3000
    Client confirms LSN 0/2000

I don't see why all these keepalive replies are needed in this case (the timeout value is bumped way up; it's the above block that is triggering the reply request, not something related to timeout). Steve
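Unless and until the server-side condition is refined, one conceivable client-side mitigation is to rate-limit how often the reader answers these reply requests. The sketch below is hypothetical — invented names, not existing walreceiver or client-library code — and note the trade-off: suppressing replies for too long risks tripping the server's wal_sender_timeout.

```c
/* Hypothetical sketch: answer a server-requested keepalive reply at most
 * once per min_interval_ms, instead of once per request. All names are
 * invented for illustration. */
static long last_reply_ms = 0;

static int
should_send_feedback(long now_ms, int reply_requested, long min_interval_ms)
{
    if (!reply_requested)
        return 0;               /* server didn't ask for a reply */
    if (now_ms - last_reply_ms < min_interval_ms)
        return 0;               /* we answered recently; skip this one */
    last_reply_ms = now_ms;
    return 1;
}
```

A throttle like this trades feedback latency for a bound on the feedback message rate, which is the cost Steve is describing.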
Re: [HACKERS] better atomics - v0.5
On Sat, Jun 28, 2014 at 4:25 AM, Andres Freund and...@2ndquadrant.com wrote: To make pin/unpin et al safe without acquiring the spinlock the pincount and the flags had to be moved into one variable so that when incrementing the pincount you automatically get the flagbits back. If it's currently valid all is good, otherwise the spinlock needs to be acquired. But for that to work you need to set flags while the spinlock is held. Possibly you can come up with ways to avoid that, but I am pretty damn sure that the code will be less robust by forbidding atomics inside a critical section, not the contrary. It's a good idea to avoid it, but not at all costs. You might be right, but I'm not convinced. Why? Anything I can do to convince you of this? Note that other users of atomics (e.g. the linux kernel) widely use atomics inside spinlock protected regions. The Linux kernel isn't a great example, because they can ensure that they aren't preempted while holding the spinlock. We can't, and that's a principle part of my concern here. More broadly, it doesn't seem consistent. I think that other projects also sometimes write code that acquires a spinlock while holding another spinlock, and we don't do that; in fact, we've elevated that to a principle, regardless of whether it performs well in practice. In some cases, the CPU instruction that we issue to acquire that spinlock might be the exact same instruction we'd issue to manipulate an atomic variable. I don't see how it can be right to say that a spinlock-protected critical section is allowed to manipulate an atomic variable with cmpxchg, but not allowed to acquire another spinlock with cmpxchg. Either acquiring the second spinlock is OK, in which case our current coding rule is wrong, or acquiring the atomic variable is not OK, either. I don't see how we can have that both ways. What I'm basically afraid of is that this will work fine in many cases but have really ugly failure modes. 
That's pretty much what happens with spinlocks already - the overhead is insignificant at low levels of contention but as the spinlock starts to become contended the CPUs all fight over the cache line and performance goes to pot. ISTM that making the spinlock critical section significantly longer by putting atomic ops in there could appear to win in some cases while being very bad in others. Well, I'm not saying it's something I suggest doing all the time. But if using an atomic op in the slow path allows you to remove the spinlock from 99% of the cases I don't see it having a significant impact. In most scenarios (where atomics aren't emulated, i.e. platforms we expect to be used in heavily concurrent cases) the spinlock and the atomic will be on the same cacheline making stalls much less likely. And this again is my point: why can't we make the same argument about two spinlocks situated on the same cache line? I don't have a bit of trouble believing that doing the same thing with a couple of spinlocks could sometimes work out well, too, but Tom is adamantly opposed to that. I don't want to just make an end run around that issue without understanding whether he has a valid point. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
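Robert's point that the same instruction serves both purposes can be made concrete. The sketch below uses C11 atomics with hypothetical names rather than PostgreSQL's s_lock.h: both a spinlock acquire and a lock-free counter update are built from the same compare-exchange primitive, and the difference Andres later stresses is what happens on failure - the spinlock must wait for the current holder, while the lock-free loop just rereads the value and retries.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Spinlock acquire: on CAS failure we must wait for the holder to release,
 * even if that holder has been scheduled off the CPU. */
static void
spin_acquire(atomic_int *lock)
{
    int expected = 0;
    while (!atomic_compare_exchange_weak(lock, &expected, 1))
        expected = 0;           /* lock was held; reset and keep spinning */
}

static void
spin_release(atomic_int *lock)
{
    atomic_store(lock, 0);
}

/* Lock-free update: on CAS failure nobody blocks us; the failed CAS
 * refreshed 'old', so we simply recompute and retry. */
static int
lockfree_add(atomic_int *v, int delta)
{
    int old = atomic_load(v);
    while (!atomic_compare_exchange_weak(v, &old, old + delta))
        ;                       /* 'old' was updated by the failed CAS */
    return old + delta;
}
```

Single-threaded, both degenerate to one successful CAS; the distinction only matters under contention, which is exactly the scenario being argued about.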
Re: [HACKERS] [WIP] Better partial index-only scans
Joshua Yanovski pythones...@gmail.com writes: Proof of concept initial patch for enabling index only scans for partial indices even when an attribute is not in the target list, as long as it is only used in restriction clauses that can be proved by the index predicate. This also works for index quals, though they still can't be used in the target list. However, this patch may be inefficient since it duplicates effort that is currently delayed until after the best plan is chosen. I took a quick look at this. I think it's logically incorrect to exclude Vars used only in index quals from the set that the index has to return, since you can't know at this stage whether the index is lossy (ie, might report xs_recheck = TRUE at runtime). While this is moot for btree indexes, it's not moot for SPGiST indexes which also support index-only scans today. In principle we could extend the AM and opclass API and demand that AMs tell us whether they might return xs_recheck = TRUE. However, I'm pretty hesitant to change the opclass APIs for this purpose; it'd likely break third-party code. Moreover, an advantage of confining the patch to considering only partial-index quals is that you could skip all the added work for non-partial indexes, which would probably largely solve the added-planning-time problem. It also includes a minor fix in the same code in createplan.c to make sure we're explicitly comparing an empty list to NIL, but I can take that out if that's not considered in scope. I don't think the existing code is poor style there. There are certainly hundreds of other cases where we treat != NULL or != NIL as implicit (though of course also other places where we don't). ... as I see it performance could improve in any combination of five ways: * Improve the performance of determining which clauses can't be discarded (e.g. 
precompute some information about equivalence classes for index predicates, mess around with the order in which we check the clauses to make it fail faster, switch to real union-find data structures for equivalence classes). This is certainly possible, though rather open-ended, and it's not clear that it really fixes the objection (ie, if you speed these things up then you still have a performance discrepancy from adding the tests earlier). * Take advantage of work we do here to speed things up elsewhere (e.g. if this does get chosen as the best plan we don't need to recompute the same information in create_indexscan_plan). That would likely be worth doing if we do this, but it will only buy back a small part of the cost, since the whole problem here is we'd be doing this work for all indexes and not only the eventually selected one. * Delay determining whether to use an index scan or index only scan until after cost analysis somehow. I'm not sure exactly what this would entail. That seems impossible to me, since the whole point of an index-only scan is that it's a lot cheaper than a regular one. regards, tom lane
Re: [HACKERS] Spinlocks and compiler/memory barriers
On 2014-06-30 11:38:31 -0400, Robert Haas wrote: On Mon, Jun 30, 2014 at 11:17 AM, Andres Freund and...@2ndquadrant.com wrote: Now, we want to make these operations compiler fences as well, and AIUI your proposal for that is to make NEW_S_UNLOCK(lock) = out of line function call + S_LOCK(dummy) + S_UNLOCK(dummy) + S_UNLOCK(lock). My proposal was to use barrier.h provided barriers as long as it provides a native implementation. If neither a compiler nor a memory barrier is provided my proposal was to fall back to the memory barrier emulation we already have, but move it out of line (to avoid reordering issues). The reason for doing so is that the release *has* to be a release barrier. What do you mean by the memory barrier emulation we already have? The only memory barrier emulation we have, to my knowledge, is a spinlock acquire-release cycle. Yes. Unrelated, but why haven't we defined it as if (TAS(dummy)) S_UNLOCK(dummy);? That's just about as strong and less of a performance drain? Hm, I guess platforms that do an unlocked test first would be problematic :/ But trying to use a spinlock acquire-release to shore up problems with the spinlock release implementation makes my head explode. Well, it actually makes some sense. Nearly any TAS() implementation is going to have some memory barrier semantics - so using a TAS() as fallback makes sense. That's why we're relying on it for use in memory barrier emulation after all. Also note that barrier.h assumes that a S_LOCK/S_UNLOCK acts as a compiler barrier - which really isn't guaranteed by the current contract of s_lock.h. Although the tas() implementations look safe. On the other hand, the API contract change making these into compiler barriers is a master-only change. I think for purposes of this patch we should assume that the existing code is sufficient to prevent CPU reordering and just focus on fixing compiler ordering problems. Whatever needs to change on the CPU-reordering end of things is a separate commit. 
I'm not arguing against making that a separate commit, just that the proper memory barrier bit has to come first. I feel like you're speaking somewhat imprecisely in an area where very precise speech is needed to avoid confusion. If you're saying that you think we should fix the S_UNLOCK() implementations that fail to prevent CPU reordering before we change all the S_UNLOCK() implementations to prevent compiler reordering, then that is fine with me; Yes, that's what I think is needed. Do you want to propose a patch that does *only* that first part before we go further here? Do you want me to try to put together such a patch based on the details mentioned on this thread? I'm fine with either - we're both going to be flying pretty blind :/. I think the way S_UNLOCK's release memory barrier semantics are fixed might influence the way the compiler barrier issue can be solved. Fixing the release semantics will involve going through all tas() implementations and seeing whether the default release semantics are ok. The ones with broken semantics will need to grow their own S_UNLOCK. I am wondering if that commit shouldn't just remove the default S_UNLOCK and move it to the tas() implementation sites. Please don't forget that I started this thread because I found the current implementation lacking: s_lock has neither sane memory release nor sane compiler release semantics... So it's not surprising that I want to solve both :) Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] delta relations in AFTER triggers
David Fetter da...@fetter.org wrote: It's missing a few pieces like surfacing transition table names. I'll work on those. Also, it's not clear to me how to access the pre- and post- relations at the same time, this being necessary for many use cases. I guess I need to think more about how that would be done. If you're going to do any work on the C extension, please start from the attached. I renamed it to something which seemed more meaningful (to me, at least), and cleaned up some cosmetic issues. The substance is the same. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Companydiff --git a/contrib/Makefile b/contrib/Makefile index b37d0dd..bcd7c28 100644 --- a/contrib/Makefile +++ b/contrib/Makefile @@ -48,6 +48,7 @@ SUBDIRS = \ postgres_fdw \ seg \ spi \ + transition_tables \ tablefunc \ tcn \ test_decoding \ diff --git a/contrib/transition_tables/.gitignore b/contrib/transition_tables/.gitignore new file mode 100644 index 000..5dcb3ff --- /dev/null +++ b/contrib/transition_tables/.gitignore @@ -0,0 +1,4 @@ +# Generated subdirectories +/log/ +/results/ +/tmp_check/ diff --git a/contrib/transition_tables/Makefile b/contrib/transition_tables/Makefile new file mode 100644 index 000..8a75350 --- /dev/null +++ b/contrib/transition_tables/Makefile @@ -0,0 +1,20 @@ +# contrib/transition_tables/Makefile + +MODULE_big = transition_tables +OBJS = transition_tables.o + +EXTENSION = transition_tables +DATA = transition_tables--1.0.sql + +REGRESS = transition_tables + +ifdef USE_PGXS +PG_CONFIG = pg_config +PGXS := $(shell $(PG_CONFIG) --pgxs) +include $(PGXS) +else +subdir = contrib/transition_tables +top_builddir = ../.. 
+include $(top_builddir)/src/Makefile.global +include $(top_srcdir)/contrib/contrib-global.mk +endif diff --git a/contrib/transition_tables/expected/transition_tables.out b/contrib/transition_tables/expected/transition_tables.out new file mode 100644 index 000..7dfed35 --- /dev/null +++ b/contrib/transition_tables/expected/transition_tables.out @@ -0,0 +1,18 @@ +CREATE EXTENSION transition_tables; +CREATE TABLE IF NOT EXISTS e( +i INT +); +CREATE TRIGGER statement_dml_e +AFTER INSERT OR UPDATE OR DELETE ON e +REFERENCING +OLD TABLE AS old_e +NEW TABLE AS new_e +FOR EACH STATEMENT +EXECUTE PROCEDURE transition_tables(); +INSERT INTO e(i) +SELECT * FROM generate_series(1,10000); +NOTICE: Total change: 50005000 +UPDATE e SET i=i+1; +NOTICE: Total change: 10000 +DELETE FROM e WHERE i < 5000; +NOTICE: Total change: -12497499 diff --git a/contrib/transition_tables/sql/transition_tables.sql b/contrib/transition_tables/sql/transition_tables.sql new file mode 100644 index 000..04f8bd7 --- /dev/null +++ b/contrib/transition_tables/sql/transition_tables.sql @@ -0,0 +1,20 @@ +CREATE EXTENSION transition_tables; + +CREATE TABLE IF NOT EXISTS e( +i INT +); + +CREATE TRIGGER statement_dml_e +AFTER INSERT OR UPDATE OR DELETE ON e +REFERENCING +OLD TABLE AS old_e +NEW TABLE AS new_e +FOR EACH STATEMENT +EXECUTE PROCEDURE transition_tables(); + +INSERT INTO e(i) +SELECT * FROM generate_series(1,10000); + +UPDATE e SET i=i+1; + +DELETE FROM e WHERE i < 5000; diff --git a/contrib/transition_tables/transition_tables--1.0.sql b/contrib/transition_tables/transition_tables--1.0.sql new file mode 100644 index 000..6c1625e --- /dev/null +++ b/contrib/transition_tables/transition_tables--1.0.sql @@ -0,0 +1,10 @@ +/* contrib/transition_tables--1.0.sql */ + + +-- Complain if script is sourced in psql, rather than via CREATE EXTENSION. +\echo Use "CREATE EXTENSION transition_tables" to load this file.
\quit + +CREATE FUNCTION transition_tables() +RETURNS pg_catalog.trigger +AS 'MODULE_PATHNAME' +LANGUAGE C; diff --git a/contrib/transition_tables/transition_tables.c b/contrib/transition_tables/transition_tables.c new file mode 100644 index 000..2b9264c --- /dev/null +++ b/contrib/transition_tables/transition_tables.c @@ -0,0 +1,144 @@ +/*- + * + * transition_tables.c + * sample of accessing trigger transition tables from a C trigger + * + * Portions Copyright (c) 2011-2014, PostgreSQL Global Development Group + * Portions Copyright (c) 1994, Regents of the University of California + * + * IDENTIFICATION + * contrib/transition_tables.c + * + *- + */ + +#include "postgres.h" + +#include "commands/trigger.h" +#include "utils/rel.h" + +PG_MODULE_MAGIC; + +PG_FUNCTION_INFO_V1(transition_tables); + +Datum +transition_tables(PG_FUNCTION_ARGS) +{ + TriggerData *trigdata = (TriggerData *) fcinfo->context; + TupleDesc tupdesc; + TupleTableSlot *slot; + Tuplestorestate *new_tuplestore; + Tuplestorestate *old_tuplestore; + int64 delta = 0; + + if (!CALLED_AS_TRIGGER(fcinfo)) + ereport(ERROR, +
[HACKERS] Does changing attribute order breaks something?
Hi! Are there any known issues when changing the attnums? Like: airport=# update pg_attribute set attnum = 3 where attname = 'abc' and attrelid = 18328; UPDATE 1 airport=# update pg_attribute set attnum = 2 where attname = 'foo' and attrelid = 18328; UPDATE 1 Does something break, or are there any other ideas for changing the order of columns? regards, geohas
Re: [HACKERS] Spinlocks and compiler/memory barriers
On Mon, Jun 30, 2014 at 12:20 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-06-30 11:38:31 -0400, Robert Haas wrote: On Mon, Jun 30, 2014 at 11:17 AM, Andres Freund and...@2ndquadrant.com wrote: Now, we want to make these operations compiler fences as well, and AIUI your proposal for that is to make NEW_S_UNLOCK(lock) = out of line function call + S_LOCK(dummy) + S_UNLOCK(dummy) + S_UNLOCK(lock). My proposal was to use barrier.h provided barriers as long as it provides a native implementation. If neither a compiler nor a memory barrier is provided my proposal was to fall back to the memory barrier emulation we already have, but move it out of line (to avoid reordering issues). The reason for doing so is that the release *has* to be a release barrier. What do you mean by the memory barrier emulation we already have? The only memory barrier emulation we have, to my knowledge, is a spinlock acquire-release cycle. Yes. Unrelated, but why haven't we defined it as if (TAS(dummy)) S_UNLOCK(dummy);? That's just about as strong and less of a performance drain? Hm, I guess platforms that do an unlocked test first would be problematic :/ Yes; also, there's no requirement for S_LOCK() to be defined in terms of TAS(), and thus no requirement for TAS() to exist at all. But trying to use a spinlock acquire-release to shore up problems with the spinlock release implementation makes my head explode. Well, it actually makes some sense. Nearly any TAS() implementation is going to have some memory barrier semantics - so using a TAS() as fallback makes sense. That's why we're relying on it for use in memory barrier emulation after all. As far as I know, all of our TAS() implementations prevent CPU reordering in the acquire direction. It is not clear that they provide CPU-reordering guarantees adequate for the release direction, even when paired with whatever S_UNLOCK() implementation they're mated with. 
And it's quite clear that many of them aren't adequate to prevent compiler-reordering in any case. Also note that barrier.h assumes that a S_LOCK/S_UNLOCK acts as a compiler barrier - which really isn't guaranteed by the current contract of s_lock.h. Although the tas() implementations look safe. Ugh, you're right. That should really be moved out-of-line. On the other hand, the API contract change making these into compiler barriers is a master-only change. I think for purposes of this patch we should assume that the existing code is sufficient to prevent CPU reordering and just focus on fixing compiler ordering problems. Whatever needs to change on the CPU-reordering end of things is a separate commit. I'm not arguing against that it should be a separate commit. Just that the proper memory barrier bit has to come first. I feel like you're speaking somewhat imprecisely in an area where very precise speech is needed to avoid confusion. If you're saying that you think we should fix the S_UNLOCK() implementations that fail to prevent CPU-ordering before we change all the S_UNLOCK() implementations to prevent compiler-reordering, then that is fine with me; Yes, that's what I think is needed. OK, let's do it that way then. Do you want to propose a patch that does *only* that first part before we go further here? Do you want me to try to put together such a patch based on the details mentioned on this thread? I'm fine with either - we're both going to flying pretty blind :/. I think the way S_UNLOCK's release memory barrier semantics are fixed might influence the way the compiler barrier issue can be solved. I think I'm starting to understand the terminological confusion: to me, a memory barrier means a fence against both the compiler and the CPU. I now think you're using it to mean a fence against the CPU, as distinct from a fence against the compiler. That had me really confused in some previous replies. 
Fixing the release semantics will involve going through all tas() implementations and see whether the default release semantics are ok. The ones with broken semantics will need to grow their own S_UNLOCK. I am wondering if that commit shouldn't just remove the default S_UNLOCK and move it to the tas() implementation sites. So, I think that here you are talking about CPU behavior rather than compiler behavior. Next paragraph is on that basis. The implementations that don't currently have their own S_UNLOCK() are i386, x86_64, Itanium, ARM without GCC atomics, S/390 zSeries, Sparc, Linux m68k, VAX, m32r, SuperH, non-Linux m68k, Univel CC, HP/UX non-GCC, Sun Studio, and WIN32_ONLY_COMPILER. Because most of those are older platforms, I'm betting that more of them than not are OK; I think we should confine ourselves to trying to fix the ones we're sure are wrong, which if I understand you correctly are ARM without GCC atomics, Sparc, and MIPS. I think it'd be better to just add copies of S_UNLOCK to those three
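The release-direction problem Robert describes above can be illustrated with C11 memory orders. This is a sketch in portable C11, not any actual per-port S_UNLOCK: a plain (even volatile) store constrains neither the compiler's placement of ordinary stores nor the CPU on weakly ordered hardware, whereas an explicit release store constrains both.

```c
#include <stdatomic.h>

/*
 * An unlock with no release semantics: non-volatile stores from the
 * critical section may be moved past it by the compiler, and the CPU may
 * reorder it on weakly ordered hardware.  This is the failure mode being
 * discussed for some default S_UNLOCK definitions.
 */
static void
unlock_plain(volatile int *lock)
{
    *lock = 0;
}

/*
 * An unlock with release semantics: all prior loads and stores are ordered
 * before the lock word becomes visible as free, against both compiler and
 * CPU reordering.
 */
static void
unlock_release(atomic_int *lock)
{
    atomic_store_explicit(lock, 0, memory_order_release);
}
```

Single-threaded the two are indistinguishable; the difference only shows up as another CPU observing the lock free before the protected stores have landed.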
Re: [HACKERS] better atomics - v0.5
On 2014-06-30 12:15:06 -0400, Robert Haas wrote: More broadly, it doesn't seem consistent. I think that other projects also sometimes write code that acquires a spinlock while holding another spinlock, and we don't do that; in fact, we've elevated that to a principle, regardless of whether it performs well in practice. It's actually more than a principle: it's now required for correctness, because otherwise the semaphore-based spinlock implementation will deadlock. In some cases, the CPU instruction that we issue to acquire that spinlock might be the exact same instruction we'd issue to manipulate an atomic variable. I don't see how it can be right to say that a spinlock-protected critical section is allowed to manipulate an atomic variable with cmpxchg, but not allowed to acquire another spinlock with cmpxchg. Either acquiring the second spinlock is OK, in which case our current coding rule is wrong, or acquiring the atomic variable is not OK, either. I don't see how we can have that both ways. The no-nested-spinlocks thing used to be a guideline, one nobody had big problems with because it didn't prohibit solutions to problems. Since guidelines are more useful when simple, it's "no nested spinlocks!". Also, those two cmpxchgs aren't the same: the CAS for spinlock acquisition will only succeed if the lock isn't acquired. If the lock holder sleeps you'll have to wait for it to wake up. In contrast, CASes for lock-free (or lock-reduced) algorithms won't be blocked if the last manipulation was done by a backend that's now sleeping. It'll possibly loop once, get the new value, and retry the CAS. Yes, it can still take a couple of iterations under heavy concurrency, but you don't have the problem that the lock holder goes to sleep. Now, you can argue that the spinlock-based fallbacks make that difference moot, but I think that's just the price for an emulation fringe platforms will have to pay. They aren't platforms used under heavy concurrency.
What I'm basically afraid of is that this will work fine in many cases but have really ugly failure modes. That's pretty much what happens with spinlocks already - the overhead is insignificant at low levels of contention but as the spinlock starts to become contended the CPUs all fight over the cache line and performance goes to pot. ISTM that making the spinlock critical section significantly longer by putting atomic ops in there could appear to win in some cases while being very bad in others. Well, I'm not saying it's something I suggest doing all the time. But if using an atomic op in the slow path allows you to remove the spinlock from 99% of the cases I don't see it having a significant impact. In most scenarios (where atomics aren't emulated, i.e. platforms we expect to be used in heavily concurrent cases) the spinlock and the atomic will be on the same cacheline making stalls much less likely. And this again is my point: why can't we make the same argument about two spinlocks situated on the same cache line? Because it's not relevant? Where and why do we want to acquire two spinlocks that are on the same cacheline? As far as I know there's never been an actual need for nested spinlocks. So the guideline can be as restrictive as it is because the price is low and there's no benefit in making it more complex. I think this line of discussion is pretty backwards. The reason I (and seemingly Heikki) want to use atomics is that it can reduce contention significantly, and that contention is today to a good part based on spinlocks. Your argument is basically that doing things that can take more than a fixed number of cycles inside a spinlock can increase contention. But the changes we've talked about only make sense if they significantly reduce spinlock contention in the first place - a potential for a couple of iterations while holding the lock had better not be relevant for proposed patches. I don't think a blanket rule makes sense here.
Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] better atomics - v0.5
Robert Haas robertmh...@gmail.com writes: ... this again is my point: why can't we make the same argument about two spinlocks situated on the same cache line? I don't have a bit of trouble believing that doing the same thing with a couple of spinlocks could sometimes work out well, too, but Tom is adamantly opposed to that. I think you might be overstating my position here. What I'm concerned about is that we be sure that spinlocks are held for a sufficiently short time that it's very unlikely that we get pre-empted while holding one. I don't have any particular bright line about how short a time that is, but more than a few instructions worries me. As you say, the Linux kernel is a bad example to follow because it hasn't got a risk of losing its timeslice while holding a spinlock. The existing coding rules discourage looping (though I might be okay with a small constant loop count), and subroutine calls (mainly because somebody might add $random_amount_of_work to the subroutine if they don't realize it can be called while holding a spinlock). Both of these rules are meant to reduce the risk that a short interval balloons into a long one due to unrelated coding changes. The existing coding rules also discourage spinlocking within a spinlock, and the reason for that is that there's no very clear upper bound to the time required to obtain a spinlock, so that there would also be no clear upper bound to the time you're holding the original one (thus possibly leading to cascading performance problems). So ISTM the question we ought to be asking is whether atomic operations have bounded execution time, or more generally what the distribution of execution times is likely to be. I'd be OK with an answer that includes "sometimes it can be long" so long as "sometimes" is pretty damn seldom. Spinlocks have a nonzero risk of taking a long time already, since we can't entirely prevent the possibility of losing our timeslice while holding one.
The issue here is just to be sure that that happens seldom enough that it doesn't cause performance problems. If we fail to do that we might negate all the desired performance improvements from adopting atomic ops in the first place. regards, tom lane
Re: [HACKERS] [WIP] Better partial index-only scans
Abhijit Menon-Sen a...@2ndquadrant.com writes: I don't think it's fair to mark this as "returned with feedback" without a more detailed review (I think of "returned with feedback" as providing a concrete direction to follow). I've set it back to "needs review". Does anyone else want to look at this patch? I offered a few comments. I think it might also be useful to try to quantify the worst-case performance cost for this, which would arise for a single-table query on a table with lots of indexes. Don't have time for that myself though. regards, tom lane
Re: [HACKERS] better atomics - v0.5
On Mon, Jun 30, 2014 at 12:54 PM, Tom Lane t...@sss.pgh.pa.us wrote: The existing coding rules also discourage spinlocking within a spinlock, and the reason for that is that there's no very clear upper bound to the time required to obtain a spinlock, so that there would also be no clear upper bound to the time you're holding the original one (thus possibly leading to cascading performance problems). Well, as Andres points out, commit daa7527afc2274432094ebe7ceb03aa41f916607 made that more of an ironclad requirement than a suggestion. If it's sometimes OK to acquire a spinlock from within a spinlock, then that commit was badly misguided; but the design of that patch was pretty much driven by your insistence that that was never, ever OK. If that was wrong, then we should reconsider that whole commit more broadly; but if it was right, then the atomics patch shouldn't drill a hole through it to install an exception just for emulating atomics. I'm personally not convinced that we're approaching this topic in the right way. I'm not convinced that it's at all reasonable to try to emulate atomics on platforms that don't have them. I would punt the problem into the next layer and force things like lwlock.c to have fallback implementations at that level that don't require atomics, and remove those fallback implementations if and when we move the goalposts so that all supported platforms must have working atomics implementations. People who write code that uses atomics are not likely to think about how those algorithms will actually perform when those atomics are merely emulated, and I suspect that means that in practice platforms that have only emulated atomics are going to regress significantly vs. the status quo today. 
If this patch goes in the way it's written right now, then I think it basically amounts to throwing platforms that don't have working atomics emulation under the bus, and my vote is that if we're going to do that, we should do it explicitly: "Sorry, we now only support platforms with working atomics. You don't have any, so you can't run PostgreSQL. Have a nice day." If we *don't* want to make that decision, then I'm not convinced that the approach this patch takes is in any way viable, completely aside from this particular question. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
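To make the regression concern concrete, here is a sketch of what emulated atomics amount to on a platform without native support. The types and names are hypothetical, loosely modeled on the fallback idea rather than the patch's actual code: every "atomic" operation becomes a lock round-trip, so an algorithm written assuming a cheap, wait-free fetch-add can suddenly block behind a sleeping lock holder.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical emulated atomic: a per-variable lock guarding a plain value.
 * On real hardware with atomics, fetch-add is one instruction; here it is
 * an acquire, arithmetic, and release - the source of the concern that
 * emulated platforms would regress versus the status quo.
 */
typedef struct
{
    atomic_flag lock;   /* stands in for the per-variable emulation spinlock */
    uint32_t    value;
} emulated_atomic_u32;

static uint32_t
emulated_fetch_add(emulated_atomic_u32 *a, uint32_t delta)
{
    uint32_t old;

    while (atomic_flag_test_and_set(&a->lock))
        ;                       /* the "atomic" op can now spin or block */
    old = a->value;
    a->value = old + delta;
    atomic_flag_clear(&a->lock);
    return old;
}
```

Code written against the atomics API never sees this cost in its source, which is why the text above argues that authors are unlikely to think about how their algorithms behave when atomics are merely emulated.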
Re: [HACKERS] Autonomous Transaction (WIP)
2014-06-30 12:38 GMT+02:00 Abhijit Menon-Sen a...@2ndquadrant.com: If I understand correctly, the design of this patch has already been considered earlier and rejected. So I guess the patch should also be marked rejected? I didn't find a related message. ? Regards Pavel -- Abhijit
Re: [HACKERS] Spinlocks and compiler/memory barriers
On 2014-06-30 12:46:29 -0400, Robert Haas wrote: But trying to use a spinlock acquire-release to shore up problems with the spinlock release implementation makes my head explode. Well, it actually makes some sense. Nearly any TAS() implementation is going to have some memory barrier semantics - so using a TAS() as fallback makes sense. That's why we're relying on it for use in memory barrier emulation after all. As far as I know, all of our TAS() implementations prevent CPU reordering in the acquire direction. It is not clear that they provide CPU-reordering guarantees adequate for the release direction, even when paired with whatever S_UNLOCK() implementation they're mated with. And it's quite clear that many of them aren't adequate to prevent compiler-reordering in any case. I actually don't think there currently are tas() implementations that aren't compiler barriers? Maybe UNIVEL/unixware. A bit of luck. Also note that barrier.h assumes that a S_LOCK/S_UNLOCK acts as a compiler barrier - which really isn't guaranteed by the current contract of s_lock.h. Although the tas() implementations look safe. Ugh, you're right. That should really be moved out-of-line. Good. Then we already lose the compile-time interdependency from barrier.h to s_lock.h - although the fallback will have a runtime dependency. Do you want to propose a patch that does *only* that first part before we go further here? Do you want me to try to put together such a patch based on the details mentioned on this thread? I'm fine with either - we're both going to be flying pretty blind :/. I think the way S_UNLOCK's release memory barrier semantics are fixed might influence the way the compiler barrier issue can be solved. I think I'm starting to understand the terminological confusion: to me, a memory barrier means a fence against both the compiler and the CPU. I now think you're using it to mean a fence against the CPU, as distinct from a fence against the compiler.
That had me really confused in some previous replies. Oh. Let's henceforth define 'fence' as the cache coherency thingy and read/write/release/acquire/memory barrier as the combination? Fixing the release semantics will involve going through all tas() implementations and seeing whether the default release semantics are ok. The ones with broken semantics will need to grow their own S_UNLOCK. I am wondering if that commit shouldn't just remove the default S_UNLOCK and move it to the tas() implementation sites. So, I think that here you are talking about CPU behavior rather than compiler behavior. Next paragraph is on that basis. Yes, I am. The implementations that don't currently have their own S_UNLOCK() are: i386, x86_64: TSO, thus fine. Itanium: As a special case, volatile stores emit release/acquire fences unless specific compiler flags are used. Thus safe. ARM without GCC atomics: Borked. S/390 zSeries: Strongly ordered. Sparc / Sun Studio: Borked, at least in some setups. Linux m68k: At least Linux doesn't support SMP for m68k, so cache coherency shouldn't be a problem. VAX: I don't really know, but I don't care; the NetBSD people's statements about VAX SMP support didn't increase my concern for VAX SMP. m32r: No idea. SuperH: Not officially supported (it's not in installation.sgml), don't care :) non-Linux m68k: Couldn't figure out if anybody supports SMP here; doesn't look like it. Univel CC: Don't care. HP/UX non-GCC: Itanic's volatile semantics should work. WIN32_ONLY_COMPILER: Should be fine due to TSO on x86 and Itanic volatiles. Because most of those are older platforms, I'm betting that more of them than not are OK; I think we should confine ourselves to trying to fix the ones we're sure are wrong, which if I understand you correctly are ARM without GCC atomics, Sparc, and MIPS. Sounds sane. I have to revise my statement on MIPS, though: it actually looks safe. I seem to have missed that it has its own S_UNLOCK.
Do I see it correctly that !(defined(__mips__) && !defined(__sgi)) isn't supported? I think it'd be better to just add copies of S_UNLOCK to those three rather than duplicate the definition in the other 12 places. Ok. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Does changing attribute order breaks something?
geohas li...@hasibether.at writes: Are there any known issues when changing the attnums? airport=# update pg_attribute set attnum = 3 where attname = 'abc' and attrelid = 18328; UPDATE 1 airport=# update pg_attribute set attnum = 2 where attname = 'foo' and attrelid = 18328; UPDATE 1 I assume you quickly found out that that totally broke your table, so why are you asking? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] better atomics - v0.5
On 2014-06-30 13:15:23 -0400, Robert Haas wrote: On Mon, Jun 30, 2014 at 12:54 PM, Tom Lane t...@sss.pgh.pa.us wrote: The existing coding rules also discourage spinlocking within a spinlock, and the reason for that is that there's no very clear upper bound to the time required to obtain a spinlock, so that there would also be no clear upper bound to the time you're holding the original one (thus possibly leading to cascading performance problems). Well, as Andres points out, commit daa7527afc2274432094ebe7ceb03aa41f916607 made that more of an ironclad requirement than a suggestion. If it's sometimes OK to acquire a spinlock from within a spinlock, then that commit was badly misguided; but the design of that patch was pretty much driven by your insistence that that was never, ever OK. If that was wrong, then we should reconsider that whole commit more broadly; but if it was right, then the atomics patch shouldn't drill a hole through it to install an exception just for emulating atomics. Well, since nobody has a problem with the rule of not having nested spinlocks and since the problems aren't the same, I'm failing to see why we should reconsider this. My recollection was that we primarily discussed whether it'd be a good idea to add code to Assert() when spinlocks are nested, to which Tom responded that it's not necessary because critical sections should be short enough to immediately see that that's not a problem. Then we argued a bit back and forth around that. I'm personally not convinced that we're approaching this topic in the right way. I'm not convinced that it's at all reasonable to try to emulate atomics on platforms that don't have them. I would punt the problem into the next layer and force things like lwlock.c to have fallback implementations at that level that don't require atomics, and remove those fallback implementations if and when we move the goalposts so that all supported platforms must have working atomics implementations.
If we're requiring an atomics-less implementation for lwlock.c, bufmgr.c et al., I'll stop working on those features (and by extension on atomics). It's an utter waste of resources and maintainability trying to maintain parallel high-level locking algorithms in several pieces of the codebase. The nonatomic versions will stagnate and won't actually work under load. I'd rather spend my time on something useful. The spinlock based atomics emulation is *small*. It's easy to reason about correctness there. If we're going that way, we've made a couple of absolute fringe platforms hold postgres, as actually used in reality, hostage. I think that's insane. People who write code that uses atomics are not likely to think about how those algorithms will actually perform when those atomics are merely emulated, and I suspect that means that in practice platforms that have only emulated atomics are going to regress significantly vs. the status quo today. I think you're overstating the likely performance penalty for nonparallel platforms/workloads here quite a bit. The platforms without chances of getting atomics implemented aren't ones where that's a likely thing. Yes, you'll see a regression if you run a readonly pgbench over a 4 node NUMA system - but it's not large. You won't see much improved performance in comparison to 9.4, but I think that's primarily it. Also it's not like this is something new - the semaphore based fallback *sucks* but we still have it. *Many* features in new major versions regress performance for fringe platforms. That's normal. If this patch goes in the way it's written right now, then I think it basically amounts to throwing platforms that don't have working atomics emulation under the bus. Why? Nobody is going to run 9.5 under high performance workloads on such platforms. They'll be so IO and RAM starved that the spinlocks won't be the bottleneck.
, and my vote is that if we're going to do that, we should do it explicitly: sorry, we now only support platforms with working atomics. You don't have any, so you can't run PostgreSQL. Have a nice day. I'd be fine with that. I don't think we're gaining much by those platforms. *But* I don't really see why it's required at this point. The fallback works and unless you're using semaphore based spinlocks the performance isn't bad. The lwlock code doesn't really get slower/faster in comparison to before when the atomics implementation is backed by semaphores. It's pretty much the same number of syscalls using the atomics/current implementation. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Set new system identifier using pg_resetxlog
Andres Freund wrote: c) We already allow setting pretty much all aspects of the control file via resetxlog - there seems little point in not having the ability to change the system identifier. I think it's pretty much a given that pg_resetxlog is a tool that can have disastrous effects if used lightly. If people change their sysid wrongly, they're no worse off than if they change their multixact counters and start getting failures because the old values stored in data cannot be resolved anymore (it's already been wrapped around). Or if they remove all the XLOG they have since the latest crash. From that POV, I don't think the objection that "but this can be used to corrupt data!" has any value. Also, on the other hand, pg_resetxlog is already a tool for PG hackers to fool around and test things. In a way, being able to change values in pg_control is useful to many of us; this is only an extension of that. If we only had bricks and mortar, I think we would have a tool to display and tweak pg_control separately from emptying pg_xlog, rather than this odd separation between pg_controldata and pg_resetxlog, each of which do a mixture of those things. But we have a wall two thirds done already, so it seems to make more sense to me to continue forward rather than tear it down and start afresh. This patch is a natural extension of what we already have. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] better atomics - v0.5
Robert Haas robertmh...@gmail.com writes: I'm personally not convinced that we're approaching this topic in the right way. I'm not convinced that it's at all reasonable to try to emulate atomics on platforms that don't have them. I would punt the problem into the next layer and force things like lwlock.c to have fallback implementations at that level that don't require atomics, and remove those fallback implementations if and when we move the goalposts so that all supported platforms must have working atomics implementations. People who write code that uses atomics are not likely to think about how those algorithms will actually perform when those atomics are merely emulated, and I suspect that means that in practice platforms that have only emulated atomics are going to regress significantly vs. the status quo today. I think this is a valid objection, and I for one am not prepared to say that we no longer care about platforms that don't have atomic ops (especially not if it's not a *very small* set of atomic ops). Also, just because a platform claims to have atomic ops doesn't mean that those ops perform well. If there's a kernel trap involved, they don't, at least not for our purposes. We're only going to be bothering with installing atomic-op code in places that are contention bottlenecks for us already, so we are not going to be happy with the results for any atomic-op implementation that's not industrial strength. This is one reason why I'm extremely suspicious of depending on gcc's intrinsics for this; that will not make the issue go away, only make it beyond our power to control. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] better atomics - v0.5
On 2014-06-30 12:54:10 -0400, Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: ... this again is my point: why can't we make the same argument about two spinlocks situated on the same cache line? I don't have a bit of trouble believing that doing the same thing with a couple of spinlocks could sometimes work out well, too, but Tom is adamantly opposed to that. I think you might be overstating my position here. What I'm concerned about is that we be sure that spinlocks are held for a sufficiently short time that it's very unlikely that we get pre-empted while holding one. I don't have any particular bright line about how short a time that is, but more than a few instructions worries me. As you say, the Linux kernel is a bad example to follow because it hasn't got a risk of losing its timeslice while holding a spinlock. Well, they actually have preemptable and irq-interruptible versions for some of these locks, but whatever :). So ISTM the question we ought to be asking is whether atomic operations have bounded execution time, or more generally what the distribution of execution times is likely to be. I'd be OK with an answer that includes "sometimes it can be long" so long as "sometimes" is pretty damn seldom. I think it ranges from 'never' to 'sometimes, but pretty darn seldom'. Depending on the platform and emulation. And, leaving the emulation and badly written code aside, there's no issue with another backend sleeping while the atomic variable needs to be modified. Spinlocks have a nonzero risk of taking a long time already, since we can't entirely prevent the possibility of losing our timeslice while holding one. The issue here is just to be sure that that happens seldom enough that it doesn't cause performance problems. If we fail to do that we might negate all the desired performance improvements from adopting atomic ops in the first place. I think individual patches will have to stand up to a test like this.
Modifying an atomic variable inside a spinlock is only going to be worth it if it allows us to avoid spinlocks altogether 95%+ of the time. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] better atomics - v0.5
On 2014-06-30 19:44:47 +0200, Andres Freund wrote: On 2014-06-30 13:15:23 -0400, Robert Haas wrote: People who write code that uses atomics are not likely to think about how those algorithms will actually perform when those atomics are merely emulated, and I suspect that means that in practice platforms that have only emulated atomics are going to regress significantly vs. the status quo today. I think you're overstating the likely performance penalty for nonparallel platforms/workloads here quite a bit. The platforms without chances of getting atomics implemented aren't ones where that's a likely thing. Yes, you'll see a regression if you run a readonly pgbench over a 4 node NUMA system - but it's not large. You won't see much improved performance in comparison to 9.4, but I think that's primarily it. To quantify this, on my 2 socket xeon E5520 workstation - which is too small to heavily show the contention problems in pgbench -S - the numbers are:

pgbench -M prepared -c 16 -j 16 -T 10 -S (best of 5, noticeable variability):
- master: 152354.294117
- lwlocks-atomics: 170528.784958
- lwlocks-atomics-spin: 159416.202578

pgbench -M prepared -c 1 -j 1 -T 10 -S (best of 5, noticeable variability):
- master: 16273.702191
- lwlocks-atomics: 16768.712433
- lwlocks-atomics-spin: 16744.913083

So, there really isn't a problem with the emulation. It's not actually that surprising - the absolute number of atomic ops is pretty much the same. Where we earlier took a spinlock to increment shared we now still take one. I expect that repeating this on the 4 socket machine will show a large gap between lwlocks-atomics and the other two. The other two will be about the same, with lwlocks-atomics-spin remaining a bit better than master.
Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Cluster name in ps output
Abhijit, all: If we're adding log_line_prefix support for cluster_name (something I think is a good idea), we need to also add it to CSVLOG. So, where do we put it in CSVLog? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Cluster name in ps output
On 2014-06-30 12:10:53 -0700, Josh Berkus wrote: Abhijit, all: If we're adding log_line_prefix support for cluster_name (something I think is a good idea), we need to also add it to CSVLOG. So, where do we put it in CSVLog? That was shot down (unfortunately imo), and I don't think anybody is actively working on it. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: Allow empty targets in unaccent dictionary
Abhijit Menon-Sen a...@2ndquadrant.com writes: I've attached a patch to contrib/unaccent as outlined in my review the other day. I went to commit this, and while testing I realized that the current implementation of unaccent_lexize is only capable of coping with src strings that are single characters in the current encoding. I'm not sure exactly how complicated it would be to fix that, but it might not be terribly difficult. (Overlapping matches would be the main complication, likely.) Anyway, this raises the question of whether the current patch is actually a desirable way to do things, or whether it would be better if the unaccenting rules were like "base-char accent-char" -> "base-char". Presumably, that would require more rules, but it would prevent deletion of a standalone accent-char, which might or might not be a good thing. Also, if there are any contexts where the right translation of an accent-char depends on the base-char, you couldn't do it with the patch as it stands. I don't know any languages that use separate accent-chars, so I'm not qualified to opine on whether this is important. It's not unlikely that we want this patch *and* an improvement that allows multi-character src strings, but I held off committing for the moment until somebody weighs in with an opinion about that. In any case, the patch failed to update the user-facing docs (unaccent.sgml) so we need some more work in that department. The user-facing docs are already pretty weak in that they fail to explain the whitespace rules, much less clarify that the src must be exactly a single character. If we don't improve the code to allow multi-character src, I wonder whether the rule-reading code shouldn't be improved to forbid such cases. I'm also wondering why it silently ignores malformed lines instead of bleating. But that's more or less orthogonal to this patch. Lastly, I didn't especially like the coding details of either proposed patch, and rewrote it as attached.
I didn't see a need to introduce a state 5, and I also didn't agree with changing the initial values of trg/trglen to be meaningful. Right now, those variables are initialized only to keep the compiler from bleating; the initial values aren't actually used anywhere. It didn't seem like a good idea for the trgxxx variables to be different from the srcxxx variables in that respect. regards, tom lane

diff --git a/contrib/unaccent/unaccent.c b/contrib/unaccent/unaccent.c
index a337df6..5a31f85 100644
*** a/contrib/unaccent/unaccent.c
--- b/contrib/unaccent/unaccent.c
*************** initTrie(char *filename)
*** 104,114 ****
  	while ((line = tsearch_readline(&trst)) != NULL)
  	{
! 		/*
! 		 * The format of each line must be "src trg", where src and trg
! 		 * are sequences of one or more non-whitespace characters,
! 		 * separated by whitespace.  Whitespace at start or end of
! 		 * line is ignored.
  		 */
  		int			state;
  		char	   *ptr;
--- 104,124 ----
  	while ((line = tsearch_readline(&trst)) != NULL)
  	{
! 		/*----------
! 		 * The format of each line must be "src" or "src trg", where
! 		 * src and trg are sequences of one or more non-whitespace
! 		 * characters, separated by whitespace.  Whitespace at start
! 		 * or end of line is ignored.  If trg is omitted, an empty
! 		 * string is used as the replacement.
! 		 *
! 		 * We use a simple state machine, with states
! 		 *	0	initial (before src)
! 		 *	1	in src
! 		 *	2	in whitespace after src
! 		 *	3	in trg
! 		 *	4	in whitespace after trg
! 		 *	-1	syntax error detected (line will be ignored)
! 		 *----------
  		 */
  		int			state;
  		char	   *ptr;
*************** initTrie(char *filename)
*** 160,166 ****
  			}
  		}

! 		if (state >= 3)
  			rootTrie = placeChar(rootTrie,
  								 (unsigned char *) src, srclen,
  								 trg, trglen);
--- 170,183 ----
  			}
  		}

! 		if (state == 1 || state == 2)
! 		{
! 			/* trg was omitted, so use "" */
! 			trg = "";
! 			trglen = 0;
! 		}
!
! 		if (state > 0)
  			rootTrie = placeChar(rootTrie,
  								 (unsigned char *) src, srclen,
  								 trg, trglen);

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PostgreSQL in Windows console and Ctrl-C
* From: Noah Misch [mailto:n...@leadboat.com] I liked the proposal here; was there a problem with it? http://www.postgresql.org/message- id/ca+tgmoz3ake4enctmqmzsykc_0pjl_u4c_x47ge48uy1upb...@mail.gmail.com You're referring to the suggestion of accepting and ignoring the option on non-Windows, right? I can do that, I just don't see the point as long as pg_ctl has a separate code path (well, #ifdef) for Windows anyway. The pg_upgrade test suite and the $(prove_check)-based test suites rely on their pg_ctl-started postmasters receiving any console ^C. pg_ctl deserves a --foreground or --no-background option for callers that prefer the current behavior. That, or those tests need a new way to launch the postmaster. Ah. More good news. Just to confirm, this is only about the tests, right, not the code they are testing? If so, is there even a way to run either on Windows? The pg_upgrade test suite is a shell script, and prove_check is defined in Makefile.global and definitely not applicable to Windows. -- Christian -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: Allow empty targets in unaccent dictionary
At 2014-06-30 15:19:17 -0400, t...@sss.pgh.pa.us wrote: Anyway, this raises the question of whether the current patch is actually a desirable way to do things, or whether it would be better if the unaccenting rules were like base-char accent-char - base-char. It might be useful to be able to write such rules, but it would be highly impractical to do so instead of being able to single out accent-chars for removal. In all the languages I'm familiar with that use such accent-chars, any accent-char would form a valid combination with nearly every base-char, unlike European languages where you don't have to worry about k-umlaut, say. Also, a standalone accent-char would always be meaningless. (These accent-chars don't actually exist independently in the syllabary that a Hindi speaker might learn in school: they're combining forms of vowels and are treated differently from characters in practice.) Also, if there are any contexts where the right translation of an accent-char depends on the base-char, you couldn't do it with the patch as it stands. I can't think of a satisfactory example at the moment, but that sounds entirely plausible. It's not unlikely that we want this patch *and* an improvement that allows multi-character src strings I think it's enough to apply just this patch, but I wouldn't object to doing both if it were easy. It's not clear to me if that's true after a quick glance at the code, but I'll look again when I'm properly awake. Lastly, I didn't especially like the coding details of either proposed patch, and rewrote it as attached. :-) -- Abhijit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] better atomics - v0.5
On 2014-06-30 13:45:52 -0400, Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: I'm personally not convinced that we're approaching this topic in the right way. I'm not convinced that it's at all reasonable to try to emulate atomics on platforms that don't have them. I would punt the problem into the next layer and force things like lwlock.c to have fallback implementations at that level that don't require atomics, and remove those fallback implementations if and when we move the goalposts so that all supported platforms must have working atomics implementations. People who write code that uses atomics are not likely to think about how those algorithms will actually perform when those atomics are merely emulated, and I suspect that means that in practice platforms that have only emulated atomics are going to regress significantly vs. the status quo today. I think this is a valid objection, and I for one am not prepared to say that we no longer care about platforms that don't have atomic ops (especially not if it's not a *very small* set of atomic ops). It's still TAS(), cmpxchg(), xadd. With TAS/xadd possibly implemented via cmpxchg (which quite possibly *is* the native implementation for $arch). Also, just because a platform claims to have atomic ops doesn't mean that those ops perform well. If there's a kernel trap involved, they don't, at least not for our purposes. Yea. I think if we start to use 8byte atomics at some later point we're going to have to prohibit e.g. older arms from using them, even if the compiler claims they're supported. But that's just one #define. This is one reason why I'm extremely suspicious of depending on gcc's intrinsics for this; that will not make the issue go away, only make it beyond our power to control. The patch as submitted makes it easy to provide a platform specific implementation of the important routines if it's likely that it's better than the intrinsics provided one. 
I'm not too worried about the intrinsics though - the number of projects relying on them these days is quite large. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PostgreSQL in Windows console and Ctrl-C
On Mon, Jun 30, 2014 at 07:28:03PM +, Christian Ullrich wrote: * From: Noah Misch [mailto:n...@leadboat.com] I liked the proposal here; was there a problem with it? http://www.postgresql.org/message- id/ca+tgmoz3ake4enctmqmzsykc_0pjl_u4c_x47ge48uy1upb...@mail.gmail.com You're referring to the suggestion of accepting and ignoring the option on non-Windows, right? I can do that, I just don't see the point as long as pg_ctl has a separate code path (well, #ifdef) for Windows anyway. Yes. We normally recognize platform-specific options on every platform. For example, the effective_io_concurrency GUC exists on all platforms, but you can only change it on platforms where it matters. In that light, one could argue for raising an error for --background on non-Windows systems. I don't have a strong opinion on raising an error vs. ignoring the option, but I do think the outcome of --background should be distinguishable from the outcome of --sometypo on all platforms. The pg_upgrade test suite and the $(prove_check)-based test suites rely on their pg_ctl-started postmasters receiving any console ^C. pg_ctl deserves a --foreground or --no-background option for callers that prefer the current behavior. That, or those tests need a new way to launch the postmaster. Ah. More good news. Just to confirm, this is only about the tests, right, not the code they are testing? Yes; the consequence of ignoring ^C is that the test postmaster would persist indefinitely after the ^C. The system under test doesn't care per se, but future test runs might fail strangely due to the conflicting postmaster. The user running the tests won't appreciate the clutter in his process list. If so, is there even a way to run either on Windows? The pg_upgrade test suite is a shell script, and prove_check is defined in Makefile.global and definitely not applicable to Windows. contrib/pg_upgrade/test.sh works under MSYS. 
Perhaps $(prove_check) currently doesn't, but that's something to be fixed, not relied upon. There's a clone of contrib/pg_upgrade/test.sh in src/tools/msvc/vcregress.pl, and I expect it would have the same problem. Thanks, nm -- Noah Misch EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] bad estimation together with large work_mem generates terrible slow hash joins
Hi, attached is v5 of the patch. The main change is that scaling the number of buckets is done only once, after the initial hash table is built. The main advantage of this is lower cost. This also allowed some cleanup of unnecessary code. However, this new patch causes warnings like this: WARNING: temporary file leak: File 231 still referenced I've been eyeballing the code for a while now, but I still fail to see where this comes from :-( Any ideas? regards Tomas

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 0d9663c..db3a953 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -1900,18 +1900,20 @@ show_hash_info(HashState *hashstate, ExplainState *es)
 	if (es->format != EXPLAIN_FORMAT_TEXT)
 	{
 		ExplainPropertyLong("Hash Buckets", hashtable->nbuckets, es);
+		ExplainPropertyLong("Original Hash Buckets",
+							hashtable->nbuckets_original, es);
 		ExplainPropertyLong("Hash Batches", hashtable->nbatch, es);
 		ExplainPropertyLong("Original Hash Batches",
 							hashtable->nbatch_original, es);
 		ExplainPropertyLong("Peak Memory Usage", spacePeakKb, es);
 	}
-	else if (hashtable->nbatch_original != hashtable->nbatch)
+	else if ((hashtable->nbatch_original != hashtable->nbatch) || (hashtable->nbuckets_original != hashtable->nbuckets))
 	{
 		appendStringInfoSpaces(es->str, es->indent * 2);
 		appendStringInfo(es->str,
-						 "Buckets: %d  Batches: %d (originally %d)  Memory Usage: %ldkB\n",
-						 hashtable->nbuckets, hashtable->nbatch,
-						 hashtable->nbatch_original, spacePeakKb);
+						 "Buckets: %d (originally %d)  Batches: %d (originally %d)  Memory Usage: %ldkB\n",
+						 hashtable->nbuckets, hashtable->nbuckets_original,
+						 hashtable->nbatch, hashtable->nbatch_original, spacePeakKb);
 	}
 	else
 	{
diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
index 589b2f1..fbd721d 100644
--- a/src/backend/executor/nodeHash.c
+++ b/src/backend/executor/nodeHash.c
@@ -37,8 +37,10 @@
 #include "utils/lsyscache.h"
 #include "utils/syscache.h"
 
+bool		enable_hashjoin_bucket = true;
 
 static void ExecHashIncreaseNumBatches(HashJoinTable hashtable);
+static void ExecHashIncreaseNumBuckets(HashJoinTable hashtable);
 static void ExecHashBuildSkewHash(HashJoinTable hashtable, Hash *node,
 					  int mcvsToUse);
 static void ExecHashSkewTableInsert(HashJoinTable hashtable,
@@ -48,6 +50,33 @@ static void ExecHashSkewTableInsert(HashJoinTable hashtable,
 static void ExecHashRemoveNextSkewBucket(HashJoinTable hashtable);
 
+/*
+ * Compute appropriate size for hashtable given the estimated size of the
+ * relation to be hashed (number of rows and average row width).
+ *
+ * This is exported so that the planner's costsize.c can use it.
+ */
+
+/* Target bucket loading (tuples per bucket) */
+#define NTUP_PER_BUCKET			10
+
+/* Multiple of NTUP_PER_BUCKET triggering the increase of nbuckets.
+ *
+ * Once we reach the threshold we double the number of buckets, and we
+ * want to get 1.0 on average (to get NTUP_PER_BUCKET on average). That
+ * means these two equations should hold
+ *
+ *	 b = 2a			(growth)
+ *	 (a + b)/2 = 1	(oscillate around NTUP_PER_BUCKET)
+ *
+ * which means b=1.333 (a = b/2). If we wanted a higher threshold, we
+ * could grow the nbuckets to (4*nbuckets), thus using (b=4a) for
+ * growth, leading to (b=1.6). Or (b=8a) giving 1.77, etc.
+ *
+ * Let's start with doubling the bucket count, i.e. 1.333. */
+#define NTUP_GROW_COEFFICIENT	1.333
+#define NTUP_GROW_THRESHOLD		(NTUP_PER_BUCKET * NTUP_GROW_COEFFICIENT)
+
 /*
  * ExecHash
  *
@@ -126,6 +155,23 @@ MultiExecHash(HashState *node)
 		}
 	}
 
+	/* If average number of tuples per bucket is over the defined threshold,
+	 * increase the number of buckets to get below it.
+	 */
+	if (enable_hashjoin_bucket) {
+
+		double batchTuples = (hashtable->totalTuples / hashtable->nbatch);
+
+		if (batchTuples > (hashtable->nbuckets * NTUP_GROW_THRESHOLD)) {
+
+#ifdef HJDEBUG
+			printf("Increasing nbucket to %d (average per bucket = %.1f)\n",
+				   nbuckets, (batchTuples / hashtable->nbuckets));
+#endif
+			ExecHashIncreaseNumBuckets(hashtable);
+
+		}
+	}
+
 	/* must provide our own instrumentation support */
 	if (node->ps.instrument)
 		InstrStopNode(node->ps.instrument, hashtable->totalTuples);
@@ -271,6 +317,7 @@ ExecHashTableCreate(Hash *node, List *hashOperators, bool keepNulls)
 	 */
 	hashtable = (HashJoinTable) palloc(sizeof(HashJoinTableData));
 	hashtable->nbuckets = nbuckets;
+	hashtable->nbuckets_original = nbuckets;
 	hashtable->log2_nbuckets = log2_nbuckets;
 	hashtable->buckets = NULL;
 	hashtable->keepNulls = keepNulls;
@@ -375,17 +422,6 @@ ExecHashTableCreate(Hash *node, List *hashOperators, bool keepNulls)
 	return hashtable;
 }
 
-
-/*
- * Compute appropriate size for hashtable given the estimated size of the
- * relation to be hashed (number of rows and average row width).
- *
- * This
Re: [HACKERS] PostgreSQL in Windows console and Ctrl-C
* From: Noah Misch [mailto:n...@leadboat.com] On Mon, Jun 30, 2014 at 07:28:03PM +, Christian Ullrich wrote: * From: Noah Misch [mailto:n...@leadboat.com] I liked the proposal here; was there a problem with it? http://www.postgresql.org/message- id/ca+tgmoz3ake4enctmqmzsykc_0pjl_u4c_x47ge48uy1upb...@mail.gmail.co m You're referring to the suggestion of accepting and ignoring the option on non-Windows, right? I can do that, I just don't see the point as long as pg_ctl has a separate code path (well, #ifdef) for Windows anyway. Yes. We normally recognize platform-specific options on every platform. For example, the effective_io_concurrency GUC exists on all platforms, but you can only change it on platforms where it matters. In that light, one could argue for raising an error for --background on non-Windows systems. I don't have a strong opinion on raising an error vs. ignoring the option, but I do think the outcome of --background should be distinguishable from the outcome of --sometypo on all platforms. I'm convinced, will change to --sometypo treatment. The pg_upgrade test suite and the $(prove_check)-based test suites rely on their pg_ctl-started postmasters receiving any console ^C. pg_ctl deserves a --foreground or --no-background option for callers that prefer the current behavior. That, or those tests need a new way to launch the postmaster. Ah. More good news. Just to confirm, this is only about the tests, right, not the code they are testing? Yes; the consequence of ignoring ^C is that the test postmaster would persist indefinitely after the ^C. The system under test doesn't care per No it won't, please see below. If so, is there even a way to run either on Windows? The pg_upgrade test suite is a shell script, and prove_check is defined in Makefile.global and definitely not applicable to Windows. contrib/pg_upgrade/test.sh works under MSYS. Perhaps $(prove_check) currently doesn't, but that's something to be fixed, not relied upon. Yes. 
That doesn't matter, though. There's a clone of contrib/pg_upgrade/test.sh in src/tools/msvc/vcregress.pl, and I expect it would have the same problem. Oh, right. I didn't notice that because it doesn't mention upgradecheck in its usage message. I think I know where we're talking past each other. You may be assuming that kill(postmaster, SIGINT) would be affected by my patch. It would not. PostgreSQL's signal emulation on Windows works completely separately from the actual Windows analog of signals in console processes. My patch drops these console events, but the emulated signals still work the same way as before. When Ctrl-C is pressed in a console window, pg_console_handler() in backend/port/win32/signal.c is called with a CTRL_C_EVENT, and converts that to an emulated SIGINT (unless I tell it not to). That emulated SIGINT is then processed as usual. pg_ctl -m fast stop sends the emulated SIGINT directly. Anyway, I just ran upgradecheck, and it reported success without leaving any stale postmasters around. -- Christian -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On Fri, Jun 27, 2014 at 4:10 AM, Dilip kumar dilip.ku...@huawei.com wrote: ... Updated patch is attached in the mail.. Thanks Dilip. I get a compiler warning when building on Windows. When I started looking into that, I saw that two files have too much code duplication between them: src/bin/scripts/vac_parallel.c (new file) src/bin/pg_dump/parallel.c (existing file) In particular, pgpipe is almost an exact duplicate between them, except the copy in vac_parallel.c has fallen behind changes made to parallel.c. (Those changes would have fixed the Windows warnings.) I think that this function (and perhaps other parts as well--exit_horribly, for example) needs to be refactored into a common file that both files can include. I don't know where the best place for that would be, though. (I haven't done this type of refactoring myself.) Also, there are several places in the patch which use spaces for indentation where tabs are called for by the coding style. It looks like you may have copied the code from one terminal window and pasted it into another one, converting tabs to spaces in the process. This makes it hard to evaluate the amount of code duplication. In some places the code spins in a tight loop while waiting for a worker process to become free. If I strace the process, I get a long list of selects with zero timeouts: select(13, [6 8 10 12], NULL, NULL, {0, 0}) = 0 (Timeout) I have not tried to track down the code that causes it. I did notice that vacuumdb spends an awful lot of time at the top of the Linux top output, and this is probably why. Cheers, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Fresh initdb contains a few deleted B-Tree pages
I would like to put on the record that there are a few deleted B-Tree pages in a freshly initdb'd database. As my btreecheck tool shows: mgd=# select bt_index_verify(indexrelid::regclass::text) from pg_index; NOTICE: 0: page 12 of index pg_attribute_relid_attnam_index is deleted LOCATION: bt_page_verify_worker, btreecheck.c:206 NOTICE: 0: page 13 of index pg_attribute_relid_attnam_index is deleted LOCATION: bt_page_verify_worker, btreecheck.c:206 NOTICE: 0: page 14 of index pg_attribute_relid_attnam_index is deleted LOCATION: bt_page_verify_worker, btreecheck.c:206 NOTICE: 0: page 9 of index pg_attribute_relid_attnum_index is deleted LOCATION: bt_page_verify_worker, btreecheck.c:206 (contrib/pageinspect's bt_page_items() will show a similar message if this is done manually) This is not a new-to-9.4 issue. I am not suggesting that it's a real problem that requires immediate fixing, but it is suboptimal. We may wish to fix this someday. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL format and API changes (9.5)
On Tue, Jul 1, 2014 at 5:51 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Thanks for the review! On 06/27/2014 09:12 AM, Michael Paquier wrote: Looking at this patch, it does not compile when assertions are enabled because of this assertion in heapam.c: Assert(recdata == data + datalen); It seems like a thinko, as removing it makes the code compile. Fixed. Some comments about the code: 1) In src/backend/access/transam/README: 's/no further action is no required/no further action is required/g' Fixed. 2) Description of XLogBeginInsert, XLogHasBlockRef, XLogGetBlockRef and XLogGetBlockRefIds are missing in src/backend/access/transam/README. Added descriptions of XLogBeginInsert and XLogHasBlockRef. I'm not sure what to do with the latter two. They're not really intended for use by redo routines, but for things like pg_xlogdump that want to extract information about any record type. 3) There are a couple of code blocks marked as FIXME and BROKEN. There are as well some NOT_USED blocks. Yeah. The FIXMEs and BROKEN blocks are all in rm_desc routines. They mostly stand for code that used to print block numbers or relfilenodes, which doesn't make much sense to do in an ad hoc way in rmgr-specific desc routines anymore. Will need to go through them all and decide what's the important information to print for each record type. The few NOT_USED blocks are around code in redo routines that extract some information from the WAL record, that isn't needed anymore. We could remove the fields from the WAL records altogether, but might be useful to keep for debugging purposes. 4) Current patch crashes when running make check in contrib/test_decoding. Looking closer, wal_level = logical is broken: =# show wal_level; wal_level --- logical (1 row) =# create database foo; The connection to the server was lost. Attempting reset: Failed. Fixed. Looking even closer, log_heap_new_cid is broken: Fixed. 
6) XLogRegisterData creates a XLogRecData entry and appends it to one of the static XLogRecData entries if buffer_id >= 0, or to rdata_head/rdata_tail if buffer_id = -1 (for permanent data related to the WAL record). XLogRegisterXLogRecData does something similar, but with an entry of XLogRecData already built. What do you think about removing XLogRegisterXLogRecData and keeping only XLogRegisterData? This makes the interface simpler, and XLogRecData could be kept private in xlog.c. This would need more work especially on the gin side, where I am seeing for example constructLeafRecompressWALData directly building a XLogRecData entry. Hmm. I considered that, but punted because I didn't want to rewrite all the places that currently use XLogRecDatas in less straightforward ways. I kept XLogRegisterXLogRecData as an escape hatch for those. But I bit the bullet now, and got rid of all the XLogRegisterXLogRecData() calls. The XLogRecData struct is now completely private to xlog.c. Another big change in the attached patch version is that XLogRegisterData() now *copies* the data to a working area, instead of merely adding a XLogRecData entry to the chain. This simplifies things in xlog.c, now that the XLogRecData concept is not visible to the rest of the system. This is a subtle semantic change for the callers: you can no longer register a struct first, and fill the fields afterwards. That adds a lot of memcpys to the critical path, which could be bad for performance. In practice, I think it's OK, or even a win, because a typical WAL record payload is very small. But I haven't measured that. If it becomes an issue, we could try to optimize, and link larger data blocks to the rdata chain, but memcpy small chunks of data. There are a few WAL record types that don't fit in a small statically allocated buffer, so I provided a new function, XLogEnsureSpace(), that can be called before XLogBegin() to allocate a large-enough working area. 
These changes required changing a couple of places where WAL records are created: * ginPlaceToPage. This function has a strange split of responsibility between ginPlaceToPage and dataPlaceToPage/entryPlaceToPage. ginPlaceToPage calls dataPlaceToPage or entryPlaceToPage, depending on what kind of a tree it is, and dataPlaceToPage or entryPlaceToPage contributes some data to the WAL record. Then ginPlaceToPage adds some more, and finally calls XLogInsert(). I simplified the SPLIT case so that we now just create full page images of both page halves. We were already logging all the data on both page halves, but we were doing it in a smarter way, knowing what kind of pages they were. For example, for an entry tree page, we included an IndexTuple struct for every tuple on the page, but didn't include the item pointers. A straight full page image takes more space, but not much, so I think in any real application it's going to be lost in noise. (again, haven't measured it though)
Re: [HACKERS] Spinlocks and compiler/memory barriers
On 2014-06-30 19:22:59 +0200, Andres Freund wrote: On 2014-06-30 12:46:29 -0400, Robert Haas wrote: , which if I understand you correctly are ARM without GCC atomics, Sparc, and MIPS. I've to revise my statement on MIPS, it actually looks safe. I seem to have missed that it has its own S_UNLOCK. So, here's my first blind attempt at fixing these. Too tired to think much more about it. Sparc's configurable cache coherency drives me nuts. Linux apparently switched somewhere in 2011 from RMO (relaxed memory order) to TSO (total store order), solaris always used TSO, but the BSDs don't. Man. Accordingly there's a somewhat ugly thingy attached. I don't think this is really ready, but it's what I can come up with today. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services From 691839dc8fbb78134b530096d33d8a1307231eef Mon Sep 17 00:00:00 2001 From: Andres Freund and...@anarazel.de Date: Tue, 1 Jul 2014 00:11:17 +0200 Subject: [PATCH] Fix spinlock implementations for some sparc and arm platforms. When compiling postgres on arm using an older gcc that doesn't yet understand __sync_lock_test_and_set(), the default S_UNLOCK routine was used. Unfortunately that's not correct for ARM's memory model. The support for older gccs is using the swpb instruction (available for both armv5 and most armv6) - unfortunately there's no common barrier instruction for those architecture versions. So instead implement unlock using swpb again; that's possibly a bit slower but correct. Some Sparc CPUs can be run in various coherence models, ranging from RMO (relaxed) over PSO (partial) to TSO (total). Solaris has always run CPUs in TSO mode, but linux didn't use to and the various *BSDs still don't. Unfortunately the sparc TAS/S_UNLOCK were only correct under TSO. Fix that by adding the necessary memory barrier instructions. 
On sparcv8+, which should be all relevant CPUs, these are treated as NOPs if the current consistency model doesn't require the barriers. Blindly written, so possibly not working. They didn't use to use correct acquire/ --- src/backend/port/tas/sunstudio_sparc.s | 2 + src/include/storage/s_lock.h | 74 +- 2 files changed, 75 insertions(+), 1 deletion(-) diff --git a/src/backend/port/tas/sunstudio_sparc.s b/src/backend/port/tas/sunstudio_sparc.s index 486b7be..dcf54e2 100644 --- a/src/backend/port/tas/sunstudio_sparc.s +++ b/src/backend/port/tas/sunstudio_sparc.s @@ -37,6 +37,8 @@ pg_atomic_cas: ! ! http://cvs.opensolaris.org/source/xref/on/usr/src/lib/libc/sparc/threads/sparc.il ! + ! NB: We're assuming we're running on a TSO system here - solaris + ! userland luckily always has done so. #if defined(__sparcv9) || defined(__sparcv8plus) cas [%o0],%o2,%o1 diff --git a/src/include/storage/s_lock.h b/src/include/storage/s_lock.h index 895abe6..9aa9e8e 100644 --- a/src/include/storage/s_lock.h +++ b/src/include/storage/s_lock.h @@ -341,6 +341,30 @@ tas(volatile slock_t *lock) return (int) _res; } +/* + * Implement unlocking using swpb as well, that guarantees release + * semantics. Only some armv5 models have the data synchronization barrier + * instructions - but those need it. Luckily swpb always works. + */ +#define S_UNLOCK(lock) s_unlock(lock) + +static __inline__ void +s_unlock(volatile slock_t *lock) +{ + register slock_t _res = 0; + + __asm__ __volatile__( + "swpb %0, %0, [%2] \n" +: "+r"(_res), "+m"(*lock) +: "r"(lock) +: "memory"); + + /* lock should have been held */ + Assert(_res == 1); +} + +#define S_INIT_LOCK(lock) (*(lock) = 0) + #endif /* HAVE_GCC_INT_ATOMICS */ #endif /* __arm__ */ @@ -393,6 +417,12 @@ tas(volatile slock_t *lock) #if defined(__sparc__) /* Sparc */ +/* + * Solaris has always run sparc processors in TSO (total store) mode, but + * linux didn't use to and the *BSDs still don't. So, be careful about + * acquire/release semantics. 
The CPU will treat superfluous membars as NOPs, + * so it's just code space. + */ #define HAS_TEST_AND_SET typedef unsigned char slock_t; @@ -414,9 +444,51 @@ tas(volatile slock_t *lock) : "=r"(_res), "+m"(*lock) : "r"(lock) : "memory"); +#if defined(__sparcv7) + /* + * No stbar or membar available, luckily no actually produced hardware + * requires a barrier + */ +#elif defined(__sparcv8) + /* stbar is available (and required for both PSO, RMO), membar isn't */ + __asm__ __volatile__ ("stbar \n":::"memory"); +#else + /* + * #StoreLoad (RMO) | #StoreStore (PSO, RMO) are the appropriate release + * barrier for sparcv8 upwards. + */ + __asm__ __volatile__ ("membar #StoreLoad | #StoreStore \n":::"memory"); +#endif return (int) _res; } +#if defined(__sparcv7) +/* + * No stbar or membar available, luckily no actually produced hardware + * requires a barrier + */ +#define S_UNLOCK(lock) (*((volatile slock_t *) (lock)) = 0) +#elif __sparcv8 +/* stbar is available (and required
Re: [HACKERS] bad estimation together with large work_mem generates terrible slow hash joins
On 30.6.2014 23:12, Tomas Vondra wrote: Hi, attached is v5 of the patch. The main change is that scaling the number of buckets is done only once, after the initial hash table is built. The main advantage of this is lower cost. This also allowed some cleanup of unnecessary code. However, this new patch causes warnings like this: WARNING: temporary file leak: File 231 still referenced I've been eyeballing the code for a while now, but I still fail to see where this comes from :-( Any ideas? Meh, the patches are wrong - I hadn't realized the tight coupling between buckets/batches in ExecHashGetBucketAndBatch: *bucketno = hashvalue & (nbuckets - 1); *batchno = (hashvalue >> hashtable->log2_nbuckets) & (nbatch - 1); The previous patches worked mostly by pure luck (the nbuckets was set only in the first batch), but once I moved the code to the end, it started failing. And by worked I mean didn't throw an error, but probably returned bogus results with (nbatch > 1). However, ISTM this nbuckets-nbatch coupling is not really necessary; it's only constructed this way to assign independent bits to batchno and bucketno. So I went and changed the code like this: *bucketno = hashvalue & (nbuckets - 1); *batchno = (hashvalue >> (32 - hashtable->log2_nbatch)); I.e. using the top bits for batchno, low bits for bucketno (as before). Hopefully I got it right this time. At least it seems to be working for cases that failed before (no file leaks, proper rowcounts so far). 
regards Tomas diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c index 0d9663c..db3a953 100644 --- a/src/backend/commands/explain.c +++ b/src/backend/commands/explain.c @@ -1900,18 +1900,20 @@ show_hash_info(HashState *hashstate, ExplainState *es) if (es->format != EXPLAIN_FORMAT_TEXT) { ExplainPropertyLong("Hash Buckets", hashtable->nbuckets, es); + ExplainPropertyLong("Original Hash Buckets", +hashtable->nbuckets_original, es); ExplainPropertyLong("Hash Batches", hashtable->nbatch, es); ExplainPropertyLong("Original Hash Batches", hashtable->nbatch_original, es); ExplainPropertyLong("Peak Memory Usage", spacePeakKb, es); } - else if (hashtable->nbatch_original != hashtable->nbatch) + else if ((hashtable->nbatch_original != hashtable->nbatch) || (hashtable->nbuckets_original != hashtable->nbuckets)) { appendStringInfoSpaces(es->str, es->indent * 2); appendStringInfo(es->str, - "Buckets: %d Batches: %d (originally %d) Memory Usage: %ldkB\n", - hashtable->nbuckets, hashtable->nbatch, - hashtable->nbatch_original, spacePeakKb); + "Buckets: %d (originally %d) Batches: %d (originally %d) Memory Usage: %ldkB\n", + hashtable->nbuckets, hashtable->nbuckets_original, + hashtable->nbatch, hashtable->nbatch_original, spacePeakKb); } else { diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c index 589b2f1..48cca62 100644 --- a/src/backend/executor/nodeHash.c +++ b/src/backend/executor/nodeHash.c @@ -37,8 +37,10 @@ #include "utils/lsyscache.h" #include "utils/syscache.h" +bool enable_hashjoin_bucket = true; static void ExecHashIncreaseNumBatches(HashJoinTable hashtable); +static void ExecHashIncreaseNumBuckets(HashJoinTable hashtable); static void ExecHashBuildSkewHash(HashJoinTable hashtable, Hash *node, int mcvsToUse); static void ExecHashSkewTableInsert(HashJoinTable hashtable, @@ -48,6 +50,33 @@ static void ExecHashSkewTableInsert(HashJoinTable hashtable, static void ExecHashRemoveNextSkewBucket(HashJoinTable hashtable); +/* + * Compute appropriate size for 
hashtable given the estimated size of the + * relation to be hashed (number of rows and average row width). + * + * This is exported so that the planner's costsize.c can use it. + */ + +/* Target bucket loading (tuples per bucket) */ +#define NTUP_PER_BUCKET 10 + +/* Multiple of NTUP_PER_BUCKET triggering the increase of nbuckets. + * + * Once we reach the threshold we double the number of buckets, and we + * want to get 1.0 on average (to get NTUP_PER_BUCKET on average). That + * means these two equations should hold + * + * b = 2a (growth) + * (a + b)/2 = 1 (oscillate around NTUP_PER_BUCKET) + * + * which means b=1.333 (a = b/2). If we wanted higher threshold, we + * could grow the nbuckets to (4*nbuckets), thus using (b=4a) for + * growth, leading to (b=1.6). Or (b=8a) giving 1.777 etc. + * + * Let's start with doubling the bucket count, i.e. 1.333. */ +#define NTUP_GROW_COEFFICIENT 1.333 +#define NTUP_GROW_THRESHOLD (NTUP_PER_BUCKET * NTUP_GROW_COEFFICIENT) + /* * ExecHash * @@ -116,6 +145,7 @@ MultiExecHash(HashState *node) /* It's a skew tuple, so put it into that hash table */ ExecHashSkewTableInsert(hashtable, slot, hashvalue, bucketNumber); +hashtable->skewTuples += 1; } else { @@ -126,6 +156,23 @@ MultiExecHash(HashState *node)
Re: [HACKERS] PATCH: Allow empty targets in unaccent dictionary
Abhijit Menon-Sen a...@2ndquadrant.com writes: At 2014-06-30 15:19:17 -0400, t...@sss.pgh.pa.us wrote: Anyway, this raises the question of whether the current patch is actually a desirable way to do things, or whether it would be better if the unaccenting rules were like base-char accent-char -> base-char. It might be useful to be able to write such rules, but it would be highly impractical to do so instead of being able to single out accent-chars for removal. On reflection, if we were thinking of this as a general substring-replacement mechanism rather than just a de-accenter (and why shouldn't we think of it that way?), then clearly both multi-character source strings and zero-character substitute strings could be of value. The fact that the existing patch only fixes one of those omissions isn't a strike against it. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
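For illustration, a hypothetical rules-file fragment (unaccent rules are whitespace-separated source/target pairs; these particular mappings are examples, not from the patch): the first line is a classic one-to-one de-accenting rule, the second a multi-character replacement of the kind Tom mentions, and the third has the empty target the patch enables, removing a bare accent character outright.

```
à	a
œ	oe
´
```

A line with a source and no target means "replace with the empty string", which is exactly the zero-character substitute case under discussion.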
Re: [HACKERS] heap vacuum cleanup locks
Zombie thread arise! I was searching for old threads on a specific problem and came across this patch that was dropped due to concerns about SnapshotNow: On Wed, Nov 2, 2011 at 3:19 PM, Robert Haas robertmh...@gmail.com wrote: On Tue, Jun 7, 2011 at 3:24 PM, Greg Stark gsst...@mit.edu wrote: Well it's super-exclusive-vacuum-lock avoidance techniques. Why shouldn't it make more sense to try to reduce the frequency and impact of the single-purpose outlier in a non-critical-path instead of burdening every other data reader with extra overhead? I think Robert's plan is exactly right though I would phrase it differently. We should get the exclusive lock, freeze/kill any xids and line pointers, then if the pin-count is 1 do the compaction. I wrote a really neat patch to do this today... and then, as I thought about it some more, I started to think that it's probably unsafe. Here's the trouble: with this approach, we assume that it's OK to change the contents of the line pointer while holding only an exclusive lock on the buffer. But there is a good deal of code out there that thinks it's OK to examine a line pointer with only a pin on the buffer (no lock). You need a content lock to scan the item pointer array, but once you've identified an item of interest, you're entitled to assume that it won't be modified while you hold a buffer pin. Now, if you've identified a particular tuple as being visible to your scan, then you might think that VACUUM shouldn't be removing it anyway. But I think that's only true for MVCC scans - for example, what happens under SnapshotNow semantics? Well now that you've done all that amazing work eliminating SnapshotNow maybe this patch deserves another look? I see anti-wraparound vacuums more and more often as database transaction velocity increases and vacuum takes longer and longer as database sizes increase. Having a faster vacuum that can skip vacuuming pages but is still guaranteed to freeze every page is pretty tempting. 
Of course the freeze epoch in the header obviating the need for freezing is an even more attractive goal but AIUI we're not even sure that's going to work, and this is a nice easy win. But then on third thought, if you've also got an MVCC snapshot taken before the start of the SnapshotNow scan, you are probably OK, because your advertised xmin should prevent anyone from removing anything anyway, so how do you actually provoke a failure? Anyway, I'm attaching the patch, in case anyone has any ideas on where to go with this. I'm really wishing we had more bits in the vm. It looks like we could use: - contains not-all-visible tuples - contains not-frozen xids - in need of compaction Another idea would be to store the upper 2-4 bits of the oldest xid in the page. That would let you determine whether it's in need of an anti-wraparound vacuum. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
Jeff Janes wrote: In particular, pgpipe is almost an exact duplicate between them, except the copy in vac_parallel.c has fallen behind changes made to parallel.c. (Those changes would have fixed the Windows warnings). I think that this function (and perhaps other parts as well--exit_horribly for example) need to refactored into a common file that both files can include. I don't know where the best place for that would be, though. (I haven't done this type of refactoring myself.) I think commit d2c1740dc275543a46721ed254ba3623f63d2204 is apropos. Maybe we should move pgpipe back to src/port and have pg_dump and this new thing use that. I'm not sure about the rest of the duplication in vac_parallel.c; there might be a lot in common with what pg_dump/parallel.c does too. Having two copies of code is frowned upon for good reasons. This patch introduces 1200 lines of new code in vac_parallel.c, ugh. If we really require 1200 lines to get parallel vacuum working for vacuumdb, I would question the wisdom of this effort. To me, it seems better spent improving autovacuum to cover whatever it is that this patch is supposed to be good for --- or maybe just provide a shell script that launches multiple vacuumdb instances in parallel ... -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
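The shell-script alternative floated at the end can be sketched with xargs -P. This is an illustrative sketch, not an endorsed design: the database and table names are made up, and VACUUM_CMD defaults to 'echo vacuumdb' so the sketch runs without a server (drop the echo for real use).

```shell
#!/bin/sh
# Fan out plain vacuumdb runs with xargs -P instead of teaching vacuumdb
# itself to parallelize.  Each input line becomes one vacuumdb invocation;
# -P caps how many run concurrently.
VACUUM_CMD=${VACUUM_CMD:-"echo vacuumdb"}
JOBS=${JOBS:-4}

printf '%s\n' t1 t2 t3 t4 |
    xargs -P "$JOBS" -I{} $VACUUM_CMD --dbname=mydb --table={}
```

With the echo default this prints the four commands it would run; pointing VACUUM_CMD at the real binary turns it into a crude parallel vacuum driver.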
Re: [HACKERS] PostgreSQL for VAX on NetBSD/OpenBSD
On 06/29/2014 03:10 PM, Patrick Finnegan wrote: And it also runs on the 11/780 which can have multiple CPUs... but I've never seen support for using more than one CPU (and the NetBSD page still says NetBSD/vax can only make use of one CPU on multi-CPU machines). If that has changed, I'd love to hear about it. Support for my VAX 6000 would also be nice... It changed well over a decade ago, if memory serves. The specific work was done on a VAX-8300 or -8350. I'm pretty sure the 11/780's specific flavor of SMP is not supported. (though I do have a pair of 11/785s here...wanna come hack? ;)) -Dave -- Dave McGuire, AK4HZ/3 New Kensington, PA -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PostgreSQL for VAX on NetBSD/OpenBSD
On 06/29/2014 10:24 AM, Tom Lane wrote: Is there anyone in the NetBSD/VAX community who would be willing to host a PG buildfarm member? I could put together a simh-based machine (i.e., fast) on a vm, if nobody else has stepped up for this. No other volunteers have emerged, so if you'd do that it'd be great. Ok, I am certainly willing to do it. Though I haven't used PostgreSQL in quite a while, I ran it A LOT back when its query language was PostQUEL, and later when it was known as Postgres95. It'd give me a serious warm fuzzy to be able to support the project in some way. Dave McGuire, AK4HZ/3 New Kensington, PA Hey, right up the river from here! Come on up and hack! There's always something neat going on around here. Ever run a PDP-11? B-) -Dave -- Dave McGuire, AK4HZ/3 New Kensington, PA -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PostgreSQL for VAX on NetBSD/OpenBSD
On Sun, Jun 29, 2014 at 3:12 PM, Dave McGuire mcgu...@neurotica.com wrote: On 06/29/2014 03:10 PM, Patrick Finnegan wrote: And it also runs on the 11/780 which can have multiple CPUs... but I've never seen support for using more than one CPU (and the NetBSD page still says NetBSD/vax can only make use of one CPU on multi-CPU machines). If that has changed, I'd love to hear about it. Support for my VAX 6000 would also be nice... It changed well over a decade ago, if memory serves. The specific work was done on a VAX-8300 or -8350. I'm pretty sure the 11/780's specific flavor of SMP is not supported. (though I do have a pair of 11/785s here...wanna come hack? ;)) If it works, someone should update the documentation. :) Which flavor of 11/78x MP? The official DEC kind (which is really just two computers with a block of shared memory) or the Purdue kind (which isn't quite SMP, but actually shares the system bus)? Pat
Re: [HACKERS] PostgreSQL for VAX on NetBSD/OpenBSD
Dave McGuire skrev 2014-06-29 21:01: On 06/29/2014 02:58 PM, Patrick Finnegan wrote: Last I checked, NetBSD doesn't support any sort of multiprocessor VAX. Multiprocessor VAXes exist, but you're stuck with either Ultrix or VMS on them. Hi Pat, it's good to see your name in my inbox. NetBSD ran on multiprocessor BI-bus VAXen many, many years ago. Is that support broken? I made it run on 8300 once, in the early days of NetBSD MP. I planned to make it run on both 8800 and 6300, but due to lack of docs neither of those came true. So, unless someone has a 8300 to test on (just over 1 VUPS per CPU, not much), current state is unknown. It worked last time I tested :-) -- Ragge -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PostgreSQL for VAX on NetBSD/OpenBSD
On Sun, Jun 29, 2014 at 3:01 PM, Dave McGuire mcgu...@neurotica.com wrote: On 06/29/2014 02:58 PM, Patrick Finnegan wrote: Last I checked, NetBSD doesn't support any sort of multiprocessor VAX. Multiprocessor VAXes exist, but you're stuck with either Ultrix or VMS on them. Hi Pat, it's good to see your name in my inbox. Hi! :) NetBSD ran on multiprocessor BI-bus VAXen many, many years ago. Is that support broken? And it also runs on the 11/780 which can have multiple CPUs... but I've never seen support for using more than one CPU (and the NetBSD page still says NetBSD/vax can only make use of one CPU on multi-CPU machines). If that has changed, I'd love to hear about it. Support for my VAX 6000 would also be nice... Pat
Re: [HACKERS] PostgreSQL for VAX on NetBSD/OpenBSD
Last I checked, NetBSD doesn't support any sort of multiprocessor VAX. Multiprocessor VAXes exist, but you're stuck with either Ultrix or VMS on them. Pat On Sun, Jun 29, 2014 at 2:06 PM, Tom Lane t...@sss.pgh.pa.us wrote: Dave McGuire mcgu...@neurotica.com writes: On 06/29/2014 10:54 AM, Andres Freund wrote: Maybe I'm just not playful enough, but keeping a platform alive so we can run postgres in simulator seems a bit, well, pointless. On the in a simulator matter: It's important to keep in mind that there are more VAXen out there than just simulated ones. I'm offering up a simulated one here because I can spin it up in a dedicated VM on a VMware host that's already running and I already have power budget for. I could just as easily run it on real hardware...there are, at last count, close to forty real-iron VAXen here, but only a few of those are running 24/7. I'd happily bring up another one to do Postgres builds and testing, if someone will send me the bucks to pay for the additional power and cooling. (that is a real offer) Well, the issue from our point of view is that a lot of what we care about testing is extremely low-level hardware behavior, like whether spinlocks work as expected across processors. It's not clear that a simulator would provide a sufficiently accurate emulation. OTOH, the really nasty issues like cache coherency rules don't arise in single-processor systems. So unless you have a multiprocessor VAX available to spin up, a simulator may tell us as much as we'd learn anyway. (If you have got one, maybe some cash could be found --- we do have project funds available, and I think they'd be well spent on testing purposes. I don't make those decisions though.) regards, tom lane
Re: [HACKERS] Cluster name in ps output
Hi, On 2014-06-28 15:08:45 +0200, Andres Freund wrote: Otherwise it looks good to me. So, I'd looked at it with an eye towards committing it and found some more things. I've now * added the restriction that the cluster name can only be ASCII. It's shown from several clusters with differing encodings, and it's shown in process titles, so that seems wise. * moved everything to the LOGGING_WHAT category * added minimal documentation to monitoring.sgml * expanded the config.sgml entry to mention the restriction on the name. * changed the GUC's short description I'll leave the patch sit a while before actually committing it. I also think, but haven't done so, we should add a colon after the cluster name, so it's not: postgres: server1 stats collector process but postgres: server1: stats collector process Comments? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services From cdef775befc2a770e3ca29c434ae42666375358b Mon Sep 17 00:00:00 2001 From: Andres Freund and...@anarazel.de Date: Sun, 29 Jun 2014 11:40:53 +0200 Subject: [PATCH] Add cluster_name GUC which will be included in process titles if set. When running several postgres clusters on one OS instance it's often inconveniently hard to identify which postgres process belongs to which instance. Add the cluster_name GUC which will be included as part of the process titles if set. With that, processes can more easily be identified using tools like 'ps'. Thomas Munro, with some adjustments by me and review by a host of people. 
---
 doc/src/sgml/config.sgml                      | 23 ++
 doc/src/sgml/monitoring.sgml                  | 16 +++
 src/backend/utils/misc/guc.c                  | 28 +++
 src/backend/utils/misc/postgresql.conf.sample |  3 ++-
 src/backend/utils/misc/ps_status.c            | 22 +++--
 src/include/utils/guc.h                       |  1 +
 6 files changed, 86 insertions(+), 7 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index e3d1c62..11d552e 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4131,6 +4131,29 @@ local0.*    /var/log/postgresql
      </listitem>
     </varlistentry>
 
+    <varlistentry id="guc-cluster-name" xreflabel="cluster_name">
+     <term><varname>cluster_name</varname> (<type>string</type>)</term>
+     <indexterm>
+      <primary><varname>cluster_name</varname> configuration parameter</primary>
+     </indexterm>
+     <listitem>
+      <para>
+       Sets the cluster name that appears in the process title for all
+       processes in this cluster. The name can be any string of less than
+       <symbol>NAMEDATALEN</symbol> characters (64 characters in a standard
+       build). Only printable ASCII characters may be used in the
+       <varname>cluster_name</varname> value. Other characters will be
+       replaced with question marks (<literal>?</literal>). No name is shown
+       if this parameter is set to the empty string <literal>''</literal>
+       (which is the default). This parameter can only be set at server
+       start.
+      </para>
+      <para>
+       The process title is typically viewed using programs like
+       <application>ps</application> or, on Windows,
+       <application>Process Explorer</application>.
+      </para>
+     </listitem>
+    </varlistentry>
+
     <varlistentry>
      <term><varname>debug_print_parse</varname> (<type>boolean</type>)
      <indexterm>
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 69d99e7..88f22a1 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -92,6 +92,22 @@ postgres: <replaceable>user</replaceable> <replaceable>database</replaceable> <replaceable>host</replaceable> <re
    </para>
 
    <para>
+    If <xref linkend="guc-cluster-name"> has been configured the
+    cluster name will also be shown in <command>ps</command> output:
+<screen>
+$ psql -c 'SHOW cluster_name'
+ cluster_name
+--------------
+ server1
+(1 row)
+
+$ ps aux|grep server1
+postgres 27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: server1 logger process
+...
+</screen>
+   </para>
+
+   <para>
     If you have turned off <xref linkend="guc-update-process-title"> then the
     activity indicator is not updated; the process title is set only once
     when a new process is launched. On some platforms this saves a measurable
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 6902c23..3a31a75 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -198,6 +198,7 @@ static void assign_effective_io_concurrency(int newval, void *extra);
 static void assign_pgstat_temp_directory(const char *newval, void *extra);
 static bool check_application_name(char **newval, void **extra, GucSource source);
 static void assign_application_name(const char *newval, void *extra);
+static bool check_cluster_name(char **newval, void **extra, GucSource source);
 static const char *show_unix_socket_permissions(void);
 static const char
Re: [HACKERS] PostgreSQL for VAX on NetBSD/OpenBSD
Well, the issue from our point of view is that a lot of what we care about testing is extremely low-level hardware behavior, like whether spinlocks work as expected across processors. It's not clear that a simulator would provide a sufficiently accurate emulation. OTOH, the really nasty issues like cache coherency rules don't arise in single-processor systems. So unless you have a multiprocessor VAX available to spin up, a simulator may tell us as much as we'd learn anyway. (If you have got one, maybe some cash could be found --- we do have project funds available, and I think they'd be well spent on testing purposes. I don't make those decisions though.)

Depending on how often you'd like the system to try to run a compile, I'd be happy to run it on a VAXstation 4000/60. It runs bulk package builds for pkgsrc, but we could do a compile every week or so (every day would really eat into cycles for other packages).

John

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] better atomics - v0.5
On Mon, Jun 30, 2014 at 07:44:47PM +0200, Andres Freund wrote: On 2014-06-30 13:15:23 -0400, Robert Haas wrote: I'm personally not convinced that we're approaching this topic in the right way. I'm not convinced that it's at all reasonable to try to emulate atomics on platforms that don't have them. I would punt the problem into the next layer and force things like lwlock.c to have fallback implementations at that level that don't require atomics, and remove those fallback implementations if and when we move the goalposts so that all supported platforms must have working atomics implementations.

If we're requiring an atomics-less implementation for lwlock.c, bufmgr.c et al., I'll stop working on those features (and by extension on atomics). It's an utter waste of resources and maintainability trying to maintain parallel high-level locking algorithms in several pieces of the codebase. The nonatomic versions will stagnate and won't actually work under load. I'd rather spend my time on something useful. The spinlock-based atomics emulation is *small*. It's easy to reason about correctness there. If we're going that way, we've made a couple of absolute fringe platforms hold postgres, as actually used in reality, hostage. I think that's insane.

I agree completely. If this patch goes in the way it's written right now, then I think it basically amounts to throwing platforms that don't have working atomics emulation under the bus, and my vote is that if we're going to do that, we should do it explicitly: sorry, we now only support platforms with working atomics. You don't have any, so you can't run PostgreSQL. Have a nice day.

I might share that view if a platform faced a huge and easily-seen performance regression, say a 40% regression to 5-client performance. I don't expect the shortfall to reach that ballpark for any atomics-exploiting algorithm we'll adopt, and your (Andres's) latest measurements corroborate that.
-- Noah Misch EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] PATCH: Allow empty targets in unaccent dictionary
Abhijit Menon-Sen a...@2ndquadrant.com writes: At 2014-06-30 15:19:17 -0400, t...@sss.pgh.pa.us wrote: It's not unlikely that we want this patch *and* an improvement that allows multi-character src strings.

I think it's enough to apply just this patch, but I wouldn't object to doing both if it were easy. It's not clear to me if that's true after a quick glance at the code, but I'll look again when I'm properly awake.

I went ahead and committed this patch, and also some further work to fix the multicharacter-source problem. I took it on myself to make the code issue warnings about misformatted lines, too.

regards, tom lane
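For readers unfamiliar with the dictionary's input, an unaccent rules file maps a source string to a replacement, one whitespace-separated pair per line. A hypothetical fragment illustrating the two behaviors discussed in this thread (these exact entries are assumptions, not quotes from the shipped unaccent.rules):

```
À	A
Æ	AE
©	
```

The first line is the classic one-to-one case, the second replaces one character with two, and the third has an empty target, which with this patch means the source character is removed from the text entirely. Tom's follow-up commit additionally allows the source column itself to be more than one character, e.g. a base letter followed by a combining accent.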
Re: [HACKERS] PostgreSQL in Windows console and Ctrl-C
On Mon, Jun 30, 2014 at 09:16:45PM +, Christian Ullrich wrote: * From: Noah Misch [mailto:n...@leadboat.com] Yes; the consequence of ignoring ^C is that the test postmaster would persist indefinitely after the ^C. The system under test doesn't care per

No it won't, please see below.

I think I know where we're talking past each other. You may be assuming that kill(postmaster, SIGINT) would be affected by my patch. It would not. PostgreSQL's signal emulation on Windows works completely separately from the actual Windows analog of signals in console processes. My patch drops these analogous events, but the emulated signals still work the same way as before.

I wasn't assuming otherwise. When Ctrl-C is pressed in a console window, pg_console_handler() in backend/port/win32/signal.c is called with a CTRL_C_EVENT, and converts that to an emulated SIGINT (unless I tell it not to). That emulated SIGINT is then processed as usual. "pg_ctl -m fast stop" sends the emulated SIGINT directly. The main point of this patch is to have pg_console_handler() in the postmaster ignore ^C when "pg_ctl start" had started this postmaster. Right?

Anyway, I just ran upgradecheck, and it reported success without leaving any stale postmasters around.

By the time the test reports success, it has already shut down the postmaster cleanly. I'm focusing on the case where the user decides halfway through the test run to cancel it. Currently, ^C will reach both the test driver and the postmaster, and everything will shut down; no "pg_ctl stop" is involved. What happens if you issue "vcregress upgradecheck" and strike ^C between the "Setting up data for upgrading" and "Dumping old cluster" messages?

-- Noah Misch EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] inherit support for foreign tables
(2014/06/30 20:17), Ashutosh Bapat wrote: On Mon, Jun 30, 2014 at 4:17 PM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote: (2014/06/30 17:47), Ashutosh Bapat wrote: BTW, why aren't you using the tlist passed to this function? I guess create_scan_plan() passes tlist after processing it, so that should be used rather than rel->reltargetlist.

I think that that would be maybe OK, but I think that it would not be efficient to use the list to compute attrs_used, because the tlist would have more information than rel->reltargetlist in cases where the tlist is built through build_physical_tlist().

In that case, we can call build_relation_tlist() for foreign tables.

Do you mean build_physical_tlist()? Yeah, we can call build_physical_tlist() (and do that in some cases), but if we call the function, it would generate a tlist that contains all Vars in the relation, not only those Vars actually needed by the query (ie, Vars in reltargetlist), and thus it would take more cycles to compute attrs_used from the tlist than from reltargetlist. That's what I wanted to say.

Thanks,

Best regards, Etsuro Fujita
Re: [HACKERS] inherit support for foreign tables
(2014/06/30 22:48), Tom Lane wrote: Etsuro Fujita fujita.ets...@lab.ntt.co.jp writes: Done. I think this is because create_foreignscan_plan() makes reference to attr_needed, which isn't computed for inheritance children.

I wonder whether it isn't time to change that. It was coded like that originally only because calculating the values would've been a waste of cycles at the time. But this is at least the third place where it'd be useful to have attr_needed for child rels.

+1 for calculating attr_needed for child rels. (I was wondering too.) I'll create a separate patch for it.

Thanks,

Best regards, Etsuro Fujita