Re: [HACKERS] delta relations in AFTER triggers
On 7 August 2014 19:49, Kevin Grittner kgri...@ymail.com wrote: Amit Khandekar amit.khande...@enterprisedb.com wrote: On 21 June 2014 23:36, Kevin Grittner kgri...@ymail.com wrote: Kevin Grittner kgri...@ymail.com wrote: I didn't change the tuplestores to TID because it seemed to me that it would preclude using transition relations with FDW triggers, and it seemed bad not to support that. Does anyone see a way around that, or feel that it's OK to not support FDW triggers in this regard? I think it is ok to use tuplestores for now, but as mentioned by you somewhere else in the thread, later on we should change this to using tids as an optimization. Well, the optimization would probably be to use a tuplestore of tids referencing modified tuples in the base table, rather than a tuplestore of the data itself. But I think we're in agreement. Right, that's what I meant. I see that the tuplestores for transition tables are stored per query depth. If the DML involves a table that has multiple child tables, it seems as though a single tuplestore would be shared among all these tables. That means if we define such triggers using the transition table clause for all the child tables, then the trigger function for a child table will see tuples from other child tables as well. Is that true? I don't think so. I will make a note of the concern to confirm by testing. Thanks. I will wait for this. I tried to google some SQLs that use the REFERENCING clause with triggers. It looks like in some database systems, even the WHEN clause of CREATE TRIGGER can refer to a transition table, just like how it refers to the NEW and OLD row variables. For example:

CREATE TRIGGER notify_dept AFTER UPDATE ON weather
  REFERENCING NEW_TABLE AS N_TABLE NEW AS N_ROW
  FOR EACH ROW
  WHEN ((SELECT AVG(temperature) FROM N_TABLE) > 10)
BEGIN
  notify_department(N_ROW.temperature, N_ROW.city);
END

Above, it is used to get an aggregate value of all the changed rows.
I think we do not currently support aggregate expressions in the WHEN clause, but with transition tables it makes more sense to support it later if not now. Interesting point; I had not thought about that. Will see if I can include support for that in the patch for the next CF; failing that, I will at least be careful not to paint myself into a corner where it is unduly hard to do later. We currently do the WHEN checks while saving the AFTER trigger events, and also add the tuples one by one while saving the trigger events. If and when we support WHEN, we would need to make all of these tuples saved *before* the first WHEN clause execution, and that seems to demand more changes in the current trigger code. More comments below:

--- Are we later going to extend this support for constraint triggers as well? I think these variables would make sense even for deferred constraint triggers. I think we would need some more changes if we want to support this, because there is no query depth while executing deferred triggers and so the tuplestores might be inaccessible with the current design.

--- The following (and similarly other) statements:

trigdesc->trig_insert_new_table |=
    (TRIGGER_FOR_INSERT(tgtype) &&
     TRIGGER_USES_TRANSITION_TABLE(trigger->tgnewtable)) ? true : false;

can be simplified to:

trigdesc->trig_insert_new_table |=
    (TRIGGER_FOR_INSERT(tgtype) &&
     TRIGGER_USES_TRANSITION_TABLE(trigger->tgnewtable));

---

AfterTriggerExecute()
{
    ...
    /*
     * Set up the tuplestore information.
     */
    if (trigdesc->trig_delete_old_table || trigdesc->trig_update_old_table)
        LocTriggerData.tg_olddelta =
            GetCurrentTriggerDeltaTuplestore(afterTriggers->old_tuplestores);
    ...
}

Above, trigdesc->trig_update_old_table is true if at least one of the triggers of the table has transition tables. So this will cause the delta table to be available on all of the triggers of the table even if only one of them uses transition tables.
Maybe this would be solved by using LocTriggerData.tg_trigger->tg_olddelta rather than trigdesc->trig_update_old_table to decide whether LocTriggerData.tg_olddelta should be assigned.

--- GetCurrentTriggerDeltaTuplestore() is now used for getting the fdw tuplestore also, so it should have a more general name.

---

#define TRIGGER_USES_TRANSITION_TABLE(namepointer) \
    ((namepointer) != (char *) NULL && (*(namepointer)) != '\0')

Since all other code sections assume trigger->tgoldtable to be non-NULL, we can skip the NULL check above.

--- We should add a check to make sure the user does not supply the same name for OLD TABLE and NEW TABLE.

--- The below code comment needs to be changed.

 * Only tables are initially supported, and only for AFTER EACH STATEMENT
 * triggers, but other permutations are accepted by the parser so we can give
 * a meaningful message from C code.

The comment implies that with ROW triggers we do not support TABLE transition
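The `? true : false` simplification suggested above rests on the fact that, in C, `&&` already evaluates to 0 or 1. A minimal standalone sketch (plain int flags stand in for the TRIGGER_FOR_INSERT / TRIGGER_USES_TRANSITION_TABLE macros, which are not reproduced here):

```c
#include <stdbool.h>

/* Hypothetical stand-ins: any non-zero int flag counts as "set". */
static bool
combined_verbose(int for_insert, int uses_table)
{
    /* the style the review wants to get rid of */
    return (for_insert && uses_table) ? true : false;
}

static bool
combined_simple(int for_insert, int uses_table)
{
    /* && already yields 0 or 1, so the ternary adds nothing */
    return (for_insert && uses_table);
}
```

The two forms are interchangeable for every combination of inputs, which is why dropping the ternary is purely a readability change.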
[HACKERS] Incorrect log message and checks in pgrecvlogical
Hi, While looking at the pg_recvlogical code, I noticed that a couple of error messages are incorrect when invoking some combinations of --create, --start or --drop. For example, here --init should be --create, and --stop should be --drop:

$ pg_recvlogical --create --drop --slot foo
pg_recvlogical: cannot use --init or --start together with --stop
Try "pg_recvlogical --help" for more information.

Also, when checking the combination startpos && (create || drop), I think that we should check that startpos is InvalidXLogRecPtr. The patch attached fixes all those things. It should be back-patched to 9.4. Regards, -- Michael

diff --git a/src/bin/pg_basebackup/pg_recvlogical.c b/src/bin/pg_basebackup/pg_recvlogical.c
index dbc002d..f3b0f34 100644
--- a/src/bin/pg_basebackup/pg_recvlogical.c
+++ b/src/bin/pg_basebackup/pg_recvlogical.c
@@ -814,15 +814,15 @@ main(int argc, char **argv)
 	if (do_drop_slot && (do_create_slot || do_start_slot))
 	{
-		fprintf(stderr, _("%s: cannot use --init or --start together with --stop\n"), progname);
+		fprintf(stderr, _("%s: cannot use --create or --start together with --drop\n"), progname);
 		fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
 		exit(1);
 	}

-	if (startpos && (do_create_slot || do_drop_slot))
+	if (startpos != InvalidXLogRecPtr && (do_create_slot || do_drop_slot))
 	{
-		fprintf(stderr, _("%s: cannot use --init or --stop together with --startpos\n"), progname);
+		fprintf(stderr, _("%s: cannot use --create or --drop together with --startpos\n"), progname);
 		fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
 		exit(1);

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
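The corrected exclusion logic boils down to two predicates; a minimal sketch of them (simplified: the real pg_recvlogical prints a specific message per conflict, and InvalidXLogRecPtr is assumed to be 0 as in PostgreSQL's headers):

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

/* Sketch of the corrected option checks: --drop excludes --create and
 * --start, and --startpos (checked explicitly against InvalidXLogRecPtr)
 * excludes --create and --drop. Returns true when the combination is
 * acceptable. */
static bool
options_ok(bool do_create, bool do_start, bool do_drop, XLogRecPtr startpos)
{
    if (do_drop && (do_create || do_start))
        return false;
    if (startpos != InvalidXLogRecPtr && (do_create || do_drop))
        return false;
    return true;
}
```

The explicit `!= InvalidXLogRecPtr` comparison is the style the patch argues for, rather than relying on the pointer-like value being truthy.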
Re: [HACKERS] pg_receivexlog add synchronous mode
On Fri, Aug 8, 2014 at 5:10 PM, Fujii Masao masao.fu...@gmail.com wrote: Thanks for the review! Applied the patch. I noticed that the tab padding for the new option -F in the getopt switch is incorrect. Attached patch corrects that. pgindent would have caught that anyway, but it doesn't hurt to be correct now. Thanks, -- Michael

diff --git a/src/bin/pg_basebackup/pg_receivexlog.c b/src/bin/pg_basebackup/pg_receivexlog.c
index 0b7af54..4483c87 100644
--- a/src/bin/pg_basebackup/pg_receivexlog.c
+++ b/src/bin/pg_basebackup/pg_receivexlog.c
@@ -441,15 +441,15 @@ main(int argc, char **argv)
 			case 'n':
 				noloop = 1;
 				break;
-		case 'F':
-		fsync_interval = atoi(optarg) * 1000;
-		if (fsync_interval < -1000)
-		{
-		fprintf(stderr, _("%s: invalid fsync interval \"%s\"\n"),
-				progname, optarg);
-		exit(1);
-		}
-		break;
+			case 'F':
+				fsync_interval = atoi(optarg) * 1000;
+				if (fsync_interval < -1000)
+				{
+					fprintf(stderr, _("%s: invalid fsync interval \"%s\"\n"),
+							progname, optarg);
+					exit(1);
+				}
+				break;
 			case 'v':
 				verbose++;
 				break;
[HACKERS] Inconsistent use of --slot/-S in pg_receivexlog and pg_recvlogical
Hi, I noticed that pg_receivexlog is able to use --slot but not -S, even if the code is written this way. Attached is a patch correcting that. This makes pg_receivexlog consistent with pg_recvlogical regarding the slot option. IMHO, this should be backpatched to REL9_4_STABLE. Regards, -- Michael

diff --git a/doc/src/sgml/ref/pg_receivexlog.sgml b/doc/src/sgml/ref/pg_receivexlog.sgml
index c15776f..5916b8f 100644
--- a/doc/src/sgml/ref/pg_receivexlog.sgml
+++ b/doc/src/sgml/ref/pg_receivexlog.sgml
@@ -242,6 +242,7 @@ PostgreSQL documentation
     </varlistentry>

     <varlistentry>
+      <term><option>-S <replaceable>slotname</replaceable></option></term>
      <term><option>--slot=<replaceable class="parameter">slotname</replaceable></option></term>
      <listitem>
       <para>

diff --git a/src/bin/pg_basebackup/pg_receivexlog.c b/src/bin/pg_basebackup/pg_receivexlog.c
index 0b7af54..a8b9ad3 100644
--- a/src/bin/pg_basebackup/pg_receivexlog.c
+++ b/src/bin/pg_basebackup/pg_receivexlog.c
@@ -77,7 +77,7 @@ usage(void)
 	printf(_("  -U, --username=NAME    connect as specified database user\n"));
 	printf(_("  -w, --no-password      never prompt for password\n"));
 	printf(_("  -W, --password         force password prompt (should happen automatically)\n"));
-	printf(_("      --slot=SLOTNAME    replication slot to use\n"));
+	printf(_("  -S, --slot=SLOTNAME    replication slot to use\n"));
 	printf(_("\nReport bugs to pgsql-b...@postgresql.org.\n"));
 }

@@ -394,7 +394,7 @@ main(int argc, char **argv)
 		}
 	}

-	while ((c = getopt_long(argc, argv, "D:d:h:p:U:s:nF:wWv",
+	while ((c = getopt_long(argc, argv, "D:d:h:p:U:s:S:nF:wWv",
 							long_options, option_index)) != -1)
 	{
 		switch (c)
Re: [HACKERS] postgresql.auto.conf and reload
On Tue, Aug 12, 2014 at 1:23 PM, Amit Kapila amit.kapil...@gmail.com wrote: On Mon, Aug 11, 2014 at 11:08 PM, Fujii Masao masao.fu...@gmail.com wrote: While updating the patch, I found that the ConfigVariable which is removed from the list has fields whose memory has already been allocated but we forgot to free them. So I included the code to free them in the patch. Yes, that is right.

! /*
!  * Pick up only the data_directory if DataDir is not set, which
!  * means that the configuration file is read for the first time and
!  * PG_AUTOCONF_FILENAME file cannot be read yet. In this case,
!  * we shouldn't pick any settings except the data_directory
!  * from postgresql.conf because they might be overwritten
!  * with the settings in PG_AUTOCONF_FILENAME file which will be
!  * read later. OTOH, since it's ensured that data_directory doesn't
!  * exist in PG_AUTOCONF_FILENAME file, it will never be overwritten
!  * later.
!  */
! else

It is a bit inconvenient to read this part of the code; some other places in the same file put the comment inside the else construct, which seems better. One example is as below: Yep, updated that way.

else
{
	/*
	 * ordinary variable, append to list. For multiple items of
	 * same parameter, retain only which comes later.
	 */

I have marked this as Ready For Committer. Thanks for the review! Committed. Regards, -- Fujii Masao
[HACKERS] minor pgbench doc fix
Attached patch removes spurious brackets from the pgbench documentation. -- Fabien.

diff --git a/doc/src/sgml/pgbench.sgml b/doc/src/sgml/pgbench.sgml
index b7d88f3..1551686 100644
--- a/doc/src/sgml/pgbench.sgml
+++ b/doc/src/sgml/pgbench.sgml
@@ -748,7 +748,7 @@ pgbench <optional> <replaceable>options</> </optional> <replaceable>dbname</>
    <varlistentry>
     <term>
-     <literal>\setrandom <replaceable>varname</> <replaceable>min</> <replaceable>max</> [ uniform | [ { gaussian | exponential } <replaceable>threshold</> ] ]</literal>
+     <literal>\setrandom <replaceable>varname</> <replaceable>min</> <replaceable>max</> [ uniform | { gaussian | exponential } <replaceable>threshold</> ]</literal>
    </term>
    <listitem>
Re: [HACKERS] Enhancing pgbench parameter checking
Fabien, Find attached a patch which adds these changes to your current version. Thank you for the review and patch. Looks good. I changed the status to Ready for Committer. I will wait for a few days and if there's no objection, I will commit the patch. I have committed the patch. Thanks! Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp
Re: [HACKERS] 9.4 logical replication - walsender keepalive replies
On 2014-08-11 23:52:32 +0200, Andres Freund wrote: On 2014-08-11 17:22:27 -0400, Steve Singer wrote: On 07/14/2014 01:19 PM, Steve Singer wrote: On 07/06/2014 10:11 AM, Andres Freund wrote: Hi Steve, Right. I thought about this for a while, and I think we should change two things. For one, don't request replies here. It's simply not needed, as this isn't dealing with timeouts. For another, don't just check ->flush < sentPtr but also ->write < sentPtr. The reason we're sending these feedback messages is to inform the 'logical standby' that there's been WAL activity which it can't see because they don't correspond to anything that's logically decoded (e.g. vacuum stuff). Would that suit your needs? Greetings, Yes I think that will work for me. I tested with the attached patch that I think does what you describe and it seems okay. Any feedback on this? Do we want that change for 9.4, or do we want something else? I plan to test and apply it in the next few days. Digging myself out from under stuff from before my holiday right now... Committed. Thanks and sorry for the delay. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] minor pgbench doc fix
On Tue, Aug 12, 2014 at 5:12 PM, Fabien COELHO coe...@cri.ensmp.fr wrote: Attached patch removes spurious brackets from pgbench documentation. Applied. Thanks! Regards, -- Fujii Masao
[HACKERS] pg_receivexlog --status-interval add fsync feedback
Hi all, This patch adds a setting to --status-interval of pg_receivexlog to send a status packet after each fsync. If -1 is specified for --status-interval, a status packet is sent as soon as the fsync completes. Otherwise the behavior is the same as when 0 is specified for --status-interval: the status packet is sent when requested by the server. Use a replication slot and specify -1 for --fsync-interval; as a result, simple synchronous WAL replication can be performed. With simple WAL synchronization it is possible to test replication, and if a failover occurs during synchronous replication, the receiver remains available as a substitute until the standby is restored. Regards, -- Furuya Osamu

pg_receivexlog-fsync-feedback-v1.patch
Re: [HACKERS] pg_receivexlog add synchronous mode
On Tue, Aug 12, 2014 at 4:34 PM, Michael Paquier michael.paqu...@gmail.com wrote: On Fri, Aug 8, 2014 at 5:10 PM, Fujii Masao masao.fu...@gmail.com wrote: Thanks for the review! Applied the patch. I noticed that the tab padding for the new option -F in the getopt switch is incorrect. Attached patch corrects that. pgindent would have caught that anyway, but it doesn't hurt to be correct now. Yeah, that's a problem. But, as you clearly said, upcoming pgindent would fix this kind of problem. So I'm not sure if it's really worth fixing only this case independently. Regards, -- Fujii Masao
Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW
Hi Fujita-san, Issues addressed by Eitoku-san were fixed properly, but he found a bug and a possible enhancement in the v2 patch. * push-down check misses delete triggers update_is_pushdown_safe() seems to have a bug that it misses the existence of row-level delete trigger. DELETE statement executed against a foreign table which has row-level delete trigger is pushed down to remote, and consequently no row-level delete trigger is fired. * further optimization Is there any chance to consider further optimization by passing the operation type (UPDATE|DELETE) of undergoing statement to update_is_pushdown_safe()? It seems safe to push down UPDATE statement when the target foreign table has no update trigger even it has a delete trigger (of course the opposite combination would be also fine). * Documentation The requirement of pushing down UPDATE/DELETE statements would not be easy to understand for non-expert users, so it seems that there is a room to enhance documentation. An idea is to define which expression is safe to send to remote first (it might need to mention the difference of semantics), and refer the definition from the place describing the requirement of pushing-down for SELECT, UPDATE and DELETE. 2014-08-04 20:30 GMT+09:00 Etsuro Fujita fujita.ets...@lab.ntt.co.jp: (2014/07/30 17:22), Etsuro Fujita wrote: (2014/07/29 0:58), Robert Haas wrote: On Fri, Jul 25, 2014 at 3:39 AM, Albe Laurenz laurenz.a...@wien.gv.at wrote: Shigeru Hanada wrote: * Naming of new behavior You named this optimization Direct Update, but I'm not sure that this is intuitive enough to express this behavior. I would like to hear opinions of native speakers. How about batch foreign update or batch foreign modification? (Disclaimer: I'm not a native speaker either.) I think direct update sounds pretty good. Batch does not sound as good to me, since it doesn't clearly describe what makes this patch special as opposed to some other grouping of updates that happens to produce a speedup. 
I agree with Robert on that point. Another term that might be used is update pushdown, since we are pushing the whole update to the remote server instead of having the local server participate. Without looking at the patch, I don't have a strong opinion on whether that's better than direct update in this context. Update Pushdown is fine with me. If there are no objections from others, I'll change the name from Direct Update to Update Pushdown. Done. (I've left deparseDirectUpdateSql/deparseDirectDeleteSql as-is, though.) Other changes: * Address the comments from Eitoku-san. * Add regression tests. * Fix a bug, which fails to show the actual row counts in EXPLAIN ANALYZE for UPDATE/DELETE without a RETURNING clause. * Rebase to HEAD. Please find attached an updated version of the patch. Thanks, Best regards, Etsuro Fujita -- Shigeru HANADA
Re: [HACKERS] Proposal: Incremental Backup
As I already stated, timestamps will only be used to early detect changed files. To declare two files identical they must have same size, same mtime and same *checksum*. Regards, Marco -- Marco Nenciarini - 2ndQuadrant Italy PostgreSQL Training, Services and Support marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it
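The identity rule stated above can be sketched as a small comparison helper (hypothetical signature; a real implementation would take stat() results and a file digest rather than plain arguments):

```c
#include <stdbool.h>
#include <string.h>

/* Two files are declared identical only when size, mtime and checksum
 * all match; size and mtime alone are just a cheap early-out for the
 * common "definitely changed" case, as described in the mail above. */
static bool
files_identical(long size_a, long mtime_a, const char *sum_a,
                long size_b, long mtime_b, const char *sum_b)
{
    if (size_a != size_b || mtime_a != mtime_b)
        return false;           /* changed size or mtime: not identical */
    return strcmp(sum_a, sum_b) == 0;   /* final word goes to the checksum */
}
```

Note that a differing mtime short-circuits before any checksum comparison, which is exactly the "early detect changed files" role timestamps play here.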
Re: [HACKERS] WAL format and API changes (9.5)
On 08/05/2014 03:50 PM, Michael Paquier wrote: - When testing pg_xlogdump I found an assertion failure for record XLOG_GIN_INSERT:

Assertion failed: (((bool) (((const void *) ((insertData->newitem.key)) != ((void *) 0)) && ((insertData->newitem.key)->ip_posid != 0))), function gin_desc, file gindesc.c, line 127.

I could not reproduce this. Do you have a test case? - Would it be a win to add an assertion check for (CritSectionCount == 0) in XLogEnsureRecordSpace? Maybe; added. - There is no mention of REGBUF_NO_IMAGE in transam/README. Fixed - This patch adds a lot of blocks like that in the redo code:

+ if (BufferIsValid(buffer))
+     UnlockReleaseBuffer(buffer);

What do you think about adding a new macro like UnlockBufferIfValid(buffer)? I don't think such a macro would be an improvement in readability, TBH. - Don't we need a call to XLogBeginInsert() in XLogSaveBufferForHint():

+ int flags = REGBUF_FORCE_IMAGE;
+ if (buffer_std)
+     flags |= REGBUF_STANDARD;
+ XLogRegisterBuffer(0, buffer, flags);
[...]
- recptr = XLogInsert(RM_XLOG_ID, XLOG_FPI, rdata);
+ recptr = XLogInsert(RM_XLOG_ID, XLOG_FPI);

Indeed, fixed. With that, initdb -k works; I had not tested the patch with page checksums before. - There is still a TODO item in ValidXLogRecordHeader to check if the block data header is valid or not. Just mentioning. Removed. Previously, ValidXLogRecordHeader checked that the xl_tot_len of a record doesn't exceed the size of header + xl_len + (space needed for max number of bkp blocks). But the new record format has no limit on the amount of data registered with a buffer, so such a test is not possible anymore. I added the TODO item there to remind me that I need to check if we need to do something else there instead, but I think it's fine as it is. We still sanity-check the block data in ValidXLogRecord; the ValidXLogRecordHeader() check happens before we have read the whole record header in memory.
- XLogRecGetBlockRefIds needing an already-allocated array for *out is not user-friendly. Cannot we just do all the work inside this function? I agree it's not a nice API. Returning a palloc'd array would be nicer, but this needs to work outside the backend too (e.g. pg_xlogdump). It could return a malloc'd array in frontend code, but it's not as nice. On the whole, do you think that would be better than the current approach? - All the functions in xlogconstruct.c could be defined in a separate header xlogconstruct.h. What they do is rather independent from xlog.h. The same applies to all the structures and functions of xlogconstruct.c in xlog_internal.h: XLogRecordAssemble, XLogRecordAssemble, InitXLogRecordConstruct. (Worth noticing as well that the only reason XLogRecData is defined externally of xlogconstruct.c is that it is being used in XLogInsert().) Hmm. I left the defines for xlogconstruct.c functions that are used to build a WAL record in xlog.h, so that it's not necessary to include both xlog.h and xlogconstruct.h in all files that write a WAL record (XLogInsert() is defined in xlog.h). But perhaps it would be better to move the prototype for XLogInsert to xlogconstruct.h too? That might be a better division; everything needed to insert a WAL record would be in xlogconstruct.h, and xlog.h would be left for more internal functions related to WAL. And perhaps rename the files to xloginsert.c and xloginsert.h. Here's a new patch with those little things fixed. - Heikki

wal-format-and-api-changes-6.patch.gz
Re: [HACKERS] [BUGS] BUG #9652: inet types don't support min/max
On Mon, Aug 4, 2014 at 3:22 PM, Asif Naeem anaeem...@gmail.com wrote: Sorry for not being clear, the above mentioned test is related to the following doc (sgml) changes that seem not to work as described, i.e.

Table 9-35. cidr and inet Functions

Function         | Return Type | Description               | Example                           | Result
max(inet, inet)  | inet        | larger of two inet types  | max('192.168.1.5', '192.168.1.4') | 192.168.1.5
min(inet, inet)  | inet        | smaller of two inet types | min('192.168.1.5', '192.168.1.4') | 192.168.1.4

Can you please elaborate these sgml changes? I thought of adding them for the newly added network functions but mistakenly I kept the names as max and min. Anyway, with your suggestion in the earlier mail, these changes are not required. I removed these changes in the attached patch. Thanks for reviewing the patch. Regards, Hari Babu Fujitsu Australia

inet_agg_v7.patch
Re: [HACKERS] pg_receivexlog --status-interval add fsync feedback
On Tue, Aug 12, 2014 at 6:19 PM, furu...@pm.nttdata.co.jp wrote: Hi all, This patch is to add a setting to --status-interval of pg_receivexlog to send status packets after fsync. If -1 is specified for --status-interval, status packets are sent as soon as the fsync completes. I don't think that it's a good idea to control that behavior by using --status-interval. I'm sure that there are some users who both want that behavior and want to set the maximum interval between feedback messages sent back to the server, because both settings are available in walreceiver. But your version of --status-interval doesn't allow those settings at all. That is, if set to -1, the maximum interval between each feedback cannot be set. OTOH, if set to a positive number, the feedback-as-soon-as-fsync behavior cannot be enabled. Therefore, I'm thinking that it's better to introduce a new separate option for that behavior. Regards, -- Fujii Masao
Re: [HACKERS] Inconsistent use of --slot/-S in pg_receivexlog and pg_recvlogical
On Tue, Aug 12, 2014 at 4:50 PM, Michael Paquier michael.paqu...@gmail.com wrote: Hi, I noticed that pg_receivexlog is able to use --slot but not -S, even if the code is written this way. Attached is a patch correcting that. This makes pg_receivexlog consistent with pg_recvlogical regarding the slot option. IMHO, this should be backpatched to REL9_4_STABLE. Regards, Looks good to me. And, I could not find the reason why only -S was not added, in the past discussion. Anyway, barring any objection, I will commit this. Regards, -- Fujii Masao
Re: [HACKERS] SSL regression test suite
On 08/05/2014 10:46 PM, Robert Haas wrote: On Mon, Aug 4, 2014 at 10:38 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Now that we use TAP for testing client tools, I think we can use that to test various SSL options too. I came up with the attached. Comments? It currently assumes that the client's and the server's hostnames are postgres-client.test and postgres-server.test, respectively. That makes it a bit tricky to run on a single system. The README includes instructions; basically you need to set up an additional loopback device, and add entries to /etc/hosts for that. That seems so onerous that I think few people will do it, and not regularly, resulting in the tests breaking and nobody noticing. Reconfiguring the loopback interface like that requires root privilege, and won't survive a reboot, and doing it in the system configuration will require hackery specific to the particular flavor of Linux you're running, and probably other hackery on non-Linux systems (never mind Windows). Yeah, you're probably right. Why can't you make it work over 127.0.0.1? I wanted it to be easy to run the client and the server on different hosts. As soon as we have more than one SSL implementation, it would be really nice to do interoperability testing between a client and a server using different implementations. Also, to test sslmode=verify-full, where the client checks that the server certificate's hostname matches the hostname that it connected to, you need to have two aliases for the same server, one that matches the certificate and one that doesn't. But I think I found a way around that part; if the certificate is set up for localhost, and you connect to 127.0.0.1, you get a mismatch. So, I got rid of the DNS setup; it only depends on localhost/127.0.0.1 now. Patch attached. That means that it's not easy to run the client and the server on different hosts, but we can improve that later.
- Heikki

commit 140c590ca86a0ba4a6b422e4b618cd459b84175f
Author: Heikki Linnakangas heikki.linnakan...@iki.fi
Date: Wed Aug 6 18:43:39 2014 +0300

    Refactor cert file stuff in client

diff --git a/src/interfaces/libpq/fe-secure-openssl.c b/src/interfaces/libpq/fe-secure-openssl.c
index f950fc3..cee7b2e 100644
--- a/src/interfaces/libpq/fe-secure-openssl.c
+++ b/src/interfaces/libpq/fe-secure-openssl.c
@@ -780,57 +780,21 @@ destroy_ssl_system(void)
 static int
 initialize_SSL(PGconn *conn)
 {
-	struct stat buf;
-	char		homedir[MAXPGPATH];
-	char		fnbuf[MAXPGPATH];
-	char		sebuf[256];
-	bool		have_homedir;
-	bool		have_cert;
 	EVP_PKEY   *pkey = NULL;
-
-	/*
-	 * We'll need the home directory if any of the relevant parameters are
-	 * defaulted. If pqGetHomeDirectory fails, act as though none of the
-	 * files could be found.
-	 */
-	if (!(conn->sslcert && strlen(conn->sslcert) > 0) ||
-		!(conn->sslkey && strlen(conn->sslkey) > 0) ||
-		!(conn->sslrootcert && strlen(conn->sslrootcert) > 0) ||
-		!(conn->sslcrl && strlen(conn->sslcrl) > 0))
-		have_homedir = pqGetHomeDirectory(homedir, sizeof(homedir));
-	else	/* won't need it */
-		have_homedir = false;
-
-	/* Read the client certificate file */
-	if (conn->sslcert && strlen(conn->sslcert) > 0)
-		strncpy(fnbuf, conn->sslcert, sizeof(fnbuf));
-	else if (have_homedir)
-		snprintf(fnbuf, sizeof(fnbuf), "%s/%s", homedir, USER_CERT_FILE);
-	else
-		fnbuf[0] = '\0';
-
-	if (fnbuf[0] == '\0')
-	{
-		/* no home directory, proceed without a client cert */
-		have_cert = false;
-	}
-	else if (stat(fnbuf, &buf) != 0)
-	{
-		/*
-		 * If file is not present, just go on without a client cert; server
-		 * might or might not accept the connection. Any other error,
-		 * however, is grounds for complaint.
-		 */
-		if (errno != ENOENT && errno != ENOTDIR)
-		{
-			printfPQExpBuffer(&conn->errorMessage,
-					   libpq_gettext("could not open certificate file \"%s\": %s\n"),
-							  fnbuf, pqStrerror(errno, sebuf, sizeof(sebuf)));
-			return -1;
-		}
-		have_cert = false;
-	}
-	else
+	char	   *sslcertfile = NULL;
+	char	   *engine = NULL;
+	char	   *keyname = NULL;
+	char	   *sslkeyfile = NULL;
+	char	   *sslrootcert = NULL;
+	char	   *sslcrl = NULL;
+	int			ret = -1;
+
+	if (!pqsecure_get_ssl_files(conn,
+								&sslcertfile, &sslkeyfile, &engine, &keyname,
+								&sslrootcert, &sslcrl))
+		return PGRES_POLLING_READING;
+
+	if (sslcertfile)
 	{
 		/*
 		 * Cert file exists, so load it. Since OpenSSL doesn't provide the
@@ -855,216 +819,146 @@ initialize_SSL(PGconn *conn)
 	{
 		printfPQExpBuffer(&conn->errorMessage,
						  libpq_gettext("could not acquire mutex: %s\n"), strerror(rc));
-		return -1;
+		goto fail;
 	}
 #endif
-	if (SSL_CTX_use_certificate_chain_file(SSL_context, fnbuf) != 1)
+	if (SSL_CTX_use_certificate_chain_file(SSL_context, sslcertfile) != 1)
 	{
 		char	   *err = SSLerrmessage();
 		printfPQExpBuffer(&conn->errorMessage,
				   libpq_gettext("could not read certificate file \"%s\": %s\n"),
-						  fnbuf, err);
+						  sslcertfile, err);
 		SSLerrfree(err);
#ifdef
Re: [HACKERS] Incorrect log message and checks in pgrecvlogical
On 2014-08-12 16:28:56 +0900, Michael Paquier wrote: Hi, While looking at the pg_recvlogical code, I noticed that a couple of error messages are incorrect when invoking some combinations of --create, --start or --drop. For example, here --init should be --create, and --stop should be --drop:

$ pg_recvlogical --create --drop --slot foo
pg_recvlogical: cannot use --init or --start together with --stop
Try "pg_recvlogical --help" for more information.

Also, when checking the combination startpos && (create || drop), I think that we should check that startpos is InvalidXLogRecPtr. The patch attached fixes all those things. It should be back-patched to 9.4. Thanks, applied. Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] PL/PgSQL: RAISE and the number of parameters
Hello Marko, Here's a patch for making PL/PgSQL throw an error during compilation (instead of runtime) if the number of parameters passed to RAISE don't match the number of placeholders in error message. I'm sure people can see the pros of doing it this way. Patch scanned, applied tested without problem on head. The compile-time raise parameter checking is a good move. 3 minor points: - I would suggest to avoid continue within a loop so that the code is simpler to understand, at least for me. - I would suggest to update the documentation accordingly. - The execution code now contains error detections which should never be raised, but I suggest to keep it in place anyway. However I would suggest to add comments about the fact that it should not be triggered. See the attached patch which implements these suggestions on top of your patch. -- Fabien.diff --git a/doc/src/sgml/plpgsql.sgml b/doc/src/sgml/plpgsql.sgml index 91fadb0..a3c763a 100644 --- a/doc/src/sgml/plpgsql.sgml +++ b/doc/src/sgml/plpgsql.sgml @@ -3403,6 +3403,8 @@ RAISE ; Inside the format string, literal%/literal is replaced by the string representation of the next optional argument's value. Write literal%%/literal to emit a literal literal%/literal. +The number of optional arguments must match the number of literal%/ +placeholders in the format string, otherwise a compilation error is reported. /para para diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c index 69d1965..a3cd6fc 100644 --- a/src/pl/plpgsql/src/pl_exec.c +++ b/src/pl/plpgsql/src/pl_exec.c @@ -2939,6 +2939,10 @@ exec_stmt_raise(PLpgSQL_execstate *estate, PLpgSQL_stmt_raise *stmt) continue; } +/* + * Too few arguments to raise: + * This should never happen as a compile-time check is performed. 
+ */ if (current_param == NULL) ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), @@ -2965,7 +2969,8 @@ exec_stmt_raise(PLpgSQL_execstate *estate, PLpgSQL_stmt_raise *stmt) /* * If more parameters were specified than were required to process the - * format string, throw an error + * format string, throw an error. + * This should never happen as a compile-time check is performed. */ if (current_param != NULL) ereport(ERROR, diff --git a/src/pl/plpgsql/src/pl_gram.y b/src/pl/plpgsql/src/pl_gram.y index ba928e3..03976e4 100644 --- a/src/pl/plpgsql/src/pl_gram.y +++ b/src/pl/plpgsql/src/pl_gram.y @@ -3785,15 +3785,12 @@ check_raise_parameters(PLpgSQL_stmt_raise *stmt) for (cp = stmt->message; *cp; cp++) { - if (cp[0] != '%') - continue; - /* literal %, ignore */ - if (cp[1] == '%') - { - cp++; - continue; + if (cp[0] == '%') { + if (cp[1] == '%') /* literal %, ignore */ +cp++; + else +expected_nparams++; } - expected_nparams++; } if (expected_nparams < list_length(stmt->params)) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PL/PgSQL: RAISE and the number of parameters
Hi Fabien, On 8/12/14 1:09 PM, Fabien COELHO wrote: Here's a patch for making PL/PgSQL throw an error during compilation (instead of runtime) if the number of parameters passed to RAISE don't match the number of placeholders in error message. I'm sure people can see the pros of doing it this way. Patch scanned, applied tested without problem on head. Thanks! The compile-time raise parameter checking is a good move. 3 minor points: - I would suggest to avoid continue within a loop so that the code is simpler to understand, at least for me. I personally find the code easier to read with the continue. - I would suggest to update the documentation accordingly. I scanned the docs trying to find any mentions of the run-time error but couldn't find any. I don't object to adding this, though. - The execution code now contains error detections which should never be raised, but I suggest to keep it in place anyway. However I would suggest to add comments about the fact that it should not be triggered. Good point. I actually realized I hadn't sent the latest version of the patch, sorry about that. I did this: https://github.com/johto/postgres/commit/1acab6fc5387c893ca29fed9284e09258e0c5c56 to also turn the ereport()s into elog()s since the user should never see them. See the attached patch which implements these suggestions on top of your patch. Thanks for reviewing! I'll incorporate the doc changes, but I'm going to let others decide on the logic in the check_raise_parameters() loop before doing any changes. Will send an updated patch shortly. .marko -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SSL regression test suite
On 2014-08-12 14:01:18 +0300, Heikki Linnakangas wrote: On 08/05/2014 10:46 PM, Robert Haas wrote: Why can't you make it work over 127.0.0.1? I wanted it to be easy to run the client and the server on different hosts. As soon as we have more than one SSL implementation, it would be really nice to do interoperability testing between a client and a server using different implementations. Also, to test sslmode=verify-full, where the client checks that the server certificate's hostname matches the hostname that it connected to, you need to have two aliases for the same server, one that matches the certificate and one that doesn't. But I think I found a way around that part; if the certificate is set up for localhost, and you connect to 127.0.0.1, you get a mismatch. Alternatively, and to e.g. test wildcard certs and such, I think you can specify both host and hostaddr to connect without actually doing a DNS lookup. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb format is pessimal for toast compression
On Fri, Aug 8, 2014 at 10:50 PM, Hannu Krosing ha...@2ndquadrant.com wrote: How hard and how expensive would it be to teach pg_lzcompress to apply a delta filter on suitable data ? So that instead of integers their deltas will be fed to the real compressor Has anyone given this more thought? I know this might not be 9.4 material, but to me it sounds like the most promising approach, if it's workable. This isn't a made up thing, the 7z and LZMA formats also have an optional delta filter. Of course with JSONB the problem is figuring out which parts to apply the delta filter to, and which parts not. This would also help with integer arrays, containing for example foreign key values to a serial column. There's bound to be some redundancy, as nearby serial values are likely to end up close together. In one of my past projects we used to store large arrays of integer fkeys, deliberately sorted for duplicate elimination. For an ideal case comparison, intar2 could be as large as intar1 when compressed with a 4-byte wide delta filter: create table intar1 as select array(select 1::int from generate_series(1,100)) a; create table intar2 as select array(select generate_series(1,100)::int) a; In PostgreSQL 9.3 the sizes are: select pg_column_size(a) from intar1; 45810 select pg_column_size(a) from intar2; 420 So a factor of 87 difference. Regards, Marti -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SSL regression test suite
On 08/12/2014 02:28 PM, Andres Freund wrote: On 2014-08-12 14:01:18 +0300, Heikki Linnakangas wrote: Also, to test sslmode=verify-full, where the client checks that the server certificate's hostname matches the hostname that it connected to, you need to have two aliases for the same server, one that matches the certificate and one that doesn't. But I think I found a way around that part; if the certificate is set up for localhost, and you connect to 127.0.0.1, you get a mismatch. Alternatively, and to e.g. test wildcard certs and such, I think you can specify both host and hostaddr to connect without actually doing a DNS lookup. Oh, I didn't know that's possible! Yeah, that's a good solution. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.5: Memory-bounded HashAgg
On 12 Srpen 2014, 7:06, Jeff Davis wrote: On Mon, 2014-08-11 at 01:29 +0200, Tomas Vondra wrote: On 10.8.2014 23:26, Jeff Davis wrote: This patch is requires the Memory Accounting patch, or something similar to track memory usage. I think the patch you sent actually includes the accounting patch. Is that on purpose, or by accident? Accident, thank you. So once a group gets into memory, it stays there? That's going to work fine for aggregates with fixed-size state (int4, or generally state that gets allocated and does not grow), but I'm afraid for aggregates with growing state (as for example array_agg and similar) that's not really a solution. I agree in theory, but for now I'm just not handling that case at all because there is other work that needs to be done first. For one thing, we would need a way to save the transition state, and we don't really have that. In the case of array_agg, the state is not serialized and there's no generic way to ask it to serialize itself without finalizing. Yes and no. It's true we don't have this ability for aggregates passing state using 'internal', and arguably these are the cases that matter (because those are the states that tend to bloat as more values are passed to the aggregate). We can do that for states with a known type (because we have serialize deserialize methods for them), but we can't really require all aggregates to use only known types. The 'internal' is there for a reason. So I think eventually we should to support something like this: CREATE AGGREGATE myaggregate ( ... SERIALIZE_FUNC = 'dump_data', DESERIALIZE_FUNC = 'read_data', ... ); That being said, we can't require this from all existing aggregates. There'll always be aggregates not providing this (for example some old ones). So even if we have this, we'll have to support the case when it's not provided - possibly by using the batching algorithm you provided. What I imagine is this: hitting work_mem limit - do we know how to dump the aggregate state? 
yes (known type or serialize/deserialize) = use the batching algorithm from hash join no (unknown type, ...) = use the batching algorithm described in the original message Now, I'm not trying to make you implement all this - I'm willing to work on that. Implementing this CREATE AGGREGATE extension is however tightly coupled with your patch, because that's the only place where it might be used (that I'm aware of). I'm open to ideas. Do you think my patch is going generally in the right direction, and we can address this problem later; or do you think we need a different approach entirely? I certainly think having memory-bounded hashagg is a great improvement, and yes - this patch can get us there. Maybe it won't get us all the way to the perfect solution but so what? We can improve that by further patches (and I'm certainly willing to spend some time on that). So thanks a lot for working on this! While hacking on the hash join, I envisioned the hash aggregate might work in a very similar manner, i.e. something like this: * nbatches=1, nbits=0 * when work_mem gets full = nbatches *= 2, nbits += 1 * get rid of half the groups, using nbits from the hash = dump the current states into 'states.batchno' file = dump further tuples to 'tuples.batchno' file * continue until the end, or until work_mem gets full again It would get a little messy with HashAgg. Hashjoin is dealing entirely with tuples; HashAgg deals with tuples and groups. I don't see why it should get messy? In the end, you have a chunk of data and a hash for it. Also, if the transition state is fixed-size (or even nearly so), it makes no sense to remove groups from the hash table before they are finished. We'd need to detect that somehow, and it seems almost like two different algorithms (though maybe not a bad idea to use a different algorithm for things like array_agg). It just means you need to walk through the hash table, look at the hashes and dump ~50% of the groups to a file. 
I'm not sure how difficult that is with dynahash, though (hashjoin uses a custom hashtable, that makes this very simple). Not saying that it can't be done, but (unless you have an idea) requires quite a bit more work than what I did here. It also seems to me that the logic of the patch is about this: * try to lookup the group in the hash table * found = call the transition function * not found * enough space = call transition function * not enough space = tuple/group goes to a batch Which pretty much means all tuples need to do the lookup first. The nice thing on the hash-join approach is that you don't really need to do the lookup - you just need to peek at the hash whether the group belongs to the current batch (and if not, to which batch it should go). That's an interesting point. I suspect that, in practice, the cost of hashing the tuple is
Re: [HACKERS] pg_receivexlog and replication slots
On Fri, Jul 11, 2014 at 6:23 PM, Andres Freund and...@2ndquadrant.com wrote: Ok. Do you plan to take care of it? If, I'd be fine with backpatching it. I'm not likely to get to it right now :( Actually I came up with the same need as Magnus, but a bit later, so... Attached is a patch to support physical slot creation and drop in pg_receivexlog with the addition of new options --create and --drop. It would be nice to have that in 9.4, but I will not the one deciding that at the end :) Code has been refactored with what is already available in pg_recvlogical for the slot creation and drop. Regards, -- Michael From 62ef9bd097453648e4f313f1aedf91ba2b4a3f1e Mon Sep 17 00:00:00 2001 From: Michael Paquier mich...@otacoo.com Date: Tue, 12 Aug 2014 21:54:13 +0900 Subject: [PATCH] Add support for physical slot creation/deletion in pg_receivexlog Physical slot creation can be done with --create and drop with --drop. In both cases --slot is needed. Code for replication slot creation and drop is refactored with what was available in pg_recvlogical. --- doc/src/sgml/ref/pg_receivexlog.sgml | 29 ++ src/bin/pg_basebackup/pg_receivexlog.c | 73 + src/bin/pg_basebackup/pg_recvlogical.c | 66 +++ src/bin/pg_basebackup/streamutil.c | 97 ++ src/bin/pg_basebackup/streamutil.h | 7 +++ 5 files changed, 203 insertions(+), 69 deletions(-) diff --git a/doc/src/sgml/ref/pg_receivexlog.sgml b/doc/src/sgml/ref/pg_receivexlog.sgml index c15776f..9251f85 100644 --- a/doc/src/sgml/ref/pg_receivexlog.sgml +++ b/doc/src/sgml/ref/pg_receivexlog.sgml @@ -72,6 +72,35 @@ PostgreSQL documentation titleOptions/title para +applicationpg_receivexlog/application can run in one of two following +modes, which control physical replication slot: + +variablelist + + varlistentry + termoption--create/option/term + listitem + para +Create a new physical replication slot with the name specified in +option--slot/option, then exit. 
+ /para + /listitem + /varlistentry + + varlistentry + termoption--drop/option/term + listitem + para +Drop the replication slot with the name specified +in option--slot/option, then exit. + /para + /listitem + /varlistentry +/variablelist + + /para + + para The following command-line options control the location and format of the output. diff --git a/src/bin/pg_basebackup/pg_receivexlog.c b/src/bin/pg_basebackup/pg_receivexlog.c index 0b7af54..67c56cb 100644 --- a/src/bin/pg_basebackup/pg_receivexlog.c +++ b/src/bin/pg_basebackup/pg_receivexlog.c @@ -38,6 +38,8 @@ static int noloop = 0; static int standby_message_timeout = 10 * 1000; /* 10 sec = default */ static int fsync_interval = 0; /* 0 = default */ static volatile bool time_to_abort = false; +static bool do_create_slot = false; +static bool do_drop_slot = false; static void usage(void); @@ -78,6 +80,9 @@ usage(void) printf(_( -w, --no-password never prompt for password\n)); printf(_( -W, --password force password prompt (should happen automatically)\n)); printf(_( --slot=SLOTNAMEreplication slot to use\n)); + printf(_(\nOptional actions:\n)); + printf(_( --create create a new replication slot (for the slot's name see --slot)\n)); + printf(_( --drop drop the replication slot (for the slot's name see --slot)\n)); printf(_(\nReport bugs to pgsql-b...@postgresql.org.\n)); } @@ -261,14 +266,6 @@ StreamLog(void) uint32 hi, lo; - /* - * Connect in replication mode to the server - */ - conn = GetConnection(); - if (!conn) - /* Error message already written in GetConnection() */ - return; - if (!CheckServerVersionForStreaming(conn)) { /* @@ -370,6 +367,9 @@ main(int argc, char **argv) {status-interval, required_argument, NULL, 's'}, {slot, required_argument, NULL, 'S'}, {verbose, no_argument, NULL, 'v'}, +/* action */ + {create, no_argument, NULL, 1}, + {drop, no_argument, NULL, 2}, {NULL, 0, NULL, 0} }; @@ -453,6 +453,13 @@ main(int argc, char **argv) case 'v': verbose++; break; +/* action */ + case 1: 
+do_create_slot = true; +break; + case 2: +do_drop_slot = true; +break; default: /* @@ -477,10 +484,26 @@ exit(1); } + if (replication_slot == NULL && (do_drop_slot || do_create_slot)) + { + fprintf(stderr, _(%s: replication slot needed with action --create or --drop\n), progname); + fprintf(stderr, _(Try \%s --help\ for more information.\n), +progname); + exit(1); + } + + if (do_drop_slot && do_create_slot) + { + fprintf(stderr, _(%s: cannot use --create together with --drop\n), progname); + fprintf(stderr, _(Try \%s --help\ for more information.\n), +progname); + exit(1); + } + /* * Required arguments */ - if (basedir == NULL) +
Re: [HACKERS] PL/PgSQL: RAISE and the number of parameters
Hello, - I would suggest to avoid continue within a loop so that the code is simpler to understand, at least for me. I personally find the code easier to read with the continue. Hmmm. I had to read the code to check it, and I did it twice. The point is that there are 3 exit points instead of 1 in the block, which is not in itself very bad, but: for(...) { if (ccc) xxx; ... foo++; } It looks like foo++ is always executed, and you have to notice that xxx a few lines above is a continue to realise that it only runs when ccc is false. This is also disconcerting if it happens to be the rare case, i.e. here most of the time the char is not '%', so most of the time foo is not incremented, although it is at the highest level. Also the code with continue does not really put forward that what is counted is '%' followed by a non '%'. Note that the corresponding execution code (plpgsql/src/pl_exec.c) uses one continue to get rid of the second '%', which is quite defensible in that case as it is a kind of exception, but the main condition remains a simple if block. Final argument, the structured version is shorter than the unstructured version, with just the two continues removed, and one negation also removed. to also turn the ereport()s into elog()s since the user should never see them. I'm not sure why elog is better than ereport in that case: ISTM that it is an error worth reporting if it ever happens, say there is another syntax added later on which is not caught for some reason by the compile-time check, so I would not change it. -- Fabien. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal: Incremental Backup
On Tue, Aug 12, 2014 at 6:41 AM, Marco Nenciarini marco.nenciar...@2ndquadrant.it wrote: To declared two files identical they must have same size, same mtime and same *checksum*. Still not safe. Checksum collisions do happen, especially in big data sets. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PL/PgSQL: RAISE and the number of parameters
2014-08-12 15:09 GMT+02:00 Fabien COELHO coe...@cri.ensmp.fr: Hello, - I would suggest to avoid continue within a loop so that the code is simpler to understand, at least for me. I personally find the code easier to read with the continue. Hmmm. I had to read the code to check it, and I did it twice. The point is that there are 3 exit points instead of 1 in the block, which is not in itself very bad, but: for(...) { if (ccc) xxx; ... foo++; } It looks like foo++ is always executed, and you have to notice that xxx a few lines above is a continue to realise that it only runs when ccc is false. This is also disconcerting if it happens to be the rare case, i.e. here most of the time the char is not '%', so most of the time foo is not incremented, although it is at the highest level. Also the code with continue does not really put forward that what is counted is '%' followed by a non '%'. Note that the corresponding execution code (plpgsql/src/pl_exec.c) uses one continue to get rid of the second '%', which is quite defensible in that case as it is a kind of exception, but the main condition remains a simple if block. Final argument, the structured version is shorter than the unstructured version, with just the two continues removed, and one negation also removed. to also turn the ereport()s into elog()s since the user should never see them. I'm not sure why elog is better than ereport in that case: ISTM that it is an error worth reporting if it ever happens, say there is another syntax added later on which is not caught for some reason by the compile-time check, so I would not change it. One note: this patch could introduce a compatibility issue - partially broken functions, where some badly written RAISE statements were never actually executed, will now fail at compile time. I am not against this patch, but perhaps it should be an extra check? Or we have to document it as a potential compatibility issue. Regards Pavel -- Fabien.
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal: Incremental Backup
Hi Claudio, 2014-08-12 15:25 GMT+02:00 Claudio Freire klaussfre...@gmail.com: Still not safe. Checksum collisions do happen, especially in big data sets. Can I ask you what you are currently using for backing up large data sets with Postgres? Thanks, Gabriele -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal: Incremental Backup
On 12/08/14 15:25, Claudio Freire wrote: On Tue, Aug 12, 2014 at 6:41 AM, Marco Nenciarini marco.nenciar...@2ndquadrant.it wrote: To declared two files identical they must have same size, same mtime and same *checksum*. Still not safe. Checksum collisions do happen, especially in big data sets. IMHO it is still good enough. We are not trying to protect from a malicious attack; we are using it to protect against some *casual* event. Even cosmic rays have a non-zero probability of corrupting your database in a non-noticeable way. And you can probably notice it better with a checksum than with an LSN :-) Given that, I think that whatever solution we choose, we should include checksums in it. Regards, Marco -- Marco Nenciarini - 2ndQuadrant Italy PostgreSQL Training, Services and Support marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it
Re: [HACKERS] [RFC] Should smgrtruncate() avoid sending sinval message for temp relations
Robert Haas robertmh...@gmail.com writes: I'd support back-porting that commit to 9.1 and 9.2 as a fix for this problem. As the commit message says, it's dead simple. From: Tom Lane t...@sss.pgh.pa.us While I have no great objection to back-porting Heikki's patch, it seems like a very large stretch to call this a root-cause fix. At best it's band-aiding one symptom in a rather fragile way. Thank you, Robert-san. I'll be waiting for it to be back-ported to the next 9.1/9.2 release. Yes, I think this failure is only one potential symptom caused by the implementation mistake -- handling both latch wakeup and other tasks that wait on a latch in the SIGUSR1 handler. Although there may be no such tasks now, I'd like to correct and clean up the implementation as follows to avoid similar problems in the future. I think it's enough to do this only for 9.5. Please correct me before I go deeper in the wrong direction. * The SIGUSR1 handler only does latch wakeup. Any other task is done in other signal handlers such as SIGUSR2. Many daemon postgres processes follow this style, but the normal backend, autovacuum daemons, and background workers don't now. * InitializeLatchSupport() in unix_latch.c calls pqsignal(SIGUSR1, latch_sigusr1_handler). Change the argument of latch_sigusr1_handler() accordingly. * Remove SIGUSR1 handler registration and process-specific SIGUSR1 handler functions from all processes. We can eliminate many SIGUSR1 handler functions which have the same contents. Regards MauMau -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal: Incremental Backup
On 2014-08-12 10:25:21 -0300, Claudio Freire wrote: On Tue, Aug 12, 2014 at 6:41 AM, Marco Nenciarini marco.nenciar...@2ndquadrant.it wrote: To declared two files identical they must have same size, same mtime and same *checksum*. Still not safe. Checksum collisions do happen, especially in big data sets. If you use an appropriate algorithm for appropriate amounts of data that's not a relevant concern. You can easily do different checksums for every 1GB segment of data. If you do it right the likelihood of conflicts doing that is so low it doesn't matter at all. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] delta relations in AFTER triggers
Amit Khandekar amit.khande...@enterprisedb.com wrote: On 7 August 2014 19:49, Kevin Grittner kgri...@ymail.com wrote: Amit Khandekar amit.khande...@enterprisedb.com wrote: I tried to google some SQLs that use REFERENCING clause with triggers. It looks like in some database systems, even the WHEN clause of CREATE TRIGGER can refer to a transition table, just like how it refers to NEW and OLD row variables. For e.g. : CREATE TRIGGER notify_dept AFTER UPDATE ON weather REFERENCING NEW_TABLE AS N_TABLE NEW AS N_ROW FOR EACH ROW WHEN ((SELECT AVG (temperature) FROM N_TABLE) 10) BEGIN notify_department(N_ROW.temperature, N_ROW.city); END Above, it is used to get an aggregate value of all the changed rows. I think we do not currently support aggregate expressions in the where clause, but with transition tables, it makes more sense to support it later if not now. Interesting point; I had not thought about that. Will see if I can include support for that in the patch for the next CF; failing that; I will at least be careful to not paint myself into a corner where it is unduly hard to do later. We currently do the WHEN checks while saving the AFTER trigger events, and also add the tuples one by one while saving the trigger events. If and when we support WHEN, we would need to make all of these tuples saved *before* the first WHEN clause execution, and that seems to demand more changes in the current trigger code. In that case my inclination is to get things working with the less invasive patch that doesn't try to support transition table usage in WHEN clauses, and make support for that a later patch. --- Are we later going to extend this support for constraint triggers as well ? I think these variables would make sense even for deferred constraint triggers. I think we would need some more changes if we want to support this, because there is no query depth while executing deferred triggers and so the tuplestores might be inaccessible with the current design. 
Hmm, I would also prefer to exclude that from an initial patch, but this and the WHEN clause issue may influence a decision I've been struggling with. This is my first non-trivial foray into the planner territory, and I've been struggling with how best to bridge the gap between where the tuplestores are *captured* in the trigger code and where they are referenced by name in a query and incorporated into a plan for the executor. (The execution level itself was almost trivial; it's getting the tuplestore reference through the parse analysis and planning phases that is painful for me.) At one point I create a tuplestore registry using a process-local hashmap where each Tuplestorestate and its associated name, TupleDesc, etc. would be registered, yielding a Tsrid (based on Oid) to use through the parse analysis and planning steps, but then I ripped it back out again in favor of just passing the pointer to the structure which was stored in the registry; because the registry seemed to me to introduce more risk of memory leaks, references to freed memory, etc. While it helped a little with abstraction, it seemed to make things more fragile. But I'm still torn on this, and unsure whether such a registry is a good idea. Any thoughts on that? --- The following (and similarly other) statements : trigdesc-trig_insert_new_table |= (TRIGGER_FOR_INSERT(tgtype) TRIGGER_USES_TRANSITION_TABLE(trigger-tgnewtable)) ? true : false; can be simplfied to : trigdesc-trig_insert_new_table |= (TRIGGER_FOR_INSERT(tgtype) TRIGGER_USES_TRANSITION_TABLE(trigger-tgnewtable)); Yeah, I did recognize that, but I always get squeamish about logical operations with values which do not equal true or false. TRIGGER_FOR_INSERT and similar macros don't necessarily return true for something which is not false. I should just get over that and trust the compiler a bit more :-) --- AfterTriggerExecute() { . /* * Set up the tuplestore information. 
*/ if (trigdesc-trig_delete_old_table || trigdesc-trig_update_old_table) LocTriggerData.tg_olddelta = GetCurrentTriggerDeltaTuplestore(afterTriggers-old_tuplestores); . } Above, trigdesc-trig_update_old_table is true if at least one of the triggers of the table has transition tables. So this will cause the delta table to be available on all of the triggers of the table even if only one out of them uses transition tables. May be this would be solved by using LocTriggerData.tg_trigger-tg_olddelta rather than trigdesc-trig_update_old_table to decide whether LocTriggerData.tg_olddelta should be assigned. Good point. Will do. --- GetCurrentTriggerDeltaTuplestore() is now used for getting fdw tuplestore also, so it should have a more general name. Well, delta *was* my attempt at a more general name. I need to do another pass over the naming choices to make sure I'm being consistent, but I
Re: [HACKERS] Proposal: Incremental Backup
On Tue, Aug 12, 2014 at 11:17 AM, Gabriele Bartolini gabriele.bartol...@2ndquadrant.it wrote: 2014-08-12 15:25 GMT+02:00 Claudio Freire klaussfre...@gmail.com: Still not safe. Checksum collisions do happen, especially in big data sets. Can I ask you what you are currently using for backing up large data sets with Postgres? Currently, a time-delayed WAL archive hot standby, pg_dump sparingly, filesystem snapshots (incremental) of the standby more often, with the standby down. When I didn't have the standby, I did online filesystem snapshots of the master with WAL archiving to prevent inconsistency due to snapshots not being atomic. On Tue, Aug 12, 2014 at 11:25 AM, Marco Nenciarini marco.nenciar...@2ndquadrant.it wrote: Il 12/08/14 15:25, Claudio Freire ha scritto: On Tue, Aug 12, 2014 at 6:41 AM, Marco Nenciarini marco.nenciar...@2ndquadrant.it wrote: To declared two files identical they must have same size, same mtime and same *checksum*. Still not safe. Checksum collisions do happen, especially in big data sets. IMHO it is still good-enough. We are not trying to protect from a malicious attack, we are using it to protect against some *casual* event. I'm not talking about malicious attacks, with big enough data sets, checksum collisions are much more likely to happen than with smaller ones, and incremental backups are supposed to work for the big sets. You could use strong cryptographic checksums, but such strong checksums still aren't perfect, and even if you accept the slim chance of collision, they are quite expensive to compute, so it's bound to be a bottleneck with good I/O subsystems. Checking the LSN is much cheaper. Still, do as you will. As everybody keeps saying it's better than nothing, lets let usage have the final word. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Improvement of versioning on Windows, take two
From: Michael Paquier michael.paqu...@gmail.com Oh yes, right. I don't really know how I missed this error when testing v1. Adding an explicit call to RemoveFile('src\timezone\win32ver.rc') for project postgres calms down the build. Is the attached working for you? Yes, the build succeeded. I confirmed that the following files have version info. However, unlike other files, they don't have file description. Is this intended? bin\isolationtester.exe bin\pg_isolation_regress bin\pg_regress.exe bin\pg_regress_ecpg.exe bin\zic.exe lib\regress.dll lib\dict_snowball.dll has no version properties. Regards MauMau
Re: [HACKERS] WAL format and API changes (9.5)
Heikki Linnakangas wrote: On 08/05/2014 03:50 PM, Michael Paquier wrote: - All the functions in xlogconstruct.c could be defined in a separate header xlogconstruct.h. What they do is rather independent from xlog.h. The same applies to all the structures and functions of xlogconstruct.c in xlog_internal.h: XLogRecordAssemble, XLogRecordAssemble, InitXLogRecordConstruct. (worth noticing as well that the only reason XLogRecData is defined externally of xlogconstruct.c is that it is being used in XLogInsert()). Hmm. I left the defines for xlogconstruct.c functions that are used to build a WAL record in xlog.h, so that it's not necessary to include both xlog.h and xlogconstruct.h in all files that write a WAL record (XLogInsert() is defined in xlog.h). But perhaps it would be better to move the prototype for XLogInsert to xlogconstruct.h too? That might be a better division; everything needed to insert a WAL record would be in xlogconstruct.h, and xlog.h would be left for more internal functions related to WAL. And perhaps rename the files to xloginsert.c and xloginsert.h. Yes, IMO ideally modules that only construct WAL records to insert, but are not directly involved with other XLog stuff, should only be using a lean header file, not the whole xlog.h. I imagine xlogconstruct.h would be such a file. (The patch you just posted doesn't have such a file -- AFAICS that stuff is all in xlog.h still). No opinion on xlogconstruct vs. xloginsert as file names. Both seem like good enough names to me. Unless others have stronger opinions, I would leave that decision to you. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] WAL format and API changes (9.5)
Heikki Linnakangas wrote: On 08/05/2014 03:50 PM, Michael Paquier wrote: - XLogRecGetBlockRefIds needing an already-allocated array for *out is not user-friendly. Cannot we just do all the work inside this function? I agree it's not a nice API. Returning a palloc'd array would be nicer, but this needs to work outside the backend too (e.g. pg_xlogdump). It could return a malloc'd array in frontend code, but it's not as nice. On the whole, do you think that would be better than the current approach? I don't see the issue. Right now, in frontend code you can call palloc() and pfree(), and they behave like malloc() and free() (see fe_memutils.c). Your module doesn't need to do anything special. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] jsonb format is pessimal for toast compression
On Mon, Aug 11, 2014 at 3:35 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: ... I think it would be a better idea to arrange some method by which JSONB (and perhaps other data types) can provide compression hints to pglz. I agree with that as a long-term goal, but not sure if it's sane to push into 9.4. What we could conceivably do now is (a) add a datatype OID argument to toast_compress_datum, and (b) hard-wire the selection of a different compression-parameters struct if it's JSONBOID. The actual fix would then be to increase the first_success_by field of this alternate struct. I think it would be perfectly sane to do that for 9.4. It may not be perfect, but neither is what we have now. A longer-term solution would be to make this some sort of type property that domains could inherit, like typstorage is already. (Somebody suggested dealing with this by adding more typstorage values, but I don't find that attractive; typstorage is known in too many places.) We'd need some thought about exactly what we want to expose, since the specific knobs that pglz_compress has today aren't necessarily good for the long term. What would really be ideal here is if the JSON code could inform the toast compression code: "this many initial bytes are likely incompressible, just pass them through without trying, and then start compressing at byte N", where N is the byte following the TOC. But I don't know that there's a reasonable way to implement that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] [RFC] Should smgrtruncate() avoid sending sinval message for temp relations
On Mon, Aug 11, 2014 at 7:40 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Fri, Aug 8, 2014 at 10:30 AM, MauMau maumau...@gmail.com wrote: I've tracked down the real root cause. The fix is very simple. Please check the attached one-liner patch. I'd support back-porting that commit to 9.1 and 9.2 as a fix for this problem. As the commit message says, it's dead simple. While I have no great objection to back-porting Heikki's patch, it seems like a very large stretch to call this a root-cause fix. At best it's band-aiding one symptom in a rather fragile way. Yeah, that's true, but I'm not clear that we have another back-patchable fix, so doing something almost-certainly-harmless to alleviate the immediate pain seems worthwhile. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] [RFC] Should smgrtruncate() avoid sending sinval message for temp relations
On 2014-08-12 11:04:00 -0400, Robert Haas wrote: On Mon, Aug 11, 2014 at 7:40 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Fri, Aug 8, 2014 at 10:30 AM, MauMau maumau...@gmail.com wrote: I've tracked down the real root cause. The fix is very simple. Please check the attached one-liner patch. I'd support back-porting that commit to 9.1 and 9.2 as a fix for this problem. As the commit message says, it's dead simple. While I have no great objection to back-porting Heikki's patch, it seems like a very large stretch to call this a root-cause fix. At best it's band-aiding one symptom in a rather fragile way. Yeah, that's true, but I'm not clear that we have another back-patchable fix, so doing something almost-certainly-harmless to alleviate the immediate pain seems worthwhile. Isn't that still leaving the very related issue of waits due to hot pruning open? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hokey wrong versions of libpq in apt.postgresql.org
Re: Joshua D. Drake 2014-08-11 53e83e9c.6030...@commandprompt.com The issue that I specifically ran into is that by using apt.postgresql.org in its default configuration, I can't add certain extensions (without jumping through hoops). Simple: Assume a running 9.2.9 from apt.postgresql.org apt-get install pgxnclient sudo pgxnclient install pg_repack Doesn't work. Because it is looking for libs in /var/lib/postgresql/9.2 not /var/lib/postgresql/9.3. It also failed on another extension (but I don't recall which one it is). I'd be interested in which other extension this is. As I told you repeatedly, we are building tons of extensions for something like 5 branches in parallel on apt.postgresql.org, and things generally just work. We don't have pg_repack packages there, but there is pg_reorg. This indeed required some patch to compile, namely manual pg_config_ext.h and postgres_ext.h includes: http://anonscm.debian.org/cgit/pkg-postgresql/pg-reorg.git/tree/debian/patches/headers Trying pg_repack manually here yields this: pg_repack-1.2.1/bin $ make gcc -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -fPIC -pie -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -I/usr/include/postgresql -DREPACK_VERSION=1.2.1 -I. -I. 
-I/usr/include/postgresql/9.1/server -I/usr/include/postgresql/internal -D_FORTIFY_SOURCE=2 -DLINUX_OOM_ADJ=0 -D_GNU_SOURCE -I/usr/include/libxml2 -I/usr/include/tcl8.5 -c -o pgut/pgut.o pgut/pgut.c In file included from pgut/pgut.c:10:0: /usr/include/postgresql/postgres_fe.h:27:32: fatal error: common/fe_memutils.h: No such file or directory This can be fixed by moving -I/usr/include/postgresql past the server includes, but then there's other issues: $ gcc -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -fPIC -pie -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -DREPACK_VERSION=1.2.1 -I. -I. -I/usr/include/postgresql/9.1/server -I/usr/include/postgresql/internal -I/usr/include/postgresql -D_FORTIFY_SOURCE=2 -DLINUX_OOM_ADJ=0 -D_GNU_SOURCE -I/usr/include/libxml2 -I/usr/include/tcl8.5 -c -o pgut/pgut.o pgut/pgut.c In file included from pgut/pgut.h:23:0, from pgut/pgut.c:17: /usr/include/postgresql/libpq-fe.h:547:1: error: unknown type name 'pg_int64' /usr/include/postgresql/libpq-fe.h:547:50: error: unknown type name 'pg_int64' /usr/include/postgresql/libpq-fe.h:551:1: error: unknown type name 'pg_int64' /usr/include/postgresql/libpq-fe.h:553:48: error: unknown type name 'pg_int64' This seems to be another instance of the (includedir_internal) headers are not self-contained problem, see http://www.postgresql.org/message-id/20140426122548.ga7...@msgid.df7cb.de Possibly we need to move some files in libpq-dev to/from postgresql-server-dev-*, though I believe the Debian packages are now just replicating whatever header layout the PostgreSQL makefiles construct on install. 
(Namely, server -> /usr/include/postgresql/9.4/server, and the rest -> /usr/include/postgresql/) I haven't tried to check all include paths with various combinations of libpq-dev and postgresql-server-dev-one-or-the-other-version installed, though that would be an interesting exercise. Any volunteers? Christoph -- c...@df7cb.de | http://www.df7cb.de/
Re: [HACKERS] [RFC] Should smgrtruncate() avoid sending sinval message for temp relations
On Tue, Aug 12, 2014 at 11:06 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-08-12 11:04:00 -0400, Robert Haas wrote: On Mon, Aug 11, 2014 at 7:40 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Fri, Aug 8, 2014 at 10:30 AM, MauMau maumau...@gmail.com wrote: I've tracked down the real root cause. The fix is very simple. Please check the attached one-liner patch. I'd support back-porting that commit to 9.1 and 9.2 as a fix for this problem. As the commit message says, it's dead simple. While I have no great objection to back-porting Heikki's patch, it seems like a very large stretch to call this a root-cause fix. At best it's band-aiding one symptom in a rather fragile way. Yeah, that's true, but I'm not clear that we have another back-patchable fix, so doing something almost-certainly-harmless to alleviate the immediate pain seems worthwhile. Isn't that still leaving the very related issue of waits due to hot pruning open? Yes. Do you have a back-patchable solution for that? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal: Incremental Backup
On Tue, Aug 12, 2014 at 10:30 AM, Andres Freund and...@2ndquadrant.com wrote: Still not safe. Checksum collisions do happen, especially in big data sets. If you use an appropriate algorithm for appropriate amounts of data that's not a relevant concern. You can easily do different checksums for every 1GB segment of data. If you do it right the likelihood of conflicts doing that is so low it doesn't matter at all. True, but if you use LSNs the likelihood is 0. Comparing the LSN is also most likely a heck of a lot faster than checksumming the entire page. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [RFC] Should smgrtruncate() avoid sending sinval message for temp relations
On 2014-08-12 11:56:41 -0400, Robert Haas wrote: On Tue, Aug 12, 2014 at 11:06 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-08-12 11:04:00 -0400, Robert Haas wrote: On Mon, Aug 11, 2014 at 7:40 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Fri, Aug 8, 2014 at 10:30 AM, MauMau maumau...@gmail.com wrote: I've tracked down the real root cause. The fix is very simple. Please check the attached one-liner patch. I'd support back-porting that commit to 9.1 and 9.2 as a fix for this problem. As the commit message says, it's dead simple. While I have no great objection to back-porting Heikki's patch, it seems like a very large stretch to call this a root-cause fix. At best it's band-aiding one symptom in a rather fragile way. Yeah, that's true, but I'm not clear that we have another back-patchable fix, so doing something almost-certainly-harmless to alleviate the immediate pain seems worthwhile. Isn't that still leaving the very related issue of waits due to hot pruning open? Yes. Do you have a back-patchable solution for that? The easiest thing I can think of is sprinkling a few SetConfigOption('synchronous_commit', 'off', PGC_INTERNAL, PGC_S_OVERRIDE, GUC_ACTION_LOCAL, true, ERROR); in some strategic places. From a quick look: * all of autovacuum * locally in ProcessCompletedNotifies * locally in ProcessIncomingNotify * locally in ProcessCatchupEvent * locally in InitPostgres Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal: Incremental Backup
On Wed, Aug 13, 2014 at 12:58 AM, Robert Haas robertmh...@gmail.com wrote: On Tue, Aug 12, 2014 at 10:30 AM, Andres Freund and...@2ndquadrant.com wrote: Still not safe. Checksum collisions do happen, especially in big data sets. If you use an appropriate algorithm for appropriate amounts of data that's not a relevant concern. You can easily do different checksums for every 1GB segment of data. If you do it right the likelihood of conflicts doing that is so low it doesn't matter at all. True, but if you use LSNs the likelihood is 0. Comparing the LSN is also most likely a heck of a lot faster than checksumming the entire page. If we use LSNs, a strong safeguard seems to be required to prevent a user from taking an incremental backup against the wrong instance. For example, consider the case where a first full backup is taken, PITR is performed to a certain past location, and then an incremental backup is taken between that first full backup and the current database after PITR. PITR rewinds LSNs, so such an incremental backup might be corrupted. If so, a safeguard for those problematic cases is needed. Otherwise, I'm afraid a user could easily end up with a broken incremental backup by mistake. Regards, -- Fujii Masao
[HACKERS] [PATCH] PostgreSQL 9.4 mmap(2) performance regression on FreeBSD...
One of the patches that I've been sitting on and am derelict in punting upstream is the attached mmap(2) flags patch for the BSDs. Is there any chance this can be squeezed into the PostgreSQL 9.4 release? The patch is trivial in size and is used to add one flag to mmap(2) calls in dsm_impl.c. Alan Cox (FreeBSD alc, not Linux) and I went back and forth regarding PostgreSQL's use of mmap(2) and determined that the following is correct and will prevent a likely performance regression in PostgreSQL 9.4. In PostgreSQL 9.3, all mmap(2) calls were called with the flags MAP_ANON | MAP_SHARED, whereas in PostgreSQL 9.4 this is not the case. Digging into the patch, in reviewing src/backend/storage/ipc/dsm_impl.c, it's clear that rhaas@ understood the consequences of mmap(2), and the possible consequences of having dirty pages gratuitously flushed to disk: src/backend/storage/ipc/dsm_impl.c:781 * Operating system primitives to support mmap-based shared memory. * * Calling this shared memory is somewhat of a misnomer, because what * we're really doing is creating a bunch of files and mapping them into * our address space. The operating system may feel obliged to * synchronize the contents to disk even if nothing is being paged out, * which will not serve us well. The user can relocate the pg_dynshmem * directory to a ramdisk to avoid this problem, if available. In order for the above comment to be true for FreeBSD, an extra flag needs to be passed to mmap(2). From FreeBSD 10's mmap(2) page[2]: MAP_NOSYNC Causes data dirtied via this VM map to be flushed to physical media only when necessary (usually by the pager) rather than gratuitously. Typically this prevents the update daemons from flushing pages dirtied through such maps and thus allows efficient sharing of memory across unassociated processes using a file-backed shared memory map.
Without this option any VM pages you dirty may be flushed to disk every so often (every 30-60 seconds usually) which can create performance problems if you do not need that to occur (such as when you are using shared file-backed mmap regions for IPC purposes). Note that VM/file system coherency is maintained whether you use MAP_NOSYNC or not. This option is not portable across UNIX platforms (yet), though some may implement the same behavior by default. WARNING! Extending a file with ftruncate(2), thus creating a big hole, and then filling the hole by modifying a shared mmap() can lead to severe file fragmentation. In order to avoid such fragmentation you should always pre-allocate the file's backing store by write()ing zero's into the newly extended area prior to modifying the area via your mmap(). The fragmentation problem is especially sensitive to MAP_NOSYNC pages, because pages may be flushed to disk in a totally random order. The same applies when using MAP_NOSYNC to implement a file-based shared memory store. It is recommended that you create the backing store by write()ing zero's to the backing file rather than ftruncate()ing it. You can test file fragmentation by observing the KB/t (kilobytes per transfer) results from an ``iostat 1'' while reading a large file sequentially, e.g. using ``dd if=filename of=/dev/null bs=32k''. The fsync(2) system call will flush all dirty data and metadata associated with a file, including dirty NOSYNC VM data, to physical media. The sync(8) command and sync(2) system call generally do not flush dirty NOSYNC VM data. The msync(2) system call is usually not needed since BSD implements a coherent file system buffer cache. However, it may be used to associate dirty VM pages with file system buffers and thus cause them to be flushed to physical media sooner rather than later. The man page for madvise(2) has more pointed advice[3]: MADV_NOSYNC Request that the system not flush the data associated
Re: [HACKERS] [PATCH] PostgreSQL 9.4 mmap(2) performance regression on FreeBSD...
On 2014-08-12 09:42:30 -0700, Sean Chittenden wrote: One of the patches that I've been sitting on and am derelict in punting upstream is the attached mmap(2) flags patch for the BSDs. Is there any chance this can be squeezed into the PostgreSQL 9.4 release? The patch is trivial in size and is used to add one flag to mmap(2) calls in dsm_impl.c. Alan Cox (FreeBSD alc, not Linux) and I went back and forth regarding PostgreSQL's use of mmap(2) and determined that the following is correct and will prevent a likely performance regression in PostgreSQL 9.4. In PostgreSQL 9.3, all mmap(2) calls were called with the flags MAP_ANON | MAP_SHARED, whereas in PostgreSQL 9.4 this is not the case. The performance-wise important call to mmap will still use that set of flags, no? That's the one backing shared_buffers. The mmap backend for *dynamic* shared memory (aka dsm) is *NOT* supposed to be used on common platforms. Both posix and sysv shared memory will be used before falling back to the mmap() backend. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] [RFC] Should smgrtruncate() avoid sending sinval message for temp relations
Andres Freund and...@2ndquadrant.com writes: On 2014-08-12 11:56:41 -0400, Robert Haas wrote: Yes. Do you have a back-patchable solution for that? The easiest thing I can think of is sprinkling a few SetConfigOption('synchronous_commit', 'off', PGC_INTERNAL, PGC_S_OVERRIDE, GUC_ACTION_LOCAL, true, ERROR); This still seems to me to be applying a band-aid that covers over some symptoms; it's not dealing with the root cause that we've overloaded the signal handling mechanism too much. What reason is there to think that there are no other symptoms of that? regards, tom lane
Re: [HACKERS] PL/PgSQL: RAISE and the number of parameters
one note: this patch can cause a compatibility issue - partially broken functions, where some badly written RAISE statements were never executed. I am not against this patch, but perhaps it should be an extra check?? I'm not sure what you mean by "it should be an extra check". Or we have to document it as a potential compatibility issue. Indeed, as a potential execution error is turned into a certain compilation error. If this compatibility point is a blocker, the compilation error can be turned into a warning, but I would prefer to keep it an error: I'm quite sure I fell into that pit at least once or twice. -- Fabien.
Re: [HACKERS] [RFC] Should smgrtruncate() avoid sending sinval message for temp relations
On 2014-08-12 13:11:55 -0400, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2014-08-12 11:56:41 -0400, Robert Haas wrote: Yes. Do you have a back-patchable solution for that? The easiest thing I can think of is sprinkling a few SetConfigOption('synchronous_commit', 'off', PGC_INTERNAL, PGC_S_OVERRIDE, GUC_ACTION_LOCAL, true, ERROR); This still seems to me to be applying a band-aid that covers over some symptoms; it's not dealing with the root cause that we've overloaded the signal handling mechanism too much. What reason is there to think that there are no other symptoms of that? I'm not arguing against fixing that. I think we need to do both, although I am wary of fixing the signal handling in the back branches. Fixing the signal handling won't get rid of the problem that one e.g. might not be able to log in anymore if all synchronous standbys are down and login caused hot pruning. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hokey wrong versions of libpq in apt.postgresql.org
On 08/07/2014 04:30 PM, Joshua D. Drake wrote: Hello, I know this has been brought up before: http://www.postgresql.org/message-id/20140724080902.ga28...@msg.df7cb.de For reference, libpq and packaging issues discussed here as well: http://www.postgresql.org/message-id/53a304bc.40...@pinpointresearch.com http://www.postgresql.org/message-id/53989c91.6050...@pinpointresearch.com But this is just plain wrong. I don't care that the FAQ (on the wiki) says we are doing it wrong for good reasons. When I (or anyone else) pulls postgresql-$version-dev, I want the libpq for my version. I do not want 9.3. Yes, it should (because of protocol compatibility) work but it doesn't always (as stated in that email and in a similar problem we just ran into). There can be unintended circumstances on machines when you mix and match like that. Can we please do some proper packaging on this? +1 Cheers, Steve -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PL/PgSQL: RAISE and the number of parameters
2014-08-12 19:14 GMT+02:00 Fabien COELHO coe...@cri.ensmp.fr: one note: this patch can cause a compatibility issue - partially broken functions, where some badly written RAISE statements were never executed. I am not against this patch, but perhaps it should be an extra check?? I'm not sure what you mean by "it should be an extra check". Or we have to document it as a potential compatibility issue. Indeed, as a potential execution error is turned into a certain compilation error. If this compatibility point is a blocker, the compilation error can be turned into a warning, but I would prefer to keep it an error: I'm quite sure I fell into that pit at least once or twice. I prefer an error, although there is the possibility to control the severity of the exception - error or warning - via the extra plpgsql checks. This kind of error is strange, but stored procedures can usually be fixed later because the plpgsql source is available. There is precedent: query syntax checking has been made more precise several times before. It should just be well documented. Regards Pavel -- Fabien.
Re: [HACKERS] Supporting Windows SChannel as OpenSSL replacement
On 08/06/2014 08:37 PM, Jeff Janes wrote: But now it looks like 0002 needs a rebase I've committed the refactoring patch, and here's a rebased and improved version of the Windows SChannel implementation over that. Server-side support is now implemented too, but it's all very crude and work-in-progress. CRLs are not supported, intermediary CAs are not supported, and probably many other bells and whistles are missing too. But the basics work, including cert authentication. Consider this a Proof of Concept. One issue came up with managing private keys: In the server, it's necessary to import the private key into a permanent key container that's managed by the Windows Crypto API. That can be done programmatically (as I do in the patch), but the keys are permanently stored in the system (in the user's profile). They will be left behind even if you completely delete the data directory. That's not the end of the world, but it would be nicer if we could use some kind of a temporary key container that only lives in memory, but the Crypto API doesn't seem to have such a concept. You can acquire an ephemeral context by passing the CRYPT_VERIFYCONTEXT flag to CryptAcquireContext function, and that's exactly what I'm doing in the client, but that method doesn't seem to work when acting as an SSL server. Also, the key container needs to be given a name, or we can use the default container, but either way all the keys are shared among all applications that use the same container. We'll have to figure out how to set that up so that there are no conflicts, if you try to use the same server certificate for two PostgreSQL instances running on the same host (useful while developing/testing replication). This isn't a showstopper, but needs some thought. As the patch stands, it uses a single key container called PostgreSQL server key container, and makes no attempt to delete the keys after they're no longer used. That works, but it leaves the key lying on the system. 
- Heikki diff --git a/configure b/configure index 0f435b5..d00290a 100755 --- a/configure +++ b/configure @@ -707,6 +707,7 @@ XML2_CONFIG UUID_EXTRA_OBJS with_uuid with_selinux +with_winschannel with_openssl krb_srvtab with_python @@ -823,6 +824,7 @@ with_pam with_ldap with_bonjour with_openssl +with_winschannel with_selinux with_readline with_libedit_preferred @@ -1509,6 +1511,7 @@ Optional Packages: --with-ldap build with LDAP support --with-bonjour build with Bonjour support --with-openssl build with OpenSSL support + --with-winschannel build with Windows SChannel support --with-selinux build with SELinux support --without-readline do not use GNU Readline nor BSD Libedit for editing --with-libedit-preferred @@ -5514,6 +5517,46 @@ $as_echo $with_openssl 6; } # +# Windows SChannel +# +{ $as_echo $as_me:${as_lineno-$LINENO}: checking whether to build with native Windows SSL support 5 +$as_echo_n checking whether to build with native Windows SSL support... 6; } + + + +# Check whether --with-winschannel was given. +if test ${with_winschannel+set} = set; then : + withval=$with_winschannel; + case $withval in +yes) + +$as_echo #define USE_WINDOWS_SCHANNEL 1 confdefs.h + + ;; +no) + : + ;; +*) + as_fn_error $? no argument expected for --with-winschannel option $LINENO 5 + ;; + esac + +else + with_winschannel=no + +fi + + +{ $as_echo $as_me:${as_lineno-$LINENO}: result: $with_winschannel 5 +$as_echo $with_winschannel 6; } + + +if test $with_openssl = yes -a $with_winschannel = yes ; then + as_fn_error $? +*** Cannot select both OpenSSL and Windows SChannel. 
$LINENO 5 +fi + +# # SELinux # { $as_echo $as_me:${as_lineno-$LINENO}: checking whether to build with SELinux support 5 diff --git a/configure.in b/configure.in index f8a4507..132fb0a 100644 --- a/configure.in +++ b/configure.in @@ -662,6 +662,20 @@ AC_MSG_RESULT([$with_openssl]) AC_SUBST(with_openssl) # +# Windows SChannel +# +AC_MSG_CHECKING([whether to build with native Windows SSL support]) +PGAC_ARG_BOOL(with, winschannel, no, [build with Windows SChannel support], + [AC_DEFINE([USE_WINDOWS_SCHANNEL], 1, [Define to build with Windows SChannel support. (--with-winschannel)])]) +AC_MSG_RESULT([$with_winschannel]) +AC_SUBST(with_winschannel) + +if test $with_openssl = yes -a $with_winschannel = yes ; then + AC_MSG_ERROR([ +*** Cannot select both OpenSSL and Windows SChannel.]) +fi + +# # SELinux # AC_MSG_CHECKING([whether to build with SELinux support]) diff --git a/src/backend/libpq/Makefile b/src/backend/libpq/Makefile index 8be0572..88d0b8b 100644 --- a/src/backend/libpq/Makefile +++ b/src/backend/libpq/Makefile @@ -21,4 +21,8 @@ ifeq ($(with_openssl),yes) OBJS += be-secure-openssl.o endif +ifeq ($(with_winschannel),yes) +OBJS += be-secure-winschannel.o +endif + include $(top_srcdir)/src/backend/common.mk diff
[HACKERS] proposal for 9.5: monitoring lock time for slow queries
Hello, I was asked about the possibility of showing the lock time of slow queries. I proposed a design based on log_line_prefix, but it was rejected. Any idea how to show the lock time in some practical form together with the logged slow query? Regards Pavel
[HACKERS] minor typo in pgbench doc (2)
Yet another very minor typo in pgbench doc. I'm not sure of the best way to show a formula in the doc. -- Fabien. diff --git a/doc/src/sgml/pgbench.sgml b/doc/src/sgml/pgbench.sgml index 1551686..7d09a2d 100644 --- a/doc/src/sgml/pgbench.sgml +++ b/doc/src/sgml/pgbench.sgml @@ -782,7 +782,7 @@ pgbench optional replaceableoptions/ /optional replaceabledbname/ (PHI(2.0 * threshold * (i - min - mu + 0.5) / (max - min + 1)) - PHI(2.0 * threshold * (i - min - mu - 0.5) / (max - min + 1))) / (2.0 * PHI(threshold) - 1.0) - / + /. Intuitively, the larger the replaceablethreshold/, the more frequently values close to the middle of the interval are drawn, and the less frequently values close to the replaceablemin/ and -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] replication commands and log_statements
On Tue, Aug 12, 2014 at 10:07:34AM +0530, Amit Kapila wrote: On Tue, Aug 12, 2014 at 9:29 AM, Fujii Masao masao.fu...@gmail.com wrote: On Tue, Aug 12, 2014 at 2:34 AM, Robert Haas robertmh...@gmail.com wrote: If you have a user devoted to it, I suppose that's true. I still think it shouldn't get munged together like that. Why do we need to treat only replication commands as special ones and add new parameter to display them? One difference is that replication commands are internally generated commands. Do we anywhere else log such internally generated commands with log_statement = all? Good point --- we do not. In fact, this is similar to how we don't log SPI queries, e.g. SQL queries inside functions. We might want to enable that someday too. Could we enable logging of both with a single GUC? -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Scaling shared buffer eviction
On 2014-08-06 15:42:08 +0530, Amit Kapila wrote: On Tue, Aug 5, 2014 at 9:21 PM, Robert Haas robertmh...@gmail.com wrote: On Thu, Jun 5, 2014 at 4:43 AM, Amit Kapila amit.kapil...@gmail.com wrote: This essentially removes BgWriterDelay, but it's still mentioned in BgBufferSync(). Looking further, I see that with the patch applied, BgBufferSync() is still present in the source code but is no longer called from anywhere. Please don't submit patches that render things unused without actually removing them; it makes it much harder to see what you've changed. I realize you probably left it that way for testing purposes, but you need to clean such things up before submitting. Likewise, if you've rendered GUCs or statistics counters removed, you need to rip them out, so that the scope of the changes you've made is clear to reviewers. FWIW, I found this email almost unreadable because it misses quoting signs after linebreaks in quoted content. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL format and API changes (9.5)
On 2014-08-12 12:44:22 +0300, Heikki Linnakangas wrote: Here's a new patch with those little things fixed. I've so far ignored this thread... I'm now taking a look. Unfortunately I have to admit I'm not unequivocally happy. * XLogRegisterData/XLogRegisterRecData imo don't really form a consistent API namewise. What does 'Rec' mean here? * I'm distinctly not a fan of passing around both the LSN and the struct XLogRecord to functions. We should bite the bullet and use something encapsulating both. * XLogReplayBuffer imo isn't a very good name - the buffer isn't replayed. If anything it's sometimes replaced by the backup block, but the real replay still happens outside of that function. * Why do we need XLogBeginInsert()? Right now this leads to ugliness like duplicated if (RelationNeedsWAL()) wal checks and XLogCancelInsert() of inserts that haven't been started. And if we have Begin, why do we also need Cancel? * XXX: do we need to do this for every page? - yes, and it's every tuple, not every page... And it needs to be done *after* the tuple has been put on the page, not before. Otherwise the location will be wrong. * The patch mixes the API changes around WAL records with changes of how individual actions are logged. That makes it rather hard to review - and it's a 500kb patch already. I realize it's hard to avoid because the new API changes which information about blocks is available, but it does make it really hard to see whether the old/new code is doing something equivalent. It's hard to find more critical code than this :/ Have you done any performance evaluation? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Minmax indexes
Heikki Linnakangas wrote: I couldn't resist starting to hack on this, and implemented the scheme I've been having in mind: 1. MMTuple contains the block number of the heap page (range) that the tuple represents. Vacuum is no longer needed to clean up old tuples; when an index tuple is updated, the old tuple is deleted atomically with the insertion of a new tuple and updating the revmap, so no garbage is left behind. 2. LockTuple is gone. When following the pointer from revmap to MMTuple, the block number is used to check that you land on the right tuple. If not, the search is started over, looking at the revmap again. Thanks, looks good, yeah. Did you just forget to attach the access/rmgrdesc/minmaxdesc.c file, or did you ignore it altogether? Anyway I hacked one up, and cleaned up some other things. I'm sure this still needs some cleanup, but here's the patch, based on your v14. Now that I know what this approach looks like, I still like it much better. The insert and update code is somewhat more complicated, because you have to be careful to lock the old page, new page, and revmap page in the right order. But it's not too bad, and it gets rid of all the complexity in vacuum. It seems there is some issue here, because pageinspect tells me the index is not growing properly for some reason.
minmax_revmap_data gives me this array of TIDs after a bunch of insert/vacuum/delete/ etc: (2,1),(2,2),(2,3),(2,4),(2,5),(4,1),(5,1),(6,1),(7,1),(8,1),(9,1),(10,1),(11,1),(12,1),(13,1),(14,1),(15,1),(16,1),(17,1),(18,1),(19,1),(20,1),(21,1),(22,1),(23,1),(24,1),(25,1),(26,1),(27,1),(28,1),(29,1),(30,1),(31,1),(32,1),(33,1),(34,1),(35,1),(36,1),(37,1),(38,1),(39,1),(40,1),(41,1),(42,1),(43,1),(44,1),(45,1),(46,1),(47,1),(48,1),(49,1),(50,1),(51,1),(52,1),(53,1),(54,1),(55,1),(56,1),(57,1),(58,1),(59,1),(60,1),(61,1),(62,1),(63,1),(64,1),(65,1),(66,1),(67,1),(68,1),(69,1),(70,1),(71,1),(72,1),(73,1),(74,1),(75,1),(76,1),(77,1),(78,1),(79,1),(80,1),(81,1),(82,1),(83,1),(84,1),(85,1),(86,1),(87,1),(88,1),(89,1),(90,1),(91,1),(92,1),(93,1),(94,1),(95,1),(96,1),(97,1),(98,1),(99,1),(100,1),(101,1),(102,1),(103,1),(104,1),(105,1),(106,1),(107,1),(108,1),(109,1),(110,1),(111,1),(112,1),(113,1),(114,1),(115,1),(116,1),(117,1),(118,1),(119,1),(120,1),(121,1),(122,1),(123,1),(124,1),(125,1),(126,1),(127,1),(128,1),(129,1),(130,1),(131,1),(132,1),(133,1),(134,1) There are some who would think that getting one item per page is suboptimal. (Maybe it's just a missing FSM update somewhere.) I've been hacking away a bit more at this; will post updated patch probably tomorrow (was about to post but just found a memory stomp in pageinspect.) -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] replication commands and log_statements
* Bruce Momjian (br...@momjian.us) wrote: On Tue, Aug 12, 2014 at 10:07:34AM +0530, Amit Kapila wrote: On Tue, Aug 12, 2014 at 9:29 AM, Fujii Masao masao.fu...@gmail.com wrote: On Tue, Aug 12, 2014 at 2:34 AM, Robert Haas robertmh...@gmail.com wrote: If you have a user devoted to it, I suppose that's true. I still think it shouldn't get munged together like that. Why do we need to treat only replication commands as special ones and add new parameter to display them? One difference is that replication commands are internally generated commands. Do we anywhere else log such internally generated commands with log_statement = all? Not entirely sure what you're referring to as 'internally generated' here.. While they can come from another backend, they don't have to. Good point --- we do not. In fact, this is similar to how we don't log SPI queries, e.g. SQL queries inside functions. We might want to enable that someday too. Could we enable logging of both with a single GUC? I don't see those as the same at all. I'd like to see improved flexibility in this area, certainly, but don't want two independent considerations like these tied to one GUC.. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] jsonb format is pessimal for toast compression
On Mon, Aug 11, 2014 at 01:44:05PM -0700, Peter Geoghegan wrote: On Mon, Aug 11, 2014 at 1:01 PM, Stephen Frost sfr...@snowman.net wrote: We've got a clear example of someone, quite reasonably, expecting their JSONB object to be compressed using the normal TOAST mechanism, and we're failing to do that in cases where it's actually a win to do so. That's the focus of this discussion and what needs to be addressed before 9.4 goes out. Sure. I'm not trying to minimize that. We should fix it, certainly. However, it does bear considering that JSON data, with each document stored in a row is not an effective target for TOAST compression in general, even as text. Seems we have two issues: 1) the header makes testing for compression likely to fail 2) use of pointers rather than offsets reduces compression potential I understand we are focusing on #1, but how much does compression reduce the storage size with and without #2? Seems we need to know that answer before deciding if it is worth reducing the ability to do fast lookups with #2. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] PostgreSQL 9.4 mmap(2) performance regression on FreeBSD...
On Tue, Aug 12, 2014 at 12:59 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-08-12 09:42:30 -0700, Sean Chittenden wrote: One of the patches that I've been sitting on and am derelict in punting upstream is the attached mmap(2) flags patch for the BSDs. Is there any chance this can be squeezed in to the PostreSQL 9.4 release? The patch is trivial in size and is used to add one flag to mmap(2) calls in dsm_impl.c. Alan Cox (FreeBSD alc, not Linux) and I went back and forth regarding PostgreSQL's use of mmap(2) and determined that the following is correct and will prevent a likely performance regression in PostgreSQL 9.4. In PostgreSQL 9.3, all mmap(2) calls were called with the flags MAP_ANON | MAP_SHARED, whereas in PostgreSQL 9.4 this is not the case. The performancewise important call to mmap will still use that set of flags, no? That's the one backing shared_buffers. The mmap backend for *dynamic* shared memory (aka dsm) is *NOT* supposed to be used on common platforms. Both posix and sysv shared memory will be used before falling back to the mmap() backend. Hmm, yeah. This might still be a good thing to do (because what do we lose?) but it shouldn't really be an issue in practice. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal: Incremental Backup
Claudio, * Claudio Freire (klaussfre...@gmail.com) wrote: I'm not talking about malicious attacks, with big enough data sets, checksum collisions are much more likely to happen than with smaller ones, and incremental backups are supposed to work for the big sets. This is an issue when you're talking about de-duplication, not when you're talking about testing if two files are the same or not for incremental backup purposes. The size of the overall data set in this case is not relevant as you're only ever looking at the same (at most 1G) specific file in the PostgreSQL data directory. Were you able to actually produce a file with a colliding checksum as an existing PG file, the chance that you'd be able to construct one which *also* has a valid page layout sufficient that it wouldn't be obviously massivly corrupted is very quickly approaching zero. You could use strong cryptographic checksums, but such strong checksums still aren't perfect, and even if you accept the slim chance of collision, they are quite expensive to compute, so it's bound to be a bottleneck with good I/O subsystems. Checking the LSN is much cheaper. For my 2c on this- I'm actually behind the idea of using the LSN (though I have not followed this thread in any detail), but there's plenty of existing incremental backup solutions (PG specific and not) which work just fine by doing checksums. If you truely feel that this is a real concern, I'd suggest you review the rsync binary diff protocol which is used extensively around the world and show reports of it failing in the field. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] WAL format and API changes (9.5)
On 08/13/2014 01:04 AM, Andres Freund wrote: On 2014-08-12 12:44:22 +0300, Heikki Linnakangas wrote: Here's a new patch with those little things fixed. I've so far ignored this thread... I'm now taking a look. Unfortunately I have to admit I'm not unequivocally happy. * XLogRegisterData/XLogRegisterRecData imo don't really form a consistent API namewise. What does 'Rec' mean here? The latter function is actually called XLogRegisterBufData. The README was wrong, will fix. * I'm distinctly not a fan of passing around both the LSN and the struct XLogRecord to functions. We should bit the bullet and use something encapsulating both. You mean, the redo functions? Seems fine to me, and always it's been like that... * XLogReplayBuffer imo isn't a very good name - the buffer isn't replayed. If anything it's sometimes replaced by the backup block, but the real replay still happens outside of that function. I agree, but haven't come up with a better name. * Why do we need XLogBeginInsert(). Right now this leads to uglyness like duplicated if (RelationNeedsWAL()) wal checks and XLogCancelInsert() of inserts that haven't been started. I like clearly marking the beginning of the insert, with XLogBeginInsert(). We certainly could design it so that it's not needed, and you could just start registering stuff without calling XLogBeginInsert() first. But I believe it's more robust this way. For example, you will get an error at the next XLogBeginInsert(), if a previous operation had already registered some data without calling XLogInsert(). And if we have Begin, why do we also need Cancel? We could just automatically cancel any previous insertion when XLogBeginInsert() is called, but again it seems like bad form to leave references to buffers and data just lying there, if an operation bails out after registering some stuff and doesn't finish the insertion. * XXX: do we need to do this for every page? - yes, and it's every tuple, not every page... 
And it needs to be done *after* the tuple has been put on the page, not before. Otherwise the location will be wrong. That comment is in heap_multi_insert(). All the inserted tuples have the same command id, so it seems redundant to log multiple NEW_CID records for them. * The patch mixes the API changes around WAL records with changes of how individual actions are logged. That makes it rather hard to review - and it's a 500kb patch already. I realize it's hard to avoid because the new API changes which information about blocks is available, but it does make it really hard to see whether the old/new code is doing something equivalent. It's hard to find more critical code than this :/ Yeah, I hear you. I considered doing this piecemeal, just adding the new functions first so that you could still use the old XLogRecData API, until all the functions have been converted. But in the end, I figured it's not worth it, as sooner or later we'd want to convert all the functions anyway. Have you done any performance evaluation? No, not yet. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Inconsistent use of --slot/-S in pg_receivexlog and pg_recvlogical
On Tue, Aug 12, 2014 at 7:27 PM, Fujii Masao masao.fu...@gmail.com wrote: On Tue, Aug 12, 2014 at 4:50 PM, Michael Paquier michael.paqu...@gmail.com wrote: Hi, I noticed that pg_receivexlog is able to use --slot but not -S, even if the code is written this way. Attached is a patch correcting that. This makes pg_receivexlog consistent with pg_recvlogical regarding the slot option. IMHO, this should be backpatched to REL9_4_STABLE. Regards, Looks good to me. And, I could not find the reason why only -S was not added, in the past discussion. Anyway, barring any objection, I will commit this. Applied. Thanks! Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] strncpy is not a safe version of strcpy
On Sat, Nov 16, 2013 at 12:53:10PM +1300, David Rowley wrote: I went on a bit of a strncpy cleanup rampage this morning and ended up finding quite a few places where strncpy is used wrongly. I'm not quite sure if I have got them all in this patch, but I think I've got the obvious ones at least. For the hash_search in jsonfuncs.c after thinking about it a bit more... Can we not just pass the attname without making a copy of it? I see keyPtr in hash_search is const void * so it shouldn't get modified in there. I can't quite see the reason for making the copy. +1 for the goal of this patch. Another commit took care of your jsonfuncs.c concerns, and the patch for CVE-2014-0065 fixed several of the others. Plenty remain, though. Attached is a patch with various cleanups where I didn't like the look of the strncpy. I didn't go overboard with this as I know making this sort of small change all over can be a bit scary and I thought maybe it would get rejected on that basis. I also cleaned up things like strncpy(dest, src, strlen(src)); which just seems a bit weird and I'm failing to get my head around why it was done. I replaced these with memcpy instead, but they could perhaps be a plain old strcpy. I suggest preparing one or more patches that focus on the cosmetic-only changes, such as strncpy() - memcpy() when strncpy() is guaranteed not to reach a NUL byte. With that noise out of the way, it will be easier to give the rest the attention it deserves. Thanks, nm -- Noah Misch EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Production block comparison facility
On Wed, Jul 23, 2014 at 11:14 PM, Michael Paquier michael.paqu...@gmail.com wrote: Things could be refactored and improved for sure, but this patch is already useful as-is so I am going to add it to the next commit fest. After some more investigation, I am going to mark this patch as Returned with feedback for the time being (mainly to let it show up on the commit fest app and for the sake of archives), Mainly for two reasons: - We can do better than what I sent: instead of checking if the FPW and the current page are somewhat consistent, we could actually check if the current page is equal with the FPW after applying WAL on it. In order to do that, we would need to bypass the FPW replay and to apply WAL on the current page (if the page is already initialized), then control RestoreBackupBlock (or its equivalent) that with an additional flag to tell that block is not restored, but can get WAL applied to it safely. Then a comparison with the FPW contained in the WAL record can be made. - The patch of Heikki to change the WAL APIs and track more easily the blocks changes is going to make this implementation far easier. It also improves the status checks on which block has been restored, so it is more easily extensible for what could be done here. Regards, -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW
(2014/08/12 18:34), Shigeru Hanada wrote: Issues addressed by Eitoku-san were fixed properly, but he found a bug and a possible enhancement in the v2 patch. Thank you for the review, Hanada-san and Eitoku-san! * push-down check misses delete triggers update_is_pushdown_safe() seems to have a bug that it misses the existence of row-level delete trigger. DELETE statement executed against a foreign table which has row-level delete trigger is pushed down to remote, and consequently no row-level delete trigger is fired. Ah, I noticed that the current code for that is not correct. Will fix. * further optimization Is there any chance to consider further optimization by passing the operation type (UPDATE|DELETE) of undergoing statement to update_is_pushdown_safe()? It seems safe to push down UPDATE statement when the target foreign table has no update trigger even it has a delete trigger (of course the opposite combination would be also fine). Good idea! Will improve that too. * Documentation The requirement of pushing down UPDATE/DELETE statements would not be easy to understand for non-expert users, so it seems that there is a room to enhance documentation. An idea is to define which expression is safe to send to remote first (it might need to mention the difference of semantics), and refer the definition from the place describing the requirement of pushing-down for SELECT, UPDATE and DELETE. Yeah, I also think that it would not necessarily easy for the users to understand which expression is safe to send. So I agree with that enhancement, but ISTM that it would be better to do that as a separate patch. Thanks, Best regards, Etsuro Fujita -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] pg_last_xact_insert_timestamp
On Mon, Aug 11, 2014 at 8:27 PM, Fujii Masao masao.fu...@gmail.com wrote: On Mon, Aug 11, 2014 at 4:46 PM, Andres Freund and...@2ndquadrant.com wrote: Hi, On 2011-10-04 20:52:59 +0900, Fujii Masao wrote: *** a/src/backend/access/transam/xact.c --- b/src/backend/access/transam/xact.c *** *** 1066,1071 RecordTransactionCommit(void) --- 1066,1074 (void) XLogInsert(RM_XACT_ID, XLOG_XACT_COMMIT_COMPACT, rdata); } + + /* Save timestamp of latest transaction commit record */ + pgstat_report_xact_end_timestamp(xactStopTimestamp); } Perhaps that pgstat_report() should instead be combined with the pgstat_report_xact_timestamp(0) in CommitTransaction()? Then the number of changecount increases and cacheline references would stay the same. The only thing that'd change would be a single additional assignment. Sounds like a good suggestion. I attached the updated version of the patch. I changed the pgstat_report_xx functions as Andres suggested. While reading the patch again, I found it didn't handle the COMMIT/ABORT PREPARED case properly. According to the commit e74e090, now pg_last_xact_replay_timestamp() returns the timestamp of COMMIT/ABORT PREPARED. pg_last_xact_insert_timestamp() is mainly expected to be used to calculate the replication delay, so it also needs to return that timestamp. But the patch didn't change 2PC code at all. We need to add pgstat_report_xact_end_timestamp() into FinishPreparedTransaction(), RecordTransactionCommitPrepared() or RecordTransactionAbortPrepared(). Done.
Regards, -- Fujii Masao *** a/doc/src/sgml/func.sgml --- b/doc/src/sgml/func.sgml *** *** 16116,16121 SELECT set_config('log_statement_stats', 'off', false); --- 16116,16124 primarypg_current_xlog_location/primary /indexterm indexterm + primarypg_last_xact_insert_timestamp/primary +/indexterm +indexterm primarypg_start_backup/primary /indexterm indexterm *** *** 16180,16185 SELECT set_config('log_statement_stats', 'off', false); --- 16183,16195 /row row entry +literalfunctionpg_last_xact_insert_timestamp()/function/literal +/entry +entrytypetimestamp with time zone/type/entry +entryGet last transaction log insert time stamp/entry + /row + row + entry literalfunctionpg_start_backup(parameterlabel/ typetext/ optional, parameterfast/ typeboolean/ /optional)/function/literal /entry entrytypepg_lsn/type/entry *** *** 16334,16339 postgres=# SELECT * FROM pg_xlogfile_name_offset(pg_stop_backup()); --- 16344,16356 /para para + functionpg_last_xact_insert_timestamp/ displays the time stamp of last inserted + transaction. This is the time at which the commit or abort WAL record for that transaction was inserted. + If there has been no transaction committed or aborted yet since the server has started, + this function returns NULL. +/para + +para For details about proper usage of these functions, see xref linkend=continuous-archiving. /para *** a/doc/src/sgml/high-availability.sgml --- b/doc/src/sgml/high-availability.sgml *** *** 862,867 primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass' --- 862,876 commandps/ command (see xref linkend=monitoring-ps for details). /para para + You can also calculate the lag in time stamp by comparing the last + WAL insert time stamp on the primary with the last WAL replay + time stamp on the standby.
They can be retrieved using + functionpg_last_xact_insert_timestamp/ on the primary and + the functionpg_last_xact_replay_timestamp/ on the standby, + respectively (see xref linkend=functions-admin-backup-table and + xref linkend=functions-recovery-info-table for details). + /para + para You can retrieve a list of WAL sender processes via the link linkend=monitoring-stats-views-table literalpg_stat_replication//link view. Large differences between *** a/src/backend/access/transam/twophase.c --- b/src/backend/access/transam/twophase.c *** *** 156,167 static void RecordTransactionCommitPrepared(TransactionId xid, RelFileNode *rels, int ninvalmsgs, SharedInvalidationMessage *invalmsgs, ! bool initfileinval); static void RecordTransactionAbortPrepared(TransactionId xid, int nchildren, TransactionId *children, int nrels, ! RelFileNode *rels); static void ProcessRecords(char *bufptr, TransactionId xid, const TwoPhaseCallback callbacks[]); static void RemoveGXact(GlobalTransaction gxact); --- 156,169 RelFileNode *rels, int ninvalmsgs, SharedInvalidationMessage
Re: [HACKERS] replication commands and log_statements
On Wed, Aug 13, 2014 at 4:24 AM, Stephen Frost sfr...@snowman.net wrote: * Bruce Momjian (br...@momjian.us) wrote: On Tue, Aug 12, 2014 at 10:07:34AM +0530, Amit Kapila wrote: One difference is that replication commands are internally generated commands. Do we anywhere else log such internally generated commands with log_statement = all? Not entirely sure what you're referring to as 'internally generated' here.. Here 'internally generated' means that user doesn't execute those statements, rather the replication/backup code form these statements (IDENTIFY_SYSTEM, TIMELINE_HISTORY, BASE_BACKUP, ...) and send to server to get the appropriate results. While they can come from another backend, they don't have to. Good point --- we do not. In fact, this is similar to how we don't log SPI queries, e.g. SQL queries inside functions. We might want to enable that someday too. Yes, few days back I have seen one of the user expects such functionality. Refer below mail: http://www.postgresql.org/message-id/caa4ek1lvuirqnmjv1vtmrg_f+1f9e9-8rdgnwidscqvtps1...@mail.gmail.com Could we enable logging of both with a single GUC? I don't see those as the same at all. I'd like to see improved flexibility in this area, certainly, but don't want two independent considerations like these tied to one GUC.. Agreed, I also think both are different and shouldn't be logged with one GUC. Here the important thing to decide is which is the better way to proceed for allowing logging of replication commands such that it can allow us to extend it for more things. We have discussed below options: a. Make log_statement a list of comma separated options b. Have a separate parameter something like log_replication* c. Log it when user has mentioned option 'all' in log_statement. 
From a future extensibility point of view, I think option (a) is a good choice, or we could even invent a new scheme for logging commands, something similar to log_min_messages, where we define levels:

level = 0 (none)
level = 1 (dml)
level = 2 (ddl)
level = 4 (replication)
level = 8 (all)

Here, if the user specifies level = 2, then DDL commands will get logged, and level = 3 would mean log both DML and DDL commands. For backward compatibility, we can keep the current parameter as it is and introduce this as a new parameter. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] failures on barnacle (CLOBBER_CACHE_RECURSIVELY) because of memory leaks
I wrote: Tomas Vondra t...@fuzzy.cz writes: So after 83 days, the regression tests on barnacle completed, Hah, that's perseverance! and it smells like another memory leak in CacheMemoryContext, similar to those fixed in 078b2ed on May 18. Ugh, will look. I've been experimenting by running the create_view test in isolation (it's not quite self-contained, but close enough) and I find that there are two memory leaks exposed here: 1. The relcache.c functions that provide on-demand caching, namely RelationGetIndexList and RelationGetIndexAttrBitmap, are not careful to free the old values (if any) of the relcache fields they fill. I think we believed that the old values would surely always be null ... but that's not necessarily the case when doing a recursive cache reload of a system catalog, because we might've used/filled the fields while fetching the data needed to fill them. This results in a session-lifespan leak of data in CacheMemoryContext, which is what Tomas saw. (I'm not real sure that this is a live issue for RelationGetIndexAttrBitmap, but it demonstrably is for RelationGetIndexList.) 2. reloptions.c's parseRelOptions() leaks memory when disassembling a reloptions array. The leak is in the caller's memory context which is typically query-lifespan, so under normal circumstances this is not significant. However, a CLOBBER_CACHE_RECURSIVELY build can call this an awful lot of times within a single query. The queries in create_view that operate on views with security_barrier reloptions manage to eat quite a lot of memory this way. The attached patch fixes both of these things. I'm inclined to think it is not worth back-patching because neither effect could amount to anything noticeable without CLOBBER_CACHE_RECURSIVELY. Anyone think otherwise? 
regards, tom lane

diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index c7ad6f9..e5f0240 100644
*** a/src/backend/access/common/reloptions.c
--- b/src/backend/access/common/reloptions.c
*** parseRelOptions(Datum options, bool vali
*** 912,925 ****
  	/* Done if no options */
  	if (PointerIsValid(DatumGetPointer(options)))
  	{
! 		ArrayType  *array;
  		Datum	   *optiondatums;
  		int			noptions;

- 		array = DatumGetArrayTypeP(options);
-
- 		Assert(ARR_ELEMTYPE(array) == TEXTOID);
-
  		deconstruct_array(array, TEXTOID, -1, false, 'i',
  						  &optiondatums, NULL, &noptions);
--- 912,921 ----
  	/* Done if no options */
  	if (PointerIsValid(DatumGetPointer(options)))
  	{
! 		ArrayType  *array = DatumGetArrayTypeP(options);
  		Datum	   *optiondatums;
  		int			noptions;

  		deconstruct_array(array, TEXTOID, -1, false, 'i',
  						  &optiondatums, NULL, &noptions);
*** parseRelOptions(Datum options, bool vali
*** 959,964 ****
--- 955,965 ----
  					 errmsg("unrecognized parameter \"%s\"", s)));
  			}
  		}
+
+ 		/* It's worth avoiding memory leaks in this function */
+ 		pfree(optiondatums);
+ 		if (((void *) array) != DatumGetPointer(options))
+ 			pfree(array);
  	}

  	*numrelopts = numoptions;
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 10d300a..fd84853 100644
*** a/src/backend/utils/cache/relcache.c
--- b/src/backend/utils/cache/relcache.c
*** RelationGetIndexList(Relation relation)
*** 3646,3651 ****
--- 3646,3652 ----
  	ScanKeyData skey;
  	HeapTuple	htup;
  	List	   *result;
+ 	List	   *oldlist;
  	char		replident = relation->rd_rel->relreplident;
  	Oid			oidIndex = InvalidOid;
  	Oid			pkeyIndex = InvalidOid;
*** RelationGetIndexList(Relation relation)
*** 3737,3742 ****
--- 3738,3744 ----
  	/* Now save a copy of the completed list in the relcache entry.
  	 */
  	oldcxt = MemoryContextSwitchTo(CacheMemoryContext);
+ 	oldlist = relation->rd_indexlist;
  	relation->rd_indexlist = list_copy(result);
  	relation->rd_oidindex = oidIndex;
  	if (replident == REPLICA_IDENTITY_DEFAULT && OidIsValid(pkeyIndex))
*** RelationGetIndexList(Relation relation)
*** 3748,3753 ****
--- 3750,3758 ----
  	relation->rd_indexvalid = 1;
  	MemoryContextSwitchTo(oldcxt);

+ 	/* Don't leak the old list, if there is one */
+ 	list_free(oldlist);
+
  	return result;
  }
*** RelationGetIndexAttrBitmap(Relation rela
*** 4141,4146 ****
--- 4146,4159 ----
  	list_free(indexoidlist);

+ 	/* Don't leak any old values of these bitmaps */
+ 	bms_free(relation->rd_indexattr);
+ 	relation->rd_indexattr = NULL;
+ 	bms_free(relation->rd_keyattr);
+ 	relation->rd_keyattr = NULL;
+ 	bms_free(relation->rd_idattr);
+ 	relation->rd_idattr = NULL;
+
  	/*
  	 * Now save copies of the bitmaps in the relcache entry.  We intentionally
  	 * set rd_indexattr last, because that's the one that signals validity of

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal for 9.5: monitoring lock time for slow queries
On Wed, Aug 13, 2014 at 4:59 AM, Pavel Stehule pavel.steh...@gmail.com wrote: Any idea how to show a lock time in some practical form together with logged slow query? Doing a join on pg_stat_activity and pg_locks is not going to help much, as you could only get the moment when the query started or when its state changed. Have you thought about the addition of a new column in pg_locks containing the timestamp of the moment a lock was taken? I am sure that we would be concerned about the performance impact that extra calls to gettimeofday could have, though... Regards, -- Michael
Re: [HACKERS] Scaling shared buffer eviction
On Wed, Aug 13, 2014 at 2:32 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-08-06 15:42:08 +0530, Amit Kapila wrote: On Tue, Aug 5, 2014 at 9:21 PM, Robert Haas robertmh...@gmail.com wrote: This essentially removes BgWriterDelay, but it's still mentioned in BgBufferSync(). Looking further, I see that with the patch applied, BgBufferSync() is still present in the source code but is no longer called from anywhere. Please don't submit patches that render things unused without actually removing them; it makes it much harder to see what you've changed. I realize you probably left it that way for testing purposes, but you need to clean such things up before submitting. Likewise, if you've rendered GUCs or statistics counters unused, you need to rip them out, so that the scope of the changes you've made is clear to reviewers. FWIW, I found this email almost unreadable because it is missing quoting signs after linebreaks in quoted content. I think I did something wrong while replying to Robert's mail. The main point in that mail was trying to see if there is any major problem in case we have a separate process (bgreclaim) to populate the freelist. One thing which I thought could be problematic is putting a buffer on the freelist which has usage_count zero and is *dirty*. Please do let me know if you want clarification on something in particular. Overall, the main changes required in the patch as per the above feedback are: 1. Add an additional counter for the number of those allocations not satisfied from the free list, with a name like buffers_alloc_clocksweep. 2. Autotune the low and high threshold values for buffers in the freelist. In the patch, I have kept them as hard-coded values. 3. For populating the freelist, have a separate process (bgreclaim) instead of doing it in bgwriter. There are other things I also need to take care of per the feedback, like some changes in locking strategy and code. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] 9.5: Memory-bounded HashAgg
On Tue, 2014-08-12 at 14:58 +0200, Tomas Vondra wrote: CREATE AGGREGATE myaggregate ( ... SERIALIZE_FUNC = 'dump_data', DESERIALIZE_FUNC = 'read_data', ... ); Seems reasonable. I don't see why it should get messy? In the end, you have a chunk of data and a hash for it. Perhaps it's fine; I'd have to see the approach. It just means you need to walk through the hash table, look at the hashes, and dump ~50% of the groups to a file. If you have fixed-size states, why would you *want* to remove the group? What is gained? One thing I like about my simple approach is that it returns a good number of groups after each pass, and those are then completely finished (returned to the operator above, even). That's impossible with HashJoin because the hashing all needs to be done before the probe phase begins. The weakness of my approach is the array_agg case that you mention, because this approach doesn't offer a way to dump out transition states. It seems like that could be added later, but let me know if you see a problem there. I think you're missing the point here. You need to compute the hash in both cases. And then you either can do a lookup or just peek at the first few bits of the hash to see whether it's in the current batch or not. I understood that. The point I was trying to make (which might or might not be true) was that: (a) this only matters for a failed lookup, because a successful lookup would just go in the hash table anyway; and (b) a failed lookup probably doesn't cost much compared to all of the other things that need to happen along that path. I should have chosen a better example, though. For instance: if the lookup fails, we need to write the tuple, and writing the tuple is sure to swamp the cost of a failed hash lookup. A peek at the first few bits of the hash is much faster than a lookup. Also, as the hash table grows (beyond L3 cache size, which is a few MBs today), it becomes much slower in my experience - that's one of the lessons I learnt while hacking on the hashjoin.
And we're dealing with hashagg not fitting into work_mem, so this seems to be relevant. Could be, but this is also the path that goes to disk, so I'm not sure how significant it is. Because I suspect there are costs in having an extra file around that I'm not accounting for directly. We are implicitly assuming that the OS will keep around enough buffers for each BufFile to do sequential writes when needed. If we create a zillion partitions, then either we end up with random I/O or we push the memory burden into the OS buffer cache. Assuming I understand it correctly, I think this logic is broken. Are you saying "we'll try to do memory-bounded hashagg, but not for the really large datasets, because of fear we might cause random I/O"? No, the memory is still bounded even for very high cardinality inputs (ignoring the array_agg case for now). When a partition is processed later, it also might exhaust work_mem and need to write out tuples to its own set of partitions. This allows memory-bounded execution to succeed even if the number of partitions in each iteration is one, though it will result in repeated I/O for the same tuple. While I certainly understand your concerns about generating an excessive amount of random I/O, I think modern filesystems handle that just fine (coalescing the writes into mostly sequential writes, etc.). Also, current hardware is really good at handling this (controllers with write cache, SSDs, etc.). All of that requires memory. We shouldn't dodge a work_mem limit by using the kernel's memory instead. Also, if hash-join does not worry about the number of batches, why should hashagg worry about that? I expect the I/O patterns to be very similar. One difference with HashJoin is that, to create a large number of batches, the inner side must be huge, which is not the expected operating mode for HashJoin[1]. Regardless, every partition that is active *does* have a memory cost. HashJoin might ignore that cost, but that doesn't make it right.
I think the right analogy here is to Sort's poly-phase merge -- it doesn't merge all of the runs at once; see the comments at the top of tuplesort.c. In other words, sometimes it's better to have fewer partitions (for hashing) or merge fewer runs at once (for sorting). It does more repeated I/O, but the I/O is more sequential. In any case, trying to fix this by limiting the number of partitions seems like a bad approach. I think factoring those concerns into a costing model is more appropriate. Fair enough. I haven't modeled the cost yet, and I agree that an upper limit is quite crude. (a) COUNT(DISTINCT) - this is solved by a custom aggregate Is there a reason we can't offer a hash-based strategy for this one? It would have to be separate hash tables for different aggregates, but it seems like it could work. (b) bad estimate of required memory - this is common for aggregates passing 'internal' state (planner uses some
Re: [HACKERS] Support for N synchronous standby servers
On Mon, Aug 11, 2014 at 4:38 PM, Michael Paquier michael.paqu...@gmail.com wrote: On Mon, Aug 11, 2014 at 4:26 PM, Fujii Masao masao.fu...@gmail.com wrote: On Mon, Aug 11, 2014 at 2:10 PM, Michael Paquier michael.paqu...@gmail.com wrote: Oh, that worked on my machine, too, this time... I did something wrong. Sorry for the noise. No problem, thanks for spending time testing. I probably hit a similar but different problem. I set synchronous_standby_num to 2 and started up two synchronous standbys. When I ran write transactions, they were successfully completed. That's OK. Then I sent the SIGSTOP signal to the walreceiver process in one of the sync standbys and ran write transactions again. In this case, they must not complete, because their WAL cannot be replicated to the standby whose walreceiver was stopped. But they were successfully completed. Regards, -- Fujii Masao
Re: [HACKERS] proposal for 9.5: monitoring lock time for slow queries
Michael Paquier michael.paqu...@gmail.com writes: Doing a join on pg_stat_activity and pg_locks is not going to help much as you could only get the moment when query has started or its state has changed. Have you thought about the addition of a new column in pg_locks containing the timestamp of the moment a lock has been taken? I am sure that we are concerned about the performance impact that extra calls to gettimeofday could have though... In theory this could be driven off the same gettimeofday needed to start the deadlock_timeout timer. Not sure how messy that'd be. regards, tom lane
Re: [HACKERS] proposal for 9.5: monitoring lock time for slow queries
Hi 2014-08-13 6:18 GMT+02:00 Michael Paquier michael.paqu...@gmail.com: On Wed, Aug 13, 2014 at 4:59 AM, Pavel Stehule pavel.steh...@gmail.com wrote: Any idea how to show a lock time in some practical form together with logged slow query? Doing a join on pg_stat_activity and pg_locks is not going to help much as you could only get the moment when query has started or its state has changed. Have you thought about the addition of a new column in pg_locks containing the timestamp of the moment a lock has been taken? I am sure that we are concerned about the performance impact that extra calls to gettimeofday could have though... Regards, There are two relatively independent tasks: a) monitor and show the total lock time of live queries, and b) monitor and log the total lock time of executed queries. I am interested in b) now. When we work with the slow query log, we would like to identify the reason for the long duration, and locks are an important source of these slow queries on some systems. As far as I know, we already monitor these times for the deadlock detection trigger and for log_lock_waits - so there should not be any new pressure on performance. Regards Pavel
Re: [HACKERS] proposal for 9.5: monitoring lock time for slow queries
2014-08-13 7:19 GMT+02:00 Tom Lane t...@sss.pgh.pa.us: Michael Paquier michael.paqu...@gmail.com writes: Doing a join on pg_stat_activity and pg_locks is not going to help much as you could only get the moment when query has started or its state has changed. Have you thought about the addition of a new column in pg_locks containing the timestamp of the moment a lock has been taken? I am sure that we are concerned about the performance impact that extra calls to gettimeofday could have though... In theory this could be driven off the same gettimeofday needed to start the deadlock_timeout timer. Not sure how messy that'd be. We use it in our custom patch without problems. Pavel regards, tom lane