Re: [HACKERS] delta relations in AFTER triggers

2014-08-12 Thread Amit Khandekar
On 7 August 2014 19:49, Kevin Grittner kgri...@ymail.com wrote:
 Amit Khandekar amit.khande...@enterprisedb.com wrote:
 On 21 June 2014 23:36, Kevin Grittner kgri...@ymail.com wrote:
 Kevin Grittner kgri...@ymail.com wrote:
 I didn't change the tuplestores to TID because it seemed to me that
 it would preclude using transition relations with FDW triggers, and
 it seemed bad not to support that.  Does anyone see a way around
 that, or feel that it's OK to not support FDW triggers in this
 regard?

 I think it is ok to use tuplestores for now, but as mentioned by you
 somewhere else in the thread, later on we should change this to using
 tids as an optimization.

 Well, the optimization would probably be to use a tuplestore of
 tids referencing modified tuples in the base table, rather than a
 tuplestore of the data itself.  But I think we're in agreement.
Right, that's what I meant.

 I see that the tuplestores for transition tables are stored per query
 depth. If the DML involves a table that has multiple child tables, it seems
 as though a single tuplestore would be shared among all these tables. That
 means if we define such triggers using the transition table clause for all
 the child tables, then the trigger function for a child table will see
 tuples from other child tables as well. Is that true?

 I don't think so.  I will make a note of the concern to confirm by testing.
Thanks. I will wait for this.


 I tried to google some SQL examples that use the REFERENCING clause with
 triggers. It looks like in some database systems, even the WHEN clause of
 CREATE TRIGGER can refer to a transition table, just like how it refers to
 the NEW and OLD row variables.

 For example:
 CREATE TRIGGER notify_dept
   AFTER UPDATE ON weather
   REFERENCING NEW_TABLE AS N_TABLE
   NEW AS N_ROW
   FOR EACH ROW
   WHEN ((SELECT AVG(temperature) FROM N_TABLE) > 10)
   BEGIN
     notify_department(N_ROW.temperature, N_ROW.city);
   END

 Above, it is used to get an aggregate value over all the changed rows. I
 think we do not currently support aggregate expressions in the WHEN clause,
 but with transition tables, it makes more sense to support it later if not
 now.

 Interesting point; I had not thought about that.  Will see if I can
 include support for that in the patch for the next CF; failing
 that; I will at least be careful to not paint myself into a corner
 where it is unduly hard to do later.
We currently do the WHEN checks while saving the AFTER trigger events,
and also add the tuples one by one while saving the trigger events. If
and when we support WHEN, we would need to make all of these tuples
saved *before* the first WHEN clause execution, and that seems to
demand more changes in the current trigger code.

More comments below :

---

Are we later going to extend this support for constraint triggers as
well ? I think these variables would make sense even for deferred
constraint triggers. I think we would need some more changes if we
want to support this, because there is no query depth while executing
deferred triggers and so the tuplestores might be inaccessible with
the current design.

---

The following (and similarly other) statements:

trigdesc->trig_insert_new_table |=
	(TRIGGER_FOR_INSERT(tgtype) &&
	 TRIGGER_USES_TRANSITION_TABLE(trigger->tgnewtable)) ? true : false;

can be simplified to:

trigdesc->trig_insert_new_table |=
	(TRIGGER_FOR_INSERT(tgtype) &&
	 TRIGGER_USES_TRANSITION_TABLE(trigger->tgnewtable));

-

AfterTriggerExecute()
{
...
	/*
	 * Set up the tuplestore information.
	 */
	if (trigdesc->trig_delete_old_table || trigdesc->trig_update_old_table)
		LocTriggerData.tg_olddelta =
			GetCurrentTriggerDeltaTuplestore(afterTriggers->old_tuplestores);
...
}
Above, trigdesc->trig_update_old_table is true if at least one of the
triggers of the table has transition tables. So this will cause the
delta table to be available in all of the triggers of the table even
if only one of them uses transition tables. Maybe this would be
solved by using LocTriggerData.tg_trigger->tg_olddelta rather than
trigdesc->trig_update_old_table to decide whether
LocTriggerData.tg_olddelta should be assigned.
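A sketch of the suggested guard, reusing the patch's
TRIGGER_USES_TRANSITION_TABLE macro and the per-trigger transition table
name (the exact field name on Trigger is an assumption):

	/* expose the delta relation only to triggers that declared one */
	if (TRIGGER_USES_TRANSITION_TABLE(LocTriggerData.tg_trigger->tgoldtable))
		LocTriggerData.tg_olddelta =
			GetCurrentTriggerDeltaTuplestore(afterTriggers->old_tuplestores);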

---

GetCurrentTriggerDeltaTuplestore() is now used for getting fdw
tuplestore also, so it should have a more general name.

---

#define TRIGGER_USES_TRANSITION_TABLE(namepointer) \
	((namepointer) != (char *) NULL && (*(namepointer)) != '\0')
Since all other code sections assume trigger-tgoldtable to be
non-NULL, we can skip the NULL check above.
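Under that assumption the macro could reduce to this (a sketch):

#define TRIGGER_USES_TRANSITION_TABLE(namepointer) \
	((*(namepointer)) != '\0')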

---

We should add a check to make sure the user does not supply the same name
for OLD TABLE and NEW TABLE.
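For illustration, a trigger-creation-time check along these lines (a
sketch; the statement field names are assumptions, not taken from the
patch):

	/* reject a REFERENCING clause that reuses one name for both tables */
	if (stmt->tgoldtable && stmt->tgnewtable &&
		strcmp(stmt->tgoldtable, stmt->tgnewtable) == 0)
		ereport(ERROR,
				(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
				 errmsg("OLD TABLE and NEW TABLE cannot use the same name")));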

---

The below code comment needs to be changed.
 * Only tables are initially supported, and only for AFTER EACH STATEMENT
 * triggers, but other permutations are accepted by the parser so we can give
 * a meaningful message from C code.
The comment implies that with ROW triggers we do not support TABLE
transition 

[HACKERS] Incorrect log message and checks in pgrecvlogical

2014-08-12 Thread Michael Paquier
Hi,

While looking at pg_recvlogical code, I noticed that a couple of error
messages are incorrect when invoking some combinations of --create,
--start or --drop.

For example, here --init should be --create, --stop should be --drop:
$ pg_recvlogical --create --drop --slot foo
pg_recvlogical: cannot use --init or --start together with --stop
Try "pg_recvlogical --help" for more information.

Also, when checking the combination startpos && (create || drop), I
think that we should check that startpos is InvalidXLogRecPtr.

The patch attached fixes all those things. It should be back-patched to 9.4.
Regards,
-- 
Michael
diff --git a/src/bin/pg_basebackup/pg_recvlogical.c b/src/bin/pg_basebackup/pg_recvlogical.c
index dbc002d..f3b0f34 100644
--- a/src/bin/pg_basebackup/pg_recvlogical.c
+++ b/src/bin/pg_basebackup/pg_recvlogical.c
@@ -814,15 +814,15 @@ main(int argc, char **argv)
 
 	if (do_drop_slot && (do_create_slot || do_start_slot))
 	{
-		fprintf(stderr, _("%s: cannot use --init or --start together with --stop\n"), progname);
+		fprintf(stderr, _("%s: cannot use --create or --start together with --drop\n"), progname);
 		fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
 				progname);
 		exit(1);
 	}
 
-	if (startpos && (do_create_slot || do_drop_slot))
+	if (startpos != InvalidXLogRecPtr && (do_create_slot || do_drop_slot))
 	{
-		fprintf(stderr, _("%s: cannot use --init or --stop together with --startpos\n"), progname);
+		fprintf(stderr, _("%s: cannot use --create or --drop together with --startpos\n"), progname);
 		fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
 				progname);
 		exit(1);



Re: [HACKERS] pg_receivexlog add synchronous mode

2014-08-12 Thread Michael Paquier
On Fri, Aug 8, 2014 at 5:10 PM, Fujii Masao masao.fu...@gmail.com wrote:
 Thanks for the review! Applied the patch.
I noticed that the tab padding for the new option -F in the getopt
switch is incorrect. Attached patch corrects that. pgindent would have
caught that anyway, but it doesn't hurt to be correct now.
Thanks,
-- 
Michael
diff --git a/src/bin/pg_basebackup/pg_receivexlog.c b/src/bin/pg_basebackup/pg_receivexlog.c
index 0b7af54..4483c87 100644
--- a/src/bin/pg_basebackup/pg_receivexlog.c
+++ b/src/bin/pg_basebackup/pg_receivexlog.c
@@ -441,15 +441,15 @@ main(int argc, char **argv)
 			case 'n':
 noloop = 1;
 break;
-		case 'F':
-			fsync_interval = atoi(optarg) * 1000;
-			if (fsync_interval < -1000)
-			{
-				fprintf(stderr, _("%s: invalid fsync interval \"%s\"\n"),
-						progname, optarg);
-				exit(1);
-			}
-			break;
+			case 'F':
+				fsync_interval = atoi(optarg) * 1000;
+				if (fsync_interval < -1000)
+				{
+					fprintf(stderr, _("%s: invalid fsync interval \"%s\"\n"),
+							progname, optarg);
+					exit(1);
+				}
+				break;
 			case 'v':
 verbose++;
 break;



[HACKERS] Inconsistent use of --slot/-S in pg_receivexlog and pg_recvlogical

2014-08-12 Thread Michael Paquier
Hi,

I noticed that pg_receivexlog is able to use --slot but not -S, even
if the code is written this way. Attached is a patch correcting that.
This makes pg_receivexlog consistent with pg_recvlogical regarding the
slot option.
IMHO, this should be backpatched to REL9_4_STABLE.
Regards,
-- 
Michael
diff --git a/doc/src/sgml/ref/pg_receivexlog.sgml b/doc/src/sgml/ref/pg_receivexlog.sgml
index c15776f..5916b8f 100644
--- a/doc/src/sgml/ref/pg_receivexlog.sgml
+++ b/doc/src/sgml/ref/pg_receivexlog.sgml
@@ -242,6 +242,7 @@ PostgreSQL documentation
      </varlistentry>
 
      <varlistentry>
+      <term><option>-S <replaceable>slotname</replaceable></option></term>
       <term><option>--slot=<replaceable class="parameter">slotname</replaceable></option></term>
       <listitem>
        <para>
diff --git a/src/bin/pg_basebackup/pg_receivexlog.c b/src/bin/pg_basebackup/pg_receivexlog.c
index 0b7af54..a8b9ad3 100644
--- a/src/bin/pg_basebackup/pg_receivexlog.c
+++ b/src/bin/pg_basebackup/pg_receivexlog.c
@@ -77,7 +77,7 @@ usage(void)
 	printf(_("  -U, --username=NAME    connect as specified database user\n"));
 	printf(_("  -w, --no-password      never prompt for password\n"));
 	printf(_("  -W, --password         force password prompt (should happen automatically)\n"));
-	printf(_("      --slot=SLOTNAME    replication slot to use\n"));
+	printf(_("  -S, --slot=SLOTNAME    replication slot to use\n"));
 	printf(_("\nReport bugs to pgsql-b...@postgresql.org.\n"));
 }
 
@@ -394,7 +394,7 @@ main(int argc, char **argv)
 		}
 	}
 
-	while ((c = getopt_long(argc, argv, "D:d:h:p:U:s:nF:wWv",
+	while ((c = getopt_long(argc, argv, "D:d:h:p:U:s:S:nF:wWv",
 							long_options, &option_index)) != -1)
 	{
 		switch (c)



Re: [HACKERS] postgresql.auto.conf and reload

2014-08-12 Thread Fujii Masao
On Tue, Aug 12, 2014 at 1:23 PM, Amit Kapila amit.kapil...@gmail.com wrote:
 On Mon, Aug 11, 2014 at 11:08 PM, Fujii Masao masao.fu...@gmail.com wrote:


 While updating the patch, I found that the ConfigVariable which
 is removed from the list has fields whose memory was already
 allocated but that we forgot to free. So I included the code to free
 them in the patch.

 Yes, that is right.

 ! /*
 ! * Pick up only the data_directory if DataDir is not set, which
 ! * means that the configuration file is read for the first time and
 ! * PG_AUTOCONF_FILENAME file cannot be read yet. In this case,
 ! * we shouldn't pick any settings except the data_directory
 ! * from postgresql.conf because they might be overwritten
 ! * with the settings in PG_AUTOCONF_FILENAME file which will be
 ! * read later. OTOH, since it's ensured that data_directory doesn't
 ! * exist in PG_AUTOCONF_FILENAME file, it will never be overwritten
 ! * later.
 ! */
 ! else

 It is a bit inconvenient to read this part of the code; some other places
 in the same file use the comment inside the else construct, which seems
 better. One example is as below:

Yep, updated that way.


 else
 {
 /*
 * ordinary variable, append to list.  For multiple items of
 * same parameter, retain only which comes later.
 */


 I have marked this as Ready For Committer.

Thanks for the review! Committed.

Regards,

-- 
Fujii Masao




[HACKERS] minor pgbench doc fix

2014-08-12 Thread Fabien COELHO


Attached patch removes spurious brackets from pgbench documentation.

--
Fabien.

diff --git a/doc/src/sgml/pgbench.sgml b/doc/src/sgml/pgbench.sgml
index b7d88f3..1551686 100644
--- a/doc/src/sgml/pgbench.sgml
+++ b/doc/src/sgml/pgbench.sgml
@@ -748,7 +748,7 @@ pgbench <optional> <replaceable>options</> </optional> <replaceable>dbname</>
 
    <varlistentry>
     <term>
-     <literal>\setrandom <replaceable>varname</> <replaceable>min</> <replaceable>max</> [ uniform | [ { gaussian | exponential } <replaceable>threshold</> ] ]</literal>
+     <literal>\setrandom <replaceable>varname</> <replaceable>min</> <replaceable>max</> [ uniform | { gaussian | exponential } <replaceable>threshold</> ]</literal>
    </term>
 
    <listitem>



Re: [HACKERS] Enhancing pgbench parameter checking

2014-08-12 Thread Tatsuo Ishii
Fabien,

 Find attached a patch which adds these changes to your current
 version.
 
 Thank you for the review and patch. Looks good. I changed the status
 to Ready for Committer. I will wait for a few days and if there's
 no objection, I will commit the patch.

I have committed the patch. Thanks!

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: [HACKERS] 9.4 logical replication - walsender keepalive replies

2014-08-12 Thread Andres Freund
On 2014-08-11 23:52:32 +0200, Andres Freund wrote:
 On 2014-08-11 17:22:27 -0400, Steve Singer wrote:
  On 07/14/2014 01:19 PM, Steve Singer wrote:
  On 07/06/2014 10:11 AM, Andres Freund wrote:
  Hi Steve,
  
  Right. I thought about this for a while, and I think we should change
  two things. For one, don't request replies here. It's simply not needed,
  as this isn't dealing with timeouts. For another, don't just check
  ->flush < sentPtr but also ->write < sentPtr. The reason we're sending these
  feedback messages is to inform the 'logical standby' that there's been
  WAL activity which it can't see because they don't correspond to
  anything that's logically decoded (e.g. vacuum stuff).
  Would that suit your needs?
  
  Greetings,
  
  Yes I think that will work for me.
  I tested with the attached patch that I think  does what you describe and
  it seems okay.
  
  
  
  
  Any feedback on this?  Do we want that change for 9.4, or do we want
  something else?
 
 I plan to test and apply it in the next few days. Digging myself out from
 under stuff from before my holiday right now...

Committed. Thanks and sorry for the delay.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] minor pgbench doc fix

2014-08-12 Thread Fujii Masao
On Tue, Aug 12, 2014 at 5:12 PM, Fabien COELHO coe...@cri.ensmp.fr wrote:

 Attached patch removes spurious brackets from pgbench documentation.

Applied. Thanks!

Regards,

-- 
Fujii Masao




[HACKERS] pg_receivexlog --status-interval add fsync feedback

2014-08-12 Thread furuyao
Hi all,

This patch adds a new setting to --status-interval of pg_receivexlog:
if -1 is specified, a status packet is sent back to the server as soon
as possible after each fsync.

Otherwise the behavior is the same as when 0 is specified to
--status-interval: a status packet is sent whenever the server requests
one.

Use a replication slot, and specify -1 to --fsync-interval as well; as a
result, simple synchronous WAL replication can be performed with
pg_receivexlog. That makes it possible to test synchronous replication
with a plain WAL receiver, and if a failover happens in a synchronous
replication setup, pg_receivexlog can serve as a substitute until the
standby is restored.
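An illustrative invocation under these proposed semantics (the directory
and slot name are placeholders):

$ pg_receivexlog -D /archive/wal --slot=some_slot --fsync-interval=-1 --status-interval=-1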

Regards,

--
Furuya Osamu



pg_receivexlog-fsync-feedback-v1.patch
Description: pg_receivexlog-fsync-feedback-v1.patch



Re: [HACKERS] pg_receivexlog add synchronous mode

2014-08-12 Thread Fujii Masao
On Tue, Aug 12, 2014 at 4:34 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 On Fri, Aug 8, 2014 at 5:10 PM, Fujii Masao masao.fu...@gmail.com wrote:
 Thanks for the review! Applied the patch.
 I noticed that the tab padding for the new option -F in the getopt
 switch is incorrect. Attached patch corrects that. pgindent would have
 caught that anyway, but it doesn't hurt to be correct now.

Yeah, that's a problem. But, as you clearly said, upcoming pgindent
would fix this kind of problem. So I'm not sure if it's really worth fixing
only this case independently.

Regards,

-- 
Fujii Masao




Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW

2014-08-12 Thread Shigeru Hanada
Hi Fujita-san,

The issues addressed by Eitoku-san were fixed properly, but he found a bug
and a possible enhancement in the v2 patch.

* push-down check misses delete triggers
update_is_pushdown_safe() seems to have a bug: it misses the
existence of a row-level delete trigger.  A DELETE statement executed
against a foreign table which has a row-level delete trigger is pushed
down to the remote side, and consequently no row-level delete trigger is fired.

* further optimization
Is there any chance to consider further optimization by passing the
operation type (UPDATE|DELETE) of the statement being executed to
update_is_pushdown_safe()?  It seems safe to push down an UPDATE
statement when the target foreign table has no update trigger even if it
has a delete trigger (of course the opposite combination would also be
fine).
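A sketch of the kind of check this suggests, using the row-level trigger
flags that TriggerDesc already carries (whether update_is_pushdown_safe()
can be given the operation type is exactly the open question above):

	/* refuse push-down only when a row-level trigger matches the
	 * statement's own operation type */
	if (operation == CMD_UPDATE && trigdesc &&
		(trigdesc->trig_update_before_row || trigdesc->trig_update_after_row))
		return false;
	if (operation == CMD_DELETE && trigdesc &&
		(trigdesc->trig_delete_before_row || trigdesc->trig_delete_after_row))
		return false;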

* Documentation
The requirements for pushing down UPDATE/DELETE statements would not be
easy for non-expert users to understand, so it seems that there is
room to enhance the documentation.  An idea is to first define which
expressions are safe to send to the remote side (it might need to mention
the difference in semantics), and refer to that definition from the places
describing the push-down requirements for SELECT, UPDATE and
DELETE.

2014-08-04 20:30 GMT+09:00 Etsuro Fujita fujita.ets...@lab.ntt.co.jp:
 (2014/07/30 17:22), Etsuro Fujita wrote:

 (2014/07/29 0:58), Robert Haas wrote:

 On Fri, Jul 25, 2014 at 3:39 AM, Albe Laurenz
 laurenz.a...@wien.gv.at wrote:

 Shigeru Hanada wrote:

 * Naming of new behavior
 You named this optimization Direct Update, but I'm not sure that
 this is intuitive enough to express this behavior.  I would like to
 hear opinions of native speakers.


 How about batch foreign update or batch foreign modification?
 (Disclaimer: I'm not a native speaker either.)


 I think direct update sounds pretty good.  Batch does not sound as
 good to me, since it doesn't clearly describe what makes this patch
 special as opposed to some other grouping of updates that happens to
 produce a speedup.


 I agree with Robert on that point.

 Another term that might be used is update pushdown, since we are
 pushing the whole update to the remote server instead of having the
 local server participate.  Without looking at the patch, I don't have
 a strong opinion on whether that's better than direct update in this
 context.


 Update Pushdown is fine with me.

 If there are no objections of others, I'll change the name from Direct
 Update to Update Pushdown.


 Done.  (I've left deparseDirectUpdateSql/deparseDirectDeleteSql as-is,
 though.)

 Other changes:

 * Address the comments from Eitoku-san.
 * Add regression tests.
 * Fix a bug, which fails to show the actual row counts in EXPLAIN ANALYZE
 for UPDATE/DELETE without a RETURNING clause.
 * Rebase to HEAD.

 Please find attached an updated version of the patch.


 Thanks,

 Best regards,
 Etsuro Fujita



-- 
Shigeru HANADA




Re: [HACKERS] Proposal: Incremental Backup

2014-08-12 Thread Marco Nenciarini
As I already stated, timestamps will only be used to detect changed files
early. To declare two files identical they must have the same size, the
same mtime and the same *checksum*.
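A sketch of that identity test (the struct and field names are
illustrative, not from the patch):

#include <stdbool.h>
#include <string.h>
#include <time.h>

typedef struct FileEntry
{
	long		size;
	time_t		mtime;
	unsigned char checksum[32];	/* e.g. a SHA-256 digest */
} FileEntry;

/* size/mtime act as a cheap early filter; the checksum is decisive */
static bool
files_identical(const FileEntry *a, const FileEntry *b)
{
	return a->size == b->size &&
		   a->mtime == b->mtime &&
		   memcmp(a->checksum, b->checksum, sizeof(a->checksum)) == 0;
}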

Regards,
Marco

-- 
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it





Re: [HACKERS] WAL format and API changes (9.5)

2014-08-12 Thread Heikki Linnakangas

On 08/05/2014 03:50 PM, Michael Paquier wrote:

- When testing pg_xlogdump I found an assertion failure for record
XLOG_GIN_INSERT:
Assertion failed: (((bool) (((const void*)(&insertData->newitem.key) !=
((void*)0)) && ((&insertData->newitem.key)->ip_posid != 0))), function
gin_desc, file gindesc.c, line 127.


I could not reproduce this. Do you have a test case?


- Would it be a win to add an assertion check for (CritSectionCount == 0)
in XLogEnsureRecordSpace?


Maybe; added.


- There is no mention of REGBUF_NO_IMAGE in transam/README.


Fixed


- This patch adds a lot of blocks like that in the redo code:
+ if (BufferIsValid(buffer))
+   UnlockReleaseBuffer(buffer);
What do you think about adding a new macro like UnlockBufferIfValid(buffer)?


I don't think such a macro would be an improvement in readability, TBH.


- Don't we need a call to XLogBeginInsert() in XLogSaveBufferForHint():
+   int flags = REGBUF_FORCE_IMAGE;
+   if (buffer_std)
+   flags |= REGBUF_STANDARD;
+   XLogRegisterBuffer(0, buffer, flags);
[...]
-   recptr = XLogInsert(RM_XLOG_ID, XLOG_FPI, &rdata);
+   recptr = XLogInsert(RM_XLOG_ID, XLOG_FPI);


Indeed, fixed. With that, initdb -k works, I had not tested the patch 
with page checksums before.



- There is still a TODO item in ValidXLogRecordHeader to check if block
data header is valid or not. Just mentioning.


Removed.

Previously, ValidXLogRecordHeader() checked that the xl_tot_len of a 
record doesn't exceed the size of header + xl_len + (space needed for 
max number of bkp blocks). But the new record format has no limit on the 
amount of data registered with a buffer, so such a test is not possible 
anymore. I added the TODO item there to remind me that I need to check 
if we need to do something else there instead, but I think it's fine as 
it is. We still sanity-check the block data in ValidXLogRecord; the 
ValidXLogRecordHeader() check happens before we have read the whole 
record header in memory.



- XLogRecGetBlockRefIds needing an already-allocated array for *out is not
user-friendly. Cannot we just do all the work inside this function?


I agree it's not a nice API. Returning a palloc'd array would be nicer, 
but this needs to work outside the backend too (e.g. pg_xlogdump). It 
could return a malloc'd array in frontend code, but it's not as nice. On 
the whole, do you think that would be better than the current approach?



- All the functions in xlogconstruct.c could be defined in a separate
header xlogconstruct.h. What they do is rather independent from xlog.h. The
same applies to all the structures and functions of xlogconstruct.c in
xlog_internal.h: XLogRecordAssemble, XLogRecordAssemble,
InitXLogRecordConstruct. (worth noticing as well that the only reason
XLogRecData is defined externally of xlogconstruct.c is that it is being
used in XLogInsert()).


Hmm. I left the defines for xlogconstruct.c functions that are used to 
build a WAL record in xlog.h, so that it's not necessary to include both 
xlog.h and xlogconstruct.h in all files that write a WAL record 
(XLogInsert() is defined in xlog.h).


But perhaps it would be better to move the prototype for XLogInsert to 
xlogconstruct.h too? That might be a better division; everything needed 
to insert a WAL record would be in xlogconstruct.h, and xlog.h would be 
left for more internal functions related to WAL. And perhaps rename the 
files to xloginsert.c and xloginsert.h.


Here's a new patch with those little things fixed.

- Heikki


wal-format-and-api-changes-6.patch.gz
Description: application/gzip



Re: [HACKERS] [BUGS] BUG #9652: inet types don't support min/max

2014-08-12 Thread Haribabu Kommi
On Mon, Aug 4, 2014 at 3:22 PM, Asif Naeem anaeem...@gmail.com wrote:
 Sorry for not being clear, the above mentioned test is related to the
 following doc (sgml) changes that seem not to work as described, i.e.

 Table 9-35. cidr and inet Functions

 Function        | Return Type | Description               | Example                           | Result
 max(inet, inet) | inet        | larger of two inet types  | max('192.168.1.5', '192.168.1.4') | 192.168.1.5
 min(inet, inet) | inet        | smaller of two inet types | min('192.168.1.5', '192.168.1.4') | 192.168.1.4

 Can you please elaborate on these sgml changes?

I thought of adding them for the newly added network functions but
mistakenly kept the names as max and min.
Anyway, with your suggestion in the earlier mail, these changes are not required.

I removed these changes in the attached patch.
Thanks for reviewing the patch.

Regards,
Hari Babu
Fujitsu Australia


inet_agg_v7.patch
Description: Binary data



Re: [HACKERS] pg_receivexlog --status-interval add fsync feedback

2014-08-12 Thread Fujii Masao
On Tue, Aug 12, 2014 at 6:19 PM,  furu...@pm.nttdata.co.jp wrote:
 Hi all,

 This patch adds a new setting to --status-interval of pg_receivexlog:
 if -1 is specified, a status packet is sent back to the server as soon
 as possible after each fsync.

I don't think that it's a good idea to control that behavior by using
--status-interval. I'm sure that there are some users who want both
that behavior and to set the maximum interval between feedback messages
sent back to the server, because both settings are available in
walreceiver. But your version of --status-interval doesn't allow those
settings at all. That is, if it is set to -1, the maximum interval between
feedback messages cannot be set. OTOH, if it is set to a positive number,
the feedback-as-soon-as-fsync behavior cannot be enabled. Therefore,
I'm thinking that it's better to introduce a new separate option for that
behavior.

Regards,

-- 
Fujii Masao




Re: [HACKERS] Inconsistent use of --slot/-S in pg_receivexlog and pg_recvlogical

2014-08-12 Thread Fujii Masao
On Tue, Aug 12, 2014 at 4:50 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 Hi,

 I noticed that pg_receivexlog is able to use --slot but not -S, even
 if the code is written this way. Attached is a patch correcting that.
 This makes pg_receivexlog consistent with pg_recvlogical regarding the
 slot option.
 IMHO, this should be backpatched to REL9_4_STABLE.
 Regards,

Looks good to me. And I could not find, in the past discussion, any
reason why only -S was left out.

Anyway, barring any objection, I will commit this.

Regards,

-- 
Fujii Masao




Re: [HACKERS] SSL regression test suite

2014-08-12 Thread Heikki Linnakangas

On 08/05/2014 10:46 PM, Robert Haas wrote:

On Mon, Aug 4, 2014 at 10:38 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:

Now that we use TAP for testing client tools, I think we can use that to
test various SSL options too. I came up with the attached. Comments?

It currently assumes that the client's and the server's hostnames are
postgres-client.test and postgres-server.test, respectively. That makes
 it a bit tricky to run on a single system. The README includes instructions;
basically you need to set up an additional loopback device, and add entries
to /etc/hosts for that.


That seems so onerous that I think few people will do it, and not
regularly, resulting in the tests breaking and nobody noticing.
Reconfiguring the loopback interface like that requires root
privilege, and won't survive a reboot, and doing it in the system
configuration will require hackery specific to the particular flavor
of Linux you're running, and probably other hackery on non-Linux
systems (never mind Windows).


Yeah, you're probably right.


Why can't you make it work over 127.0.0.1?


I wanted it to be easy to run the client and the server on different 
hosts. As soon as we have more than one SSL implementation, it would be 
really nice to do interoperability testing between a client and a server 
using different implementations.


Also, to test sslmode=verify-full, where the client checks that the 
server certificate's hostname matches the hostname that it connected to, 
you need to have two aliases for the same server, one that matches the 
certificate and one that doesn't. But I think I found a way around that 
part; if the certificate is set up for localhost, and you connect to 
127.0.0.1, you get a mismatch.


So, I got rid of the DNS setup; it only depends on localhost/127.0.0.1 now. 
Patch attached. That means that it's not easy to run the client and the 
server on different hosts, but we can improve that later.


- Heikki
commit 140c590ca86a0ba4a6b422e4b618cd459b84175f
Author: Heikki Linnakangas heikki.linnakan...@iki.fi
Date:   Wed Aug 6 18:43:39 2014 +0300

Refactor cert file stuff in client

diff --git a/src/interfaces/libpq/fe-secure-openssl.c b/src/interfaces/libpq/fe-secure-openssl.c
index f950fc3..cee7b2e 100644
--- a/src/interfaces/libpq/fe-secure-openssl.c
+++ b/src/interfaces/libpq/fe-secure-openssl.c
@@ -780,57 +780,21 @@ destroy_ssl_system(void)
 static int
 initialize_SSL(PGconn *conn)
 {
-	struct stat buf;
-	char		homedir[MAXPGPATH];
-	char		fnbuf[MAXPGPATH];
-	char		sebuf[256];
-	bool		have_homedir;
-	bool		have_cert;
 	EVP_PKEY   *pkey = NULL;
-
-	/*
-	 * We'll need the home directory if any of the relevant parameters are
-	 * defaulted.  If pqGetHomeDirectory fails, act as though none of the
-	 * files could be found.
-	 */
-	if (!(conn->sslcert && strlen(conn->sslcert) > 0) ||
-		!(conn->sslkey && strlen(conn->sslkey) > 0) ||
-		!(conn->sslrootcert && strlen(conn->sslrootcert) > 0) ||
-		!(conn->sslcrl && strlen(conn->sslcrl) > 0))
-		have_homedir = pqGetHomeDirectory(homedir, sizeof(homedir));
-	else	/* won't need it */
-		have_homedir = false;
-
-	/* Read the client certificate file */
-	if (conn->sslcert && strlen(conn->sslcert) > 0)
-		strncpy(fnbuf, conn->sslcert, sizeof(fnbuf));
-	else if (have_homedir)
-		snprintf(fnbuf, sizeof(fnbuf), "%s/%s", homedir, USER_CERT_FILE);
-	else
-		fnbuf[0] = '\0';
-
-	if (fnbuf[0] == '\0')
-	{
-		/* no home directory, proceed without a client cert */
-		have_cert = false;
-	}
-	else if (stat(fnbuf, &buf) != 0)
-	{
-		/*
-		 * If file is not present, just go on without a client cert; server
-		 * might or might not accept the connection.  Any other error,
-		 * however, is grounds for complaint.
-		 */
-		if (errno != ENOENT && errno != ENOTDIR)
-		{
-			printfPQExpBuffer(&conn->errorMessage,
-			   libpq_gettext("could not open certificate file \"%s\": %s\n"),
-			  fnbuf, pqStrerror(errno, sebuf, sizeof(sebuf)));
-			return -1;
-		}
-		have_cert = false;
-	}
-	else
+	char	   *sslcertfile = NULL;
+	char	   *engine = NULL;
+	char	   *keyname = NULL;
+	char	   *sslkeyfile = NULL;
+	char	   *sslrootcert = NULL;
+	char	   *sslcrl = NULL;
+	int			ret = -1;
+
+	if (!pqsecure_get_ssl_files(conn,
+								&sslcertfile, &sslkeyfile, &engine, &keyname,
+								&sslrootcert, &sslcrl))
+		return PGRES_POLLING_READING;
+
+	if (sslcertfile)
 	{
 		/*
 		 * Cert file exists, so load it.  Since OpenSSL doesn't provide the
@@ -855,216 +819,146 @@ initialize_SSL(PGconn *conn)
 		{
 			printfPQExpBuffer(&conn->errorMessage,
 			   libpq_gettext("could not acquire mutex: %s\n"), strerror(rc));
-			return -1;
+			goto fail;
 		}
 #endif
-		if (SSL_CTX_use_certificate_chain_file(SSL_context, fnbuf) != 1)
+		if (SSL_CTX_use_certificate_chain_file(SSL_context, sslcertfile) != 1)
 		{
 			char	   *err = SSLerrmessage();
 
 			printfPQExpBuffer(&conn->errorMessage,
 			   libpq_gettext("could not read certificate file \"%s\": %s\n"),
-			  fnbuf, err);
+			  sslcertfile, err);
 			SSLerrfree(err);
 
 #ifdef 

Re: [HACKERS] Incorrect log message and checks in pgrecvlogical

2014-08-12 Thread Andres Freund
On 2014-08-12 16:28:56 +0900, Michael Paquier wrote:
 Hi,
 
 While looking at pg_recvlogical code, I noticed that a couple of error
 messages are incorrect when invoking some combinations of --create,
 --start or --drop.
 
 For example, here --init should be --create, --stop should be --drop:
 $ pg_recvlogical --create --drop --slot foo
 pg_recvlogical: cannot use --init or --start together with --stop
  Try "pg_recvlogical --help" for more information.

  Also, when checking the combination startpos && (create || drop), I
 think that we should check that startpos is InvalidXLogRecPtr.
 
 The patch attached fixes all those things. It should be back-patched to 9.4.

Thanks, applied.

Andres

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] PL/PgSQL: RAISE and the number of parameters

2014-08-12 Thread Fabien COELHO


Hello Marko,

Here's a patch for making PL/PgSQL throw an error during compilation (instead 
of runtime) if the number of parameters passed to RAISE don't match the 
number of placeholders in error message.  I'm sure people can see the pros of 
doing it this way.


Patch scanned, applied & tested without problem on head.

The compile-time raise parameter checking is a good move.

3 minor points:

- I would suggest to avoid continue within a loop so that the code is 
simpler to understand, at least for me.


 - I would suggest to update the documentation accordingly.

 - The execution code now contains error detections which should never be 
raised, but I suggest to keep it in place anyway. However I would suggest 
to add comments about the fact that it should not be triggered.


See the attached patch which implements these suggestions on top of your 
patch.


--
Fabien.diff --git a/doc/src/sgml/plpgsql.sgml b/doc/src/sgml/plpgsql.sgml
index 91fadb0..a3c763a 100644
--- a/doc/src/sgml/plpgsql.sgml
+++ b/doc/src/sgml/plpgsql.sgml
@@ -3403,6 +3403,8 @@ RAISE ;
     Inside the format string, <literal>%</literal> is replaced by the
     string representation of the next optional argument's value. Write
     <literal>%%</literal> to emit a literal <literal>%</literal>.
+    The number of optional arguments must match the number of <literal>%</literal>
+    placeholders in the format string, otherwise a compilation error is reported.
    </para>
 
    <para>
diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index 69d1965..a3cd6fc 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -2939,6 +2939,10 @@ exec_stmt_raise(PLpgSQL_execstate *estate, PLpgSQL_stmt_raise *stmt)
 	continue;
 }
 
+/*
+ * Too few arguments to raise:
+ * This should never happen as a compile-time check is performed.
+ */
 if (current_param == NULL)
 	ereport(ERROR,
 			(errcode(ERRCODE_SYNTAX_ERROR),
@@ -2965,7 +2969,8 @@ exec_stmt_raise(PLpgSQL_execstate *estate, PLpgSQL_stmt_raise *stmt)
 
 		/*
 		 * If more parameters were specified than were required to process the
-		 * format string, throw an error
+		 * format string, throw an error.
+		 * This should never happen as a compile-time check is performed.
 		 */
 		if (current_param != NULL)
 			ereport(ERROR,
diff --git a/src/pl/plpgsql/src/pl_gram.y b/src/pl/plpgsql/src/pl_gram.y
index ba928e3..03976e4 100644
--- a/src/pl/plpgsql/src/pl_gram.y
+++ b/src/pl/plpgsql/src/pl_gram.y
@@ -3785,15 +3785,12 @@ check_raise_parameters(PLpgSQL_stmt_raise *stmt)
 
 	for (cp = stmt->message; *cp; cp++)
 	{
-		if (cp[0] != '%')
-			continue;
-		/* literal %, ignore */
-		if (cp[1] == '%')
-		{
-			cp++;
-			continue;
+		if (cp[0] == '%') {
+			if (cp[1] == '%') /* literal %, ignore */
+cp++;
+			else
+expected_nparams++;
 		}
-		expected_nparams++;
 	}
 
 	if (expected_nparams < list_length(stmt->params))



Re: [HACKERS] PL/PgSQL: RAISE and the number of parameters

2014-08-12 Thread Marko Tiikkaja

Hi Fabien,

On 8/12/14 1:09 PM, Fabien COELHO wrote:

Here's a patch for making PL/PgSQL throw an error during compilation (instead
of runtime) if the number of parameters passed to RAISE don't match the
number of placeholders in error message.  I'm sure people can see the pros of
doing it this way.


Patch scanned, applied  tested without problem on head.


Thanks!


The compile-time raise parameter checking is a good move.

3 minor points:

- I would suggest to avoid continue within a loop so that the code is
simpler to understand, at least for me.


I personally find the code easier to read with the continue.


   - I would suggest to update the documentation accordingly.


I scanned the docs trying to find any mentions of the run-time error but 
couldn't find any.  I don't object to adding this, though.



   - The execution code now contains error detections which should never be
raised, but I suggest to keep it in place anyway. However I would suggest
to add comments about the fact that it should not be triggered.


Good point.  I actually realized I hadn't sent the latest version of the 
patch, sorry about that.  I did this:


https://github.com/johto/postgres/commit/1acab6fc5387c893ca29fed9284e09258e0c5c56

to also turn the ereport()s into elog()s since the user should never see 
them.



See the attached patch which implements these suggestions on top of your
patch.


Thanks for reviewing!  I'll incorporate the doc changes, but I'm going 
to let others decide on the logic in the check_raise_parameters() loop 
before doing any changes.


Will send an updated patch shortly.


.marko




Re: [HACKERS] SSL regression test suite

2014-08-12 Thread Andres Freund
On 2014-08-12 14:01:18 +0300, Heikki Linnakangas wrote:
 On 08/05/2014 10:46 PM, Robert Haas wrote:
 Why can't you make it work over 127.0.0.1?
 
 I wanted it to be easy to run the client and the server on different hosts.
 As soon as we have more than one SSL implementation, it would be really nice
 to do interoperability testing between a client and a server using different
 implementations.
 
 Also, to test sslmode=verify-full, where the client checks that the server
 certificate's hostname matches the hostname that it connected to, you need
 to have two aliases for the same server, one that matches the certificate
 and one that doesn't. But I think I found a way around that part; if the
 certificate is set up for localhost, and connect to 127.0.0.1, you get a
 mismatch.

Alternatively, and to e.g. test wildcard certs and such, I think you can
specify both host and hostaddr to connect without actually
doing a DNS lookup.
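For example, a connection string along these lines (the hostname is a
placeholder matching whatever name the test certificate carries) would
exercise hostname verification against an arbitrary name while still
connecting locally:

$ psql "host=postgres-server.test hostaddr=127.0.0.1 sslmode=verify-full dbname=postgres"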

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-12 Thread Marti Raudsepp
On Fri, Aug 8, 2014 at 10:50 PM, Hannu Krosing ha...@2ndquadrant.com wrote:
 How hard and how expensive would it be to teach pg_lzcompress to
 apply a delta filter on suitable data ?

 So that instead of integers their deltas will be fed to the real
 compressor

Has anyone given this more thought? I know this might not be 9.4
material, but to me it sounds like the most promising approach, if
it's workable. This isn't a made up thing, the 7z and LZMA formats
also have an optional delta filter.

Of course with JSONB the problem is figuring out which parts to apply
the delta filter to, and which parts not.

This would also help with integer arrays, containing for example
foreign key values to a serial column. There's bound to be some
redundancy, as nearby serial values are likely to end up close
together. In one of my past projects we used to store large arrays of
integer fkeys, deliberately sorted for duplicate elimination.

For an ideal case comparison, intar2 could be as large as intar1 when
compressed with a 4-byte wide delta filter:

create table intar1 as select array(select 1::int from
generate_series(1,1000000)) a;
create table intar2 as select array(select generate_series(1,1000000)::int) a;

In PostgreSQL 9.3 the sizes are:
select pg_column_size(a) from intar1;
  45810
select pg_column_size(a) from intar2;
  4000020

So a factor of 87 difference.
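To make the idea concrete, here is a minimal sketch of a 4-byte-wide
delta filter as described above - values are replaced by their
differences before being handed to the real compressor (an illustration
of the concept only, not the pg_lzcompress API):

#include <stdint.h>

/* encode: replace each value with its delta from the previous one */
static void
delta_encode(int32_t *vals, int n)
{
	int32_t		prev = 0;

	for (int i = 0; i < n; i++)
	{
		int32_t		cur = vals[i];

		vals[i] = cur - prev;	/* nearby serial values become small deltas */
		prev = cur;
	}
}

/* decode: a running sum restores the original values */
static void
delta_decode(int32_t *vals, int n)
{
	int32_t		prev = 0;

	for (int i = 0; i < n; i++)
	{
		prev += vals[i];
		vals[i] = prev;
	}
}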

Regards,
Marti




Re: [HACKERS] SSL regression test suite

2014-08-12 Thread Heikki Linnakangas

On 08/12/2014 02:28 PM, Andres Freund wrote:

On 2014-08-12 14:01:18 +0300, Heikki Linnakangas wrote:

Also, to test sslmode=verify-full, where the client checks that the server
certificate's hostname matches the hostname that it connected to, you need
to have two aliases for the same server, one that matches the certificate
and one that doesn't. But I think I found a way around that part; if the
certificate is set up for localhost, and connect to 127.0.0.1, you get a
mismatch.


Alternatively, and to e.g. test wildcard certs and such, I think you can
specify both host and hostaddr to connect to connect without actually
doing a dns lookup.


Oh, I didn't know that's possible! Yeah, that's a good solution.

- Heikki




Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-08-12 Thread Tomas Vondra
On 12 Srpen 2014, 7:06, Jeff Davis wrote:
 On Mon, 2014-08-11 at 01:29 +0200, Tomas Vondra wrote:
 On 10.8.2014 23:26, Jeff Davis wrote:
  This patch is requires the Memory Accounting patch, or something
  similar to track memory usage.

 I think the patch you sent actually includes the accounting patch. Is
 that on purpose, or by accident?

 Accident, thank you.

 So once a group gets into memory, it stays there? That's going to work
 fine for aggregates with fixed-size state (int4, or generally state that
 gets allocated and does not grow), but I'm afraid for aggregates with
 growing state (as for example array_agg and similar) that's not really a
 solution.

 I agree in theory, but for now I'm just not handling that case at all
 because there is other work that needs to be done first. For one thing,
 we would need a way to save the transition state, and we don't really
 have that. In the case of array_agg, the state is not serialized and
 there's no generic way to ask it to serialize itself without finalizing.

Yes and no.

It's true we don't have this ability for aggregates passing state using
'internal', and arguably these are the cases that matter (because those
are the states that tend to bloat as more values are passed to the
aggregate).

We can do that for states with a known type (because we have
serialize/deserialize methods for them), but we can't really require all
aggregates to use only known types. The 'internal' type is there for a reason.

So I think eventually we should to support something like this:

CREATE AGGREGATE myaggregate (
...
SERIALIZE_FUNC = 'dump_data',
DESERIALIZE_FUNC = 'read_data',
...
);

That being said, we can't require this from all existing aggregates.
There'll always be aggregates not providing this (for example some old
ones).

So even if we have this, we'll have to support the case when it's not
provided - possibly by using the batching algorithm you provided. What
I imagine is this:

   hitting work_mem limit -> do we know how to dump the aggregate state?

     yes (known type or serialize/deserialize)
     => use the batching algorithm from hash join

     no (unknown type, ...)
     => use the batching algorithm described in the original message

Now, I'm not trying to make you implement all this - I'm willing to work
on that. Implementing this CREATE AGGREGATE extension is however tightly
coupled with your patch, because that's the only place where it might be
used (that I'm aware of).


 I'm open to ideas. Do you think my patch is going generally in the right
 direction, and we can address this problem later; or do you think we
 need a different approach entirely?

I certainly think having memory-bounded hashagg is a great improvement,
and yes - this patch can get us there. Maybe it won't get us all the way
to the perfect solution but so what? We can improve that by further
patches (and I'm certainly willing to spend some time on that).

So thanks a lot for working on this!


 While hacking on the hash join, I envisioned the hash aggregate might
 work in a very similar manner, i.e. something like this:

   * nbatches=1, nbits=0
   * when work_mem gets full => nbatches *= 2, nbits += 1
   * get rid of half the groups, using nbits from the hash
       => dump the current states into 'states.batchno' file
       => dump further tuples to 'tuples.batchno' file
   * continue until the end, or until work_mem gets full again

 It would get a little messy with HashAgg. Hashjoin is dealing entirely
 with tuples; HashAgg deals with tuples and groups.

I don't see why it should get messy? In the end, you have a chunk of
data and a hash for it.


 Also, if the transition state is fixed-size (or even nearly so), it
 makes no sense to remove groups from the hash table before they are
 finished. We'd need to detect that somehow, and it seems almost like two
 different algorithms (though maybe not a bad idea to use a different
 algorithm for things like array_agg).

It just means you need to walk through the hash table, look at the
hashes and dump ~50% of the groups to a file. I'm not sure how difficult
that is with dynahash, though (hashjoin uses a custom hash table, which
makes this very simple).
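With dynahash, that walk might look roughly like this (a sketch; the
entry layout is hypothetical and assumes each group entry remembers its
hash value):

	HASH_SEQ_STATUS status;
	GroupEntry *entry;			/* hypothetical entry type */

	hash_seq_init(&status, hashtable);
	while ((entry = hash_seq_search(&status)) != NULL)
	{
		/* keep groups belonging to the current batch, spill the rest */
		if ((entry->hashvalue & batch_mask) != curbatch)
		{
			spill_group_state(entry);	/* hypothetical spill routine */
			hash_search(hashtable, &entry->key, HASH_REMOVE, NULL);
		}
	}

(dynahash does allow deleting the element just returned by
hash_seq_search.)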


 Not saying that it can't be done, but (unless you have an idea) requires
 quite a bit more work than what I did here.

 It also seems to me that the logic of the patch is about this:

   * try to lookup the group in the hash table
     * found => call the transition function
     * not found
       * enough space => call the transition function
       * not enough space => tuple/group goes to a batch

 Which pretty much means all tuples need to do the lookup first. The nice
 thing on the hash-join approach is that you don't really need to do the
 lookup - you just need to peek at the hash whether the group belongs to
 the current batch (and if not, to which batch it should go).
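In code form, that peek is just a mask on the hash value (a sketch,
following the hash-join convention of using the low nbits bits as the
batch number):

	uint32		batchno = hashvalue & ((1 << nbits) - 1);

	if (batchno == curbatch)
		;	/* look up the group and advance its transition state now */
	else
		;	/* stash the tuple in the file for 'batchno'; no lookup needed */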

 That's an interesting point. I suspect that, in practice, the cost of
 hashing the tuple is 

Re: [HACKERS] pg_receivexlog and replication slots

2014-08-12 Thread Michael Paquier
On Fri, Jul 11, 2014 at 6:23 PM, Andres Freund and...@2ndquadrant.com wrote:
 Ok. Do you plan to take care of it? If so, I'd be fine with backpatching
 it. I'm not likely to get to it right now :(

Actually I came up with the same need as Magnus, but a bit later,
so... Attached is a patch to support physical slot creation and drop
in pg_receivexlog with the addition of new options --create and
--drop. It would be nice to have that in 9.4, but I will not be the one
deciding that at the end :) The code has been refactored with what is
already available in pg_recvlogical for slot creation and drop.

Regards,
-- 
Michael
From 62ef9bd097453648e4f313f1aedf91ba2b4a3f1e Mon Sep 17 00:00:00 2001
From: Michael Paquier mich...@otacoo.com
Date: Tue, 12 Aug 2014 21:54:13 +0900
Subject: [PATCH] Add support for physical slot creation/deletion in
 pg_receivexlog

Physical slot creation can be done with --create and drop with --drop.
In both cases --slot is needed. Code for replication slot creation and
drop is refactored with what was available in pg_recvlogical.
---
 doc/src/sgml/ref/pg_receivexlog.sgml   | 29 ++
 src/bin/pg_basebackup/pg_receivexlog.c | 73 +
 src/bin/pg_basebackup/pg_recvlogical.c | 66 +++
 src/bin/pg_basebackup/streamutil.c | 97 ++
 src/bin/pg_basebackup/streamutil.h |  7 +++
 5 files changed, 203 insertions(+), 69 deletions(-)

diff --git a/doc/src/sgml/ref/pg_receivexlog.sgml b/doc/src/sgml/ref/pg_receivexlog.sgml
index c15776f..9251f85 100644
--- a/doc/src/sgml/ref/pg_receivexlog.sgml
+++ b/doc/src/sgml/ref/pg_receivexlog.sgml
@@ -72,6 +72,35 @@ PostgreSQL documentation
   <title>Options</title>
 
   <para>
+   <application>pg_receivexlog</application> can run in one of two following
+   modes, which control physical replication slot:
+
+   <variablelist>
+
+    <varlistentry>
+     <term><option>--create</option></term>
+     <listitem>
+      <para>
+       Create a new physical replication slot with the name specified in
+       <option>--slot</option>, then exit.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><option>--drop</option></term>
+     <listitem>
+      <para>
+       Drop the replication slot with the name specified
+       in <option>--slot</option>, then exit.
+      </para>
+     </listitem>
+    </varlistentry>
+   </variablelist>
+
+  </para>
+
+  <para>
    The following command-line options control the location and format of the
    output.
 
diff --git a/src/bin/pg_basebackup/pg_receivexlog.c b/src/bin/pg_basebackup/pg_receivexlog.c
index 0b7af54..67c56cb 100644
--- a/src/bin/pg_basebackup/pg_receivexlog.c
+++ b/src/bin/pg_basebackup/pg_receivexlog.c
@@ -38,6 +38,8 @@ static int	noloop = 0;
 static int	standby_message_timeout = 10 * 1000;		/* 10 sec = default */
 static int	fsync_interval = 0; /* 0 = default */
 static volatile bool time_to_abort = false;
+static bool do_create_slot = false;
+static bool do_drop_slot = false;
 
 
 static void usage(void);
@@ -78,6 +80,9 @@ usage(void)
 	printf(_("  -w, --no-password      never prompt for password\n"));
 	printf(_("  -W, --password         force password prompt (should happen automatically)\n"));
 	printf(_("      --slot=SLOTNAME    replication slot to use\n"));
+	printf(_("\nOptional actions:\n"));
+	printf(_("      --create           create a new replication slot (for the slot's name see --slot)\n"));
+	printf(_("      --drop             drop the replication slot (for the slot's name see --slot)\n"));
 	printf(_("\nReport bugs to pgsql-b...@postgresql.org.\n"));
 }
 
@@ -261,14 +266,6 @@ StreamLog(void)
 	uint32		hi,
 lo;
 
-	/*
-	 * Connect in replication mode to the server
-	 */
-	conn = GetConnection();
-	if (!conn)
-		/* Error message already written in GetConnection() */
-		return;
-
 	if (!CheckServerVersionForStreaming(conn))
 	{
 		/*
@@ -370,6 +367,9 @@ main(int argc, char **argv)
 		{"status-interval", required_argument, NULL, 's'},
 		{"slot", required_argument, NULL, 'S'},
 		{"verbose", no_argument, NULL, 'v'},
+/* action */
+		{"create", no_argument, NULL, 1},
+		{"drop", no_argument, NULL, 2},
 		{NULL, 0, NULL, 0}
 	};
 
@@ -453,6 +453,13 @@ main(int argc, char **argv)
 			case 'v':
 verbose++;
 break;
+/* action */
+			case 1:
+do_create_slot = true;
+break;
+			case 2:
+do_drop_slot = true;
+break;
 			default:
 
 /*
@@ -477,10 +484,26 @@ main(int argc, char **argv)
 		exit(1);
 	}
 
+	if (replication_slot == NULL && (do_drop_slot || do_create_slot))
+	{
+		fprintf(stderr, _("%s: replication slot needed with action --create or --drop\n"), progname);
+		fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+				progname);
+		exit(1);
+	}
+
+	if (do_drop_slot && do_create_slot)
+	{
+		fprintf(stderr, _("%s: cannot use --create together with --drop\n"), progname);
+		fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+				progname);
+		exit(1);
+	}
+
 	/*
 	 * Required arguments
 	 */
-	if (basedir == NULL)
+	

Re: [HACKERS] PL/PgSQL: RAISE and the number of parameters

2014-08-12 Thread Fabien COELHO


Hello,


- I would suggest to avoid continue within a loop so that the code is
simpler to understand, at least for me.


I personally find the code easier to read with the continue.


Hmmm. I had to read the code to check it, and I did it twice. The point is 
that there are 3 exit points instead of 1 in the block, which is not in 
itself very bad, but:


  for(...) {
if (ccc)
  xxx;
...
foo++;
  }

It looks like foo++ is always executed, and you have to notice that xxx 
a few lines above is a continue to realise that it is only when ccc is 
false. This is also disconcerting if it happens to be the rare case, 
i.e. here most of the time the char is not '%', so most of the time foo is 
not incremented, although it is at the highest level. Also the code with 
continue does not really put forward that what is counted is '%' followed 
by a non '%'. Note that the corresponding execution code 
(pgplsql/src/pl_exec.c) uses one continue to get rid of the second '%', 
which is quite defendable in that case as it is a kind of exception, but 
the main condition remains a simple if block. Final argument, the 
structured version is shorter than the unstructured version, with just the 
two continues removed, and one negation also removed.


to also turn the ereport()s into elog()s since the user should never see 
them.


I'm not sure why elog is better than ereport in that case: ISTM that it is 
an error worth reporting if it ever happens, say there is another syntax 
added later on which is not caught for some reason by the compile-time 
check, so I would not change it.


--
Fabien.




Re: [HACKERS] Proposal: Incremental Backup

2014-08-12 Thread Claudio Freire
On Tue, Aug 12, 2014 at 6:41 AM, Marco Nenciarini
marco.nenciar...@2ndquadrant.it wrote:
 To declare two files identical they must have the same size,
 the same mtime and the same *checksum*.

Still not safe. Checksum collisions do happen, especially in big data sets.




Re: [HACKERS] PL/PgSQL: RAISE and the number of parameters

2014-08-12 Thread Pavel Stehule
2014-08-12 15:09 GMT+02:00 Fabien COELHO coe...@cri.ensmp.fr:


 Hello,


  - I would suggest to avoid continue within a loop so that the code is
 simpler to understand, at least for me.


 I personally find the code easier to read with the continue.


 Hmmm. I had to read the code to check it, and I did it twice. The point is
  that there are 3 exit points instead of 1 in the block, which is not in
 itself very bad, but:

   for(...) {
 if (ccc)
   xxx;
 ...
 foo++;
   }

 It looks like foo++ is always executed, and you have to notice that xxx
 a few lines above is a continue to realise that it is only when ccc is
 false. This is also disconcerting if it happens to be the rare case, i.e.
 here most of the time the char is not '%', so most of the time foo is not
  incremented, although it is at the highest level. Also the code with
 continue does not really put forward that what is counted is '%' followed
 by a non '%'. Note that the corresponding execution code
 (pgplsql/src/pl_exec.c) uses one continue to get rid of the second '%',
 which is quite defendable in that case as it is a kind of exception, but
 the main condition remains a simple if block. Final argument, the
 structured version is shorter than the unstructured version, with just the
 two continues removed, and one negation also removed.


  to also turn the ereport()s into elog()s since the user should never see
 them.


 I'm not sure why elog is better than ereport in that case: ISTM that it is
 an error worth reporting if it ever happens, say there is another syntax
 added later on which is not caught for some reason by the compile-time
 check, so I would not change it.


One note: this patch can introduce a compatibility issue - a partially
broken function, where some badly written RAISE statement was never
executed, will now fail at compile time.

I am not against this patch, but should it perhaps be an extra check? Or
we have to document it as a potential compatibility issue.

Regards

Pavel


  --
 Fabien.






Re: [HACKERS] Proposal: Incremental Backup

2014-08-12 Thread Gabriele Bartolini
Hi Claudio,

2014-08-12 15:25 GMT+02:00 Claudio Freire klaussfre...@gmail.com:
 Still not safe. Checksum collisions do happen, especially in big data sets.

Can I ask you what you are currently using for backing up large data
sets with Postgres?

Thanks,
Gabriele




Re: [HACKERS] Proposal: Incremental Backup

2014-08-12 Thread Marco Nenciarini
On 12/08/14 15:25, Claudio Freire wrote:
 On Tue, Aug 12, 2014 at 6:41 AM, Marco Nenciarini
 marco.nenciar...@2ndquadrant.it wrote:
 To declare two files identical they must have the same size,
 the same mtime and the same *checksum*.
 
 Still not safe. Checksum collisions do happen, especially in big data sets.
 

IMHO it is still good enough. We are not trying to protect against a
malicious attack; we are using it to protect against some *casual* event.

Even cosmic rays have a non-zero probability of corrupting your database
in a way you would not notice. And you can probably notice such corruption
better with a checksum than with an LSN :-)

Given that, I think that whatever solution we choose, we should include
 checksums in it.

Regards,
Marco

-- 
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it





Re: [HACKERS] [RFC] Should smgrtruncate() avoid sending sinval message for temp relations

2014-08-12 Thread MauMau

Robert Haas robertmh...@gmail.com writes:

I'd support back-porting that commit to 9.1 and 9.2 as a fix for this
problem.  As the commit message says, it's dead simple.


From: Tom Lane t...@sss.pgh.pa.us

While I have no great objection to back-porting Heikki's patch, it seems
like a very large stretch to call this a root-cause fix.  At best it's
band-aiding one symptom in a rather fragile way.


Thank you, Robert san.  I'll be waiting for it to be back-ported to the next 
9.1/9.2 release.


Yes, I think this failure is only one potential symptom caused by the 
implementation mistake -- handling both latch wakeup and other tasks that 
wait on a latch in the SIGUSR1 handler.  Although there may be no such tasks 
now, I'd like to correct and clean up the implementation as follows to avoid 
similar problems in the future.  I think it's enough to do this only for 
9.5.  Please correct me before I go deeper in the wrong direction.


* The SIGUSR1 handler only does latch wakeup.  Any other task is done in 
other signal handlers such as SIGUSR2.  Many daemon postgres processes 
follow this style, but the normal backend, autovacuum daemons, and 
background workers currently don't.


* InitializeLatchSupport() in unix_latch.c calls pqsignal(SIGUSR1, 
latch_sigusr1_handler).  Change the argument of latch_sigusr1_handler() 
accordingly.


* Remove SIGUSR1 handler registration and process-specific SIGUSR1 handler 
functions from all processes.  We can eliminate many SIGUSR1 handler 
functions which have the same contents.
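
A minimal sketch of the proposed convention, assuming the existing latch_sigusr1_handler() from src/backend/port/unix_latch.c (illustrative only, not a patch):

    #include "postgres.h"
    #include "storage/latch.h"

    /*
     * Proposed style: SIGUSR1 does nothing but latch wakeup.  Any other
     * asynchronous task is routed through another signal such as SIGUSR2,
     * or through a flag checked at the next CHECK_FOR_INTERRUPTS().
     */
    static void
    sigusr1_handler(SIGNAL_ARGS)
    {
        int         save_errno = errno;

        latch_sigusr1_handler();    /* wake anything sleeping in WaitLatch() */

        errno = save_errno;
    }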


Regards
MauMau







Re: [HACKERS] Proposal: Incremental Backup

2014-08-12 Thread Andres Freund
On 2014-08-12 10:25:21 -0300, Claudio Freire wrote:
 On Tue, Aug 12, 2014 at 6:41 AM, Marco Nenciarini
 marco.nenciar...@2ndquadrant.it wrote:
  To declare two files identical they must have the same size,
  the same mtime and the same *checksum*.
 
 Still not safe. Checksum collisions do happen, especially in big data sets.

If you use an appropriate algorithm for appropriate amounts of data
that's not a relevant concern. You can easily do different checksums for
every 1GB segment of data. If you do it right the likelihood of
conflicts doing that is so low it doesn't matter at all.
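
A rough sketch of the per-segment idea (FNV-1a here is only a stand-in for whatever real checksum algorithm would be chosen; segment boundaries are assumed to fall on multiples of the 8kB read size):

    #include <stdint.h>
    #include <stdio.h>

    #define SEGMENT_SIZE (1024L * 1024L * 1024L)    /* one checksum per 1GB */

    /* FNV-1a, used purely as a placeholder checksum. */
    static uint64_t
    fnv1a(const unsigned char *buf, size_t len, uint64_t hash)
    {
        size_t      i;

        for (i = 0; i < len; i++)
        {
            hash ^= buf[i];
            hash *= 1099511628211ULL;
        }
        return hash;
    }

    /* Emit one checksum per 1GB segment of the given file. */
    static void
    checksum_segments(FILE *fp)
    {
        unsigned char buf[8192];
        uint64_t    hash = 14695981039346656037ULL;
        long        done = 0;
        int         segno = 0;
        size_t      n;

        while ((n = fread(buf, 1, sizeof(buf), fp)) > 0)
        {
            hash = fnv1a(buf, n, hash);
            done += (long) n;
            if (done >= SEGMENT_SIZE)
            {
                printf("segment %d: %016llx\n", segno++,
                       (unsigned long long) hash);
                hash = 14695981039346656037ULL;
                done = 0;
            }
        }
        if (done > 0)
            printf("segment %d: %016llx\n", segno, (unsigned long long) hash);
    }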

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] delta relations in AFTER triggers

2014-08-12 Thread Kevin Grittner
Amit Khandekar amit.khande...@enterprisedb.com wrote:
 On 7 August 2014 19:49, Kevin Grittner kgri...@ymail.com wrote:
 Amit Khandekar amit.khande...@enterprisedb.com wrote:

 I tried to google some SQLs that use REFERENCING clause with triggers.
 It looks like in some database systems, even the WHEN clause of CREATE 
 TRIGGER
 can refer to a transition table, just like how it refers to NEW and
 OLD row variables.

 For e.g. :
 CREATE TRIGGER notify_dept
   AFTER UPDATE ON weather
   REFERENCING NEW_TABLE AS N_TABLE
   NEW AS N_ROW
   FOR EACH ROW
   WHEN ((SELECT AVG (temperature) FROM N_TABLE) > 10)
   BEGIN
 notify_department(N_ROW.temperature, N_ROW.city);
   END

 Above, it is used to get an aggregate value of all the changed rows. I think
 we do not currently support aggregate expressions in the where clause, but 
 with
 transition tables, it makes more sense to support it later if not now.

 Interesting point; I had not thought about that.  Will see if I can
 include support for that in the patch for the next CF; failing
 that; I will at least be careful to not paint myself into a corner
 where it is unduly hard to do later.
 We currently do the WHEN checks while saving the AFTER trigger events,
 and also add the tuples one by one while saving the trigger events. If
 and when we support WHEN, we would need to make all of these tuples
 saved *before* the first WHEN clause execution, and that seems to
 demand more changes in the current trigger code.

In that case my inclination is to get things working with the less
invasive patch that doesn't try to support transition table usage
in WHEN clauses, and make support for that a later patch.

 ---

 Are we later going to extend this support for constraint triggers as
 well ? I think these variables would make sense even for deferred
 constraint triggers. I think we would need some more changes if we
 want to support this, because there is no query depth while executing
 deferred triggers and so the tuplestores might be inaccessible with
 the current design.

Hmm, I would also prefer to exclude that from an initial patch, but
this and the WHEN clause issue may influence a decision I've been
struggling with.  This is my first non-trivial foray into the
planner territory, and I've been struggling with how best to bridge
the gap between where the tuplestores are *captured* in the trigger
code and where they are referenced by name in a query and
incorporated into a plan for the executor.  (The execution level
itself was almost trivial; it's getting the tuplestore reference
through the parse analysis and planning phases that is painful for
me.)  At one point I created a tuplestore registry using a
process-local hashmap where each Tuplestorestate and its associated
name, TupleDesc, etc. would be registered, yielding a Tsrid (based
on an Oid) to use through the parse analysis and planning steps, but
then I ripped it back out again in favor of just passing the
pointer to the structure which was stored in the registry, because
the registry seemed to me to introduce more risk of memory leaks,
references to freed memory, etc.  While it helped a little with
abstraction, it seemed to make things more fragile.  But I'm still
torn on this, and unsure whether such a registry is a good idea.

Any thoughts on that?

 ---

 The following (and similarly other) statements :
 trigdesc->trig_insert_new_table |=
 (TRIGGER_FOR_INSERT(tgtype) &&
 TRIGGER_USES_TRANSITION_TABLE(trigger->tgnewtable)) ? true : false;

 can be simplified to :

 trigdesc->trig_insert_new_table |=
 (TRIGGER_FOR_INSERT(tgtype) &&
 TRIGGER_USES_TRANSITION_TABLE(trigger->tgnewtable));

Yeah, I did recognize that, but I always get squeamish about
logical operations with values which do not equal true or false.
TRIGGER_FOR_INSERT and similar macros don't necessarily return true
for something which is not false.  I should just get over that and
trust the compiler a bit more  :-)
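
For what it's worth, the specific worry is covered by the C standard: && always evaluates to exactly 0 or 1, even when its operands are arbitrary nonzero values such as the result of a bitmask test, so the ? true : false adds nothing. A tiny self-contained illustration (the mask value is made up, standing in for the real macros):

    #include <assert.h>

    int
    main(void)
    {
        int         tgtype = 0x04;      /* pretend the macro is a mask test */

        assert((tgtype & 0x04) == 4);           /* raw test yields 4, not 1 */
        assert(((tgtype & 0x04) && 1) == 1);    /* && normalizes to exactly 1 */
        return 0;
    }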

 ---

 AfterTriggerExecute()
 {
 .
 /*
 * Set up the tuplestore information.
 */
 if (trigdesc->trig_delete_old_table || trigdesc->trig_update_old_table)
 LocTriggerData.tg_olddelta =
 GetCurrentTriggerDeltaTuplestore(afterTriggers->old_tuplestores);
 .
 }
 Above, trigdesc->trig_update_old_table is true if at least one of the
 triggers of the table has transition tables. So this will cause the
 delta table to be available in all of the triggers of the table even
 if only one of them uses transition tables. Maybe this would be
 solved by using LocTriggerData.tg_trigger->tg_olddelta rather than
 trigdesc->trig_update_old_table to decide whether
 LocTriggerData.tg_olddelta should be assigned.

Good point.  Will do.

 ---

 GetCurrentTriggerDeltaTuplestore() is now used for getting the fdw
 tuplestore as well, so it should have a more general name.

Well, delta *was* my attempt at a more general name.  I need to
do another pass over the naming choices to make sure I'm being
consistent, but I 

Re: [HACKERS] Proposal: Incremental Backup

2014-08-12 Thread Claudio Freire
On Tue, Aug 12, 2014 at 11:17 AM, Gabriele Bartolini
gabriele.bartol...@2ndquadrant.it wrote:

 2014-08-12 15:25 GMT+02:00 Claudio Freire klaussfre...@gmail.com:
 Still not safe. Checksum collisions do happen, especially in big data sets.

 Can I ask you what you are currently using for backing up large data
 sets with Postgres?

Currently, a time-delayed WAL archive hot standby, pg_dump sparingly,
filesystem snapshots (incremental) of the standby more often, with the
standby down.

When I didn't have the standby, I did online filesystem snapshots of
the master with WAL archiving to prevent inconsistency due to
snapshots not being atomic.

On Tue, Aug 12, 2014 at 11:25 AM, Marco Nenciarini
marco.nenciar...@2ndquadrant.it wrote:
 On 12/08/14 15:25, Claudio Freire wrote:
 On Tue, Aug 12, 2014 at 6:41 AM, Marco Nenciarini
 marco.nenciar...@2ndquadrant.it wrote:
  To declare two files identical they must have the same size,
  the same mtime and the same *checksum*.

 Still not safe. Checksum collisions do happen, especially in big data sets.


 IMHO it is still good-enough. We are not trying to protect from a
 malicious attack, we are using it to protect against some *casual* event.

I'm not talking about malicious attacks; with big enough data sets,
checksum collisions are much more likely to happen than with smaller
ones, and incremental backups are supposed to work for the big sets.

You could use strong cryptographic checksums, but even strong
checksums aren't perfect, and even if you accept the slim chance
of collision, they are quite expensive to compute, so they're bound to be
a bottleneck with good I/O subsystems. Checking the LSN is much
cheaper.

Still, do as you will. As everybody keeps saying it's better than
nothing, let's let usage have the final word.




Re: [HACKERS] Improvement of versioning on Windows, take two

2014-08-12 Thread MauMau

From: Michael Paquier michael.paqu...@gmail.com

Oh yes, right. I don't really know how I missed this error when
testing v1. Adding an explicit call to
RemoveFile('src\timezone\win32ver.rc') for project postgres calms down
the build. Is the attached working for you?


Yes, the build succeeded.  I confirmed that the following files have version 
info.  However, unlike other files, they don't have a file description.  Is 
this intended?


bin\isolationtester.exe
bin\pg_isolation_regress
bin\pg_regress.exe
bin\pg_regress_ecpg.exe
bin\zic.exe
lib\regress.dll


lib\dict_snowball.dll has no version properties.

Regards
MauMau






Re: [HACKERS] WAL format and API changes (9.5)

2014-08-12 Thread Alvaro Herrera
Heikki Linnakangas wrote:
 On 08/05/2014 03:50 PM, Michael Paquier wrote:

 - All the functions in xlogconstruct.c could be defined in a separate
 header xlogconstruct.h. What they do is rather independent from xlog.h. The
 same applies to all the structures and functions of xlogconstruct.c in
 xlog_internal.h: XLogRecordAssemble, XLogRecordAssemble,
 InitXLogRecordConstruct. (worth noticing as well that the only reason
 XLogRecData is defined externally of xlogconstruct.c is that it is being
 used in XLogInsert()).
 
 Hmm. I left the defines for xlogconstruct.c functions that are used
 to build a WAL record in xlog.h, so that it's not necessary to
 include both xlog.h and xlogconstruct.h in all files that write a
 WAL record (XLogInsert() is defined in xlog.h).
 
 But perhaps it would be better to move the prototype for XLogInsert
 to xlogconstruct.h too? That might be a better division; everything
 needed to insert a WAL record would be in xlogconstruct.h, and
 xlog.h would left for more internal functions related to WAL. And
 perhaps rename the files to xloginsert.c and xloginsert.h.

Yes, IMO ideally modules that only construct WAL records to insert, but
are not directly involved with other XLog stuff, should only be using a
lean header file, not the whole xlog.h.  I imagine xlogconstruct.h would
be such a file.  (The patch you just posted doesn't have such a file --
AFAICS that stuff is all in xlog.h still).

No opinion on xlogconstruct vs. xloginsert as file names.  Both seem
like good enough names to me.  Unless others have stronger opinions, I
would leave that decision to you.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] WAL format and API changes (9.5)

2014-08-12 Thread Alvaro Herrera
Heikki Linnakangas wrote:
 On 08/05/2014 03:50 PM, Michael Paquier wrote:

 - XLogRecGetBlockRefIds needing an already-allocated array for *out is not
 user-friendly. Cannot we just do all the work inside this function?
 
 I agree it's not a nice API. Returning a palloc'd array would be
 nicer, but this needs to work outside the backend too (e.g.
 pg_xlogdump). It could return a malloc'd array in frontend code, but
 it's not as nice. On the whole, do you think that be better than the
 current approach?

I don't see the issue.  Right now, in frontend code you can call
palloc() and pfree(), and they behave like malloc() and free() (see
fe_memutils.c).  Your module doesn't need to do anything special.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-12 Thread Robert Haas
On Mon, Aug 11, 2014 at 3:35 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 ... I think it would be a better idea to arrange some method by
 which JSONB (and perhaps other data types) can provide compression
 hints to pglz.

 I agree with that as a long-term goal, but not sure if it's sane to
 push into 9.4.

 What we could conceivably do now is (a) add a datatype OID argument to
 toast_compress_datum, and (b) hard-wire the selection of a different
 compression-parameters struct if it's JSONBOID.  The actual fix would
 then be to increase the first_success_by field of this alternate struct.

I think it would be perfectly sane to do that for 9.4.  It may not be
perfect, but neither is what we have now.
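
A sketch of roughly what (a) plus (b) could look like, assuming the PGLZ_Strategy struct from pg_lzcompress.h; the numbers are placeholders rather than tuned values:

    #include <limits.h>
    #include "utils/pg_lzcompress.h"

    /*
     * Sketch only: like the default strategy, but willing to scan much
     * further before declaring the input incompressible, so that the
     * offset table at the front of a jsonb datum does not abort
     * compression prematurely.  All numbers are placeholders.
     */
    static const PGLZ_Strategy jsonb_compress_strategy = {
        32,                     /* min_input_size */
        INT_MAX,                /* max_input_size */
        25,                     /* min_comp_rate (percent) */
        8192,                   /* first_success_by (default is 1024) */
        128,                    /* match_size_good */
        10                      /* match_size_drop */
    };

    /*
     * toast_compress_datum(value, typeoid) would then select this strategy
     * when typeoid == JSONBOID -- a hypothetical signature change, per (a).
     */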

 A longer-term solution would be to make this some sort of type property
 that domains could inherit, like typstorage is already.  (Somebody
 suggested dealing with this by adding more typstorage values, but
 I don't find that attractive; typstorage is known in too many places.)
 We'd need some thought about exactly what we want to expose, since
 the specific knobs that pglz_compress has today aren't necessarily
 good for the long term.

What would really be ideal here is if the JSON code could inform the
toast compression code "this many initial bytes are likely
incompressible, just pass them through without trying, and then start
compressing at byte N", where N is the byte following the TOC.  But I
don't know that there's a reasonable way to implement that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [RFC] Should smgrtruncate() avoid sending sinval message for temp relations

2014-08-12 Thread Robert Haas
On Mon, Aug 11, 2014 at 7:40 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Fri, Aug 8, 2014 at 10:30 AM, MauMau maumau...@gmail.com wrote:
 I've tracked down the real root cause.  The fix is very simple.  Please
 check the attached one-liner patch.

 I'd support back-porting that commit to 9.1 and 9.2 as a fix for this
 problem.  As the commit message says, it's dead simple.

 While I have no great objection to back-porting Heikki's patch, it seems
 like a very large stretch to call this a root-cause fix.  At best it's
 band-aiding one symptom in a rather fragile way.

Yeah, that's true, but I'm not clear that we have another
back-patchable fix, so doing something almost-certainly-harmless to
alleviate the immediate pain seems worthwhile.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [RFC] Should smgrtruncate() avoid sending sinval message for temp relations

2014-08-12 Thread Andres Freund
On 2014-08-12 11:04:00 -0400, Robert Haas wrote:
 On Mon, Aug 11, 2014 at 7:40 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Robert Haas robertmh...@gmail.com writes:
  On Fri, Aug 8, 2014 at 10:30 AM, MauMau maumau...@gmail.com wrote:
  I've tracked down the real root cause.  The fix is very simple.  Please
  check the attached one-liner patch.
 
  I'd support back-porting that commit to 9.1 and 9.2 as a fix for this
  problem.  As the commit message says, it's dead simple.
 
  While I have no great objection to back-porting Heikki's patch, it seems
  like a very large stretch to call this a root-cause fix.  At best it's
  band-aiding one symptom in a rather fragile way.
 
 Yeah, that's true, but I'm not clear that we have another
 back-patchable fix, so doing something almost-certainly-harmless to
 alleviate the immediate pain seems worthwhile.

Isn't that still leaving the very related issue of waits due to hot
pruning open?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Hokey wrong versions of libpq in apt.postgresql.org

2014-08-12 Thread Christoph Berg
Re: Joshua D. Drake 2014-08-11 53e83e9c.6030...@commandprompt.com
 The issue that I specifically ran into is that by using apt.postgresql.org
 in its default configuration, I can't add certain extensions (without
 jumping through hoops). Simple:
 
 Assume a running 9.2.9 from apt.postgresql.org
 apt-get install pgxnclient
 sudo pgxnclient install pg_repack
 
 
 
 Doesn't work. Because it is looking for libs in /var/lib/postgresql/9.2 not
 /var/lib/postgresql/9.3.
 
 It also failed on another extension (but I don't recall which one it is).

I'd be interested in which other extension this is. As I told you
repeatedly, we are building tons of extensions for something like 5
branches in parallel on apt.postgresql.org, and things generally just
work.

We don't have pg_repack packages there, but there is pg_reorg. This
indeed required some patch to compile, namely manual pg_config_ext.h
and postgres_ext.h includes:
http://anonscm.debian.org/cgit/pkg-postgresql/pg-reorg.git/tree/debian/patches/headers

Trying pg_repack manually here yields this:

pg_repack-1.2.1/bin $ make
gcc -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat 
-Werror=format-security -fPIC -pie -Wall -Wmissing-prototypes -Wpointer-arith 
-Wdeclaration-after-statement -Wendif-labels -Wformat-security 
-fno-strict-aliasing -fwrapv -fexcess-precision=standard -g 
-I/usr/include/postgresql -DREPACK_VERSION=1.2.1 -I. -I. 
-I/usr/include/postgresql/9.1/server -I/usr/include/postgresql/internal 
-D_FORTIFY_SOURCE=2 -DLINUX_OOM_ADJ=0 -D_GNU_SOURCE -I/usr/include/libxml2  
-I/usr/include/tcl8.5  -c -o pgut/pgut.o pgut/pgut.c
In file included from pgut/pgut.c:10:0:
/usr/include/postgresql/postgres_fe.h:27:32: fatal error: common/fe_memutils.h: 
No such file or directory

This can be fixed by moving -I/usr/include/postgresql past the server
includes, but then there's other issues:

$ gcc -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat 
-Werror=format-security -fPIC -pie -Wall -Wmissing-prototypes -Wpointer-arith 
-Wdeclaration-after-statement -Wendif-labels -Wformat-security 
-fno-strict-aliasing -fwrapv -fexcess-precision=standard -g 
-DREPACK_VERSION=1.2.1 -I. -I. -I/usr/include/postgresql/9.1/server 
-I/usr/include/postgresql/internal -I/usr/include/postgresql 
-D_FORTIFY_SOURCE=2 -DLINUX_OOM_ADJ=0 -D_GNU_SOURCE -I/usr/include/libxml2  
-I/usr/include/tcl8.5  -c -o pgut/pgut.o pgut/pgut.c
In file included from pgut/pgut.h:23:0,
 from pgut/pgut.c:17:
/usr/include/postgresql/libpq-fe.h:547:1: error: unknown type name 'pg_int64'
/usr/include/postgresql/libpq-fe.h:547:50: error: unknown type name 'pg_int64'
/usr/include/postgresql/libpq-fe.h:551:1: error: unknown type name 'pg_int64'
/usr/include/postgresql/libpq-fe.h:553:48: error: unknown type name 'pg_int64'

This seems to be another instance of the "(includedir_internal)
headers are not self-contained" problem, see
http://www.postgresql.org/message-id/20140426122548.ga7...@msgid.df7cb.de

Possibly we need to move some files in libpq-dev to/from
postgresql-server-dev-*, though I believe the Debian packages are now
just replicating whatever header layout the PostgreSQL makefiles
construct on install. (Namely server -> /usr/include/postgresql/9.4/server,
rest -> /usr/include/postgresql/)

I haven't tried to check all include paths with various combinations
of libpq-dev and postgresql-server-dev-one-or-the-other-version
installed, though that would be an interesting exercise. Any
volunteers?

Christoph
-- 
c...@df7cb.de | http://www.df7cb.de/




Re: [HACKERS] [RFC] Should smgrtruncate() avoid sending sinval message for temp relations

2014-08-12 Thread Robert Haas
On Tue, Aug 12, 2014 at 11:06 AM, Andres Freund and...@2ndquadrant.com wrote:
 On 2014-08-12 11:04:00 -0400, Robert Haas wrote:
 On Mon, Aug 11, 2014 at 7:40 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Robert Haas robertmh...@gmail.com writes:
  On Fri, Aug 8, 2014 at 10:30 AM, MauMau maumau...@gmail.com wrote:
  I've tracked down the real root cause.  The fix is very simple.  Please
  check the attached one-liner patch.
 
  I'd support back-porting that commit to 9.1 and 9.2 as a fix for this
  problem.  As the commit message says, it's dead simple.
 
  While I have no great objection to back-porting Heikki's patch, it seems
  like a very large stretch to call this a root-cause fix.  At best it's
  band-aiding one symptom in a rather fragile way.

 Yeah, that's true, but I'm not clear that we have another
 back-patchable fix, so doing something almost-certainly-harmless to
 alleviate the immediate pain seems worthwhile.

 Isn't that still leaving the very related issue of waits due to hot
 pruning open?

Yes.  Do you have a back-patchable solution for that?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Proposal: Incremental Backup

2014-08-12 Thread Robert Haas
On Tue, Aug 12, 2014 at 10:30 AM, Andres Freund and...@2ndquadrant.com wrote:
 Still not safe. Checksum collisions do happen, especially in big data sets.

 If you use an appropriate algorithm for appropriate amounts of data
 that's not a relevant concern. You can easily do different checksums for
 every 1GB segment of data. If you do it right the likelihood of
 conflicts doing that is so low it doesn't matter at all.

True, but if you use LSNs the likelihood is 0.  Comparing the LSN is
also most likely a heck of a lot faster than checksumming the entire
page.
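
To illustrate why the LSN comparison is cheap: the page LSN is the first 8 bytes of every page header, so a standalone sketch along these lines (assuming 8kB pages, a relation segment file, and a reader on the same architecture as the server, since pd_lsn is read in host byte order; this is not backend code) needs only a read and one comparison per page:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    #define BLCKSZ 8192

    /*
     * PageHeaderData.pd_lsn occupies the first 8 bytes of each page,
     * stored as two 32-bit halves (xlogid, xrecoff).
     */
    static uint64_t
    page_lsn(const unsigned char *page)
    {
        uint32_t    xlogid;
        uint32_t    xrecoff;

        memcpy(&xlogid, page, 4);
        memcpy(&xrecoff, page + 4, 4);
        return ((uint64_t) xlogid << 32) | xrecoff;
    }

    /* Return 1 if any page in the file was modified after start_lsn. */
    static int
    file_changed_since(FILE *fp, uint64_t start_lsn)
    {
        unsigned char page[BLCKSZ];

        while (fread(page, 1, BLCKSZ, fp) == BLCKSZ)
        {
            if (page_lsn(page) > start_lsn)
                return 1;
        }
        return 0;
    }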

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [RFC] Should smgrtruncate() avoid sending sinval message for temp relations

2014-08-12 Thread Andres Freund
On 2014-08-12 11:56:41 -0400, Robert Haas wrote:
 On Tue, Aug 12, 2014 at 11:06 AM, Andres Freund and...@2ndquadrant.com 
 wrote:
  On 2014-08-12 11:04:00 -0400, Robert Haas wrote:
  On Mon, Aug 11, 2014 at 7:40 PM, Tom Lane t...@sss.pgh.pa.us wrote:
   Robert Haas robertmh...@gmail.com writes:
   On Fri, Aug 8, 2014 at 10:30 AM, MauMau maumau...@gmail.com wrote:
   I've tracked down the real root cause.  The fix is very simple.  Please
   check the attached one-liner patch.
  
   I'd support back-porting that commit to 9.1 and 9.2 as a fix for this
   problem.  As the commit message says, it's dead simple.
  
   While I have no great objection to back-porting Heikki's patch, it seems
   like a very large stretch to call this a root-cause fix.  At best it's
   band-aiding one symptom in a rather fragile way.
 
  Yeah, that's true, but I'm not clear that we have another
  back-patchable fix, so doing something almost-certainly-harmless to
  alleviate the immediate pain seems worthwhile.
 
  Isn't that still leaving the very related issue of waits due to hot
  pruning open?
 
 Yes.  Do you have a back-patchable solution for that?

The easiest thing I can think of is sprinkling a few
SetConfigOption('synchronous_commit', 'off',
PGC_INTERNAL, PGC_S_OVERRIDE,
GUC_ACTION_LOCAL, true, ERROR);
in some strategic places. From a quick look:
* all of autovacuum
* locally in ProcessCompletedNotifies
* locally in ProcessIncomingNotify
* locally in ProcessCatchupEvent
* locally in InitPostgres

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Proposal: Incremental Backup

2014-08-12 Thread Fujii Masao
On Wed, Aug 13, 2014 at 12:58 AM, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Aug 12, 2014 at 10:30 AM, Andres Freund and...@2ndquadrant.com 
 wrote:
 Still not safe. Checksum collisions do happen, especially in big data sets.

 If you use an appropriate algorithm for appropriate amounts of data
 that's not a relevant concern. You can easily do different checksums for
 every 1GB segment of data. If you do it right the likelihood of
 conflicts doing that is so low it doesn't matter at all.

 True, but if you use LSNs the likelihood is 0.  Comparing the LSN is
 also most likely a heck of a lot faster than checksumming the entire
 page.

If we use LSNs, a strong safeguard seems to be required to prevent a user
from taking an incremental backup against the wrong instance. For example,
consider the case where a first full backup is taken, PITR is run to a
certain past location, and then an incremental backup is taken between that
first full backup and the current database after PITR. PITR rewinds LSNs, so
such an incremental backup might be corrupted. If so, a safeguard for those
problematic cases is needed. Otherwise, I'm afraid that a user can easily
end up with a broken incremental backup.
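
A purely hypothetical sketch of such a safeguard (none of these structures exist; the field names are invented for illustration): record the cluster's system identifier, timeline and starting LSN with the full backup, and refuse an incremental run when they no longer line up:

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical manifest stored with the full backup. */
    typedef struct BackupManifest
    {
        uint64_t    system_identifier;  /* from pg_control */
        uint32_t    timeline_id;        /* timeline at full-backup time */
        uint64_t    start_lsn;          /* LSN the full backup started at */
    } BackupManifest;

    static bool
    incremental_backup_is_safe(const BackupManifest *base,
                               uint64_t sysid, uint32_t tli,
                               uint64_t current_lsn)
    {
        if (sysid != base->system_identifier)
            return false;       /* a different cluster entirely */
        if (tli != base->timeline_id)
            return false;       /* PITR or failover switched timelines */
        if (current_lsn < base->start_lsn)
            return false;       /* LSNs went backwards: rewound cluster */
        return true;
    }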

Regards,

-- 
Fujii Masao




[HACKERS] [PATCH] PostgreSQL 9.4 mmap(2) performance regression on FreeBSD...

2014-08-12 Thread Sean Chittenden
One of the patches that I've been sitting on and am derelict in punting 
upstream is the attached mmap(2) flags patch for the BSDs. Is there any 
chance this can be squeezed into the PostgreSQL 9.4 release?


The patch is trivial in size and is used to add one flag to mmap(2) 
calls in dsm_impl.c.  Alan Cox (FreeBSD alc, not Linux) and I went back 
and forth regarding PostgreSQL's use of mmap(2) and determined that the 
following is correct and will prevent a likely performance regression in 
PostgreSQL 9.4. In PostgreSQL 9.3, all mmap(2) calls were called with 
the flags MAP_ANON | MAP_SHARED, whereas in PostgreSQL 9.4 this is not 
the case.
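
The shape of such a change, as an illustrative sketch only (the real patch is the attachment to this message):

    #include <sys/mman.h>

    /*
     * Sketch: map a file-backed shared memory segment, adding MAP_NOSYNC
     * where the platform provides it.  The caller checks for MAP_FAILED.
     */
    static void *
    map_shared_file(int fd, size_t request_size)
    {
        int         flags = MAP_SHARED;

    #ifdef MAP_NOSYNC
        /* BSD-only: don't flush dirty pages of this mapping gratuitously. */
        flags |= MAP_NOSYNC;
    #endif

        return mmap(NULL, request_size, PROT_READ | PROT_WRITE, flags, fd, 0);
    }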


Digging into the patch, in reviewing 
src/backend/storage/ipc/dsm_impl.c, it's clear that rhaas@ understood 
the consequences of mmap(2), and the possible consequences of having 
dirty pages gratuitously flushed to disk:


src/backend/storage/ipc/dsm_impl.c:781
 * Operating system primitives to support mmap-based shared memory.
 *
 * Calling this shared memory is somewhat of a misnomer, because what
 * we're really doing is creating a bunch of files and mapping them into
 * our address space.  The operating system may feel obliged to
 * synchronize the contents to disk even if nothing is being paged out,
 * which will not serve us well.  The user can relocate the pg_dynshmem
 * directory to a ramdisk to avoid this problem, if available.

In order for the above comment to be true for FreeBSD, an extra flag 
needs to be passed to mmap(2). From FreeBSD 10's mmap(2) page[2]:


 MAP_NOSYNC   Causes data dirtied via this VM map to be flushed to
              physical media only when necessary (usually by the pager)
              rather than gratuitously.  Typically this prevents the
              update daemons from flushing pages dirtied through such
              maps and thus allows efficient sharing of memory across
              unassociated processes using a file-backed shared memory
              map.  Without this option any VM pages you dirty may be
              flushed to disk every so often (every 30-60 seconds
              usually) which can create performance problems if you do
              not need that to occur (such as when you are using shared
              file-backed mmap regions for IPC purposes).  Note that
              VM/file system coherency is maintained whether you use
              MAP_NOSYNC or not.  This option is not portable across
              UNIX platforms (yet), though some may implement the same
              behavior by default.

              WARNING!  Extending a file with ftruncate(2), thus
              creating a big hole, and then filling the hole by
              modifying a shared mmap() can lead to severe file
              fragmentation.  In order to avoid such fragmentation you
              should always pre-allocate the file's backing store by
              write()ing zero's into the newly extended area prior to
              modifying the area via your mmap().  The fragmentation
              problem is especially sensitive to MAP_NOSYNC pages,
              because pages may be flushed to disk in a totally random
              order.

              The same applies when using MAP_NOSYNC to implement a
              file-based shared memory store.  It is recommended that
              you create the backing store by write()ing zero's to the
              backing file rather than ftruncate()ing it.  You can test
              file fragmentation by observing the KB/t (kilobytes per
              transfer) results from an ``iostat 1'' while reading a
              large file sequentially, e.g. using
              ``dd if=filename of=/dev/null bs=32k''.

              The fsync(2) system call will flush all dirty data and
              metadata associated with a file, including dirty NOSYNC
              VM data, to physical media.  The sync(8) command and
              sync(2) system call generally do not flush dirty NOSYNC
              VM data.  The msync(2) system call is usually not needed
              since BSD implements a coherent file system buffer cache.
              However, it may be used to associate dirty VM pages with
              file system buffers and thus cause them to be flushed to
              physical media sooner rather than later.

The man page for madvise(2) has more pointed advice[3]:

 MADV_NOSYNC  Request that the system not flush the data associated
  

Re: [HACKERS] [PATCH] PostgreSQL 9.4 mmap(2) performance regression on FreeBSD...

2014-08-12 Thread Andres Freund
On 2014-08-12 09:42:30 -0700, Sean Chittenden wrote:
 One of the patches that I've been sitting on and am derelict in punting
 upstream is the attached mmap(2) flags patch for the BSDs. Is there any
 chance this can be squeezed into the PostgreSQL 9.4 release?
 
 The patch is trivial in size and is used to add one flag to mmap(2) calls in
 dsm_impl.c.  Alan Cox (FreeBSD alc, not Linux) and I went back and forth
 regarding PostgreSQL's use of mmap(2) and determined that the following is
 correct and will prevent a likely performance regression in PostgreSQL 9.4.
 In PostgreSQL 9.3, all mmap(2) calls were called with the flags MAP_ANON |
 MAP_SHARED, whereas in PostgreSQL 9.4 this is not the case.

The performancewise important call to mmap will still use that set of
flags, no? That's the one backing shared_buffers.

The mmap backend for *dynamic* shared memory (aka dsm) is *NOT* supposed
to be used on common platforms. Both posix and sysv shared memory will
be used before falling back to the mmap() backend.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [RFC] Should smgrtruncate() avoid sending sinval message for temp relations

2014-08-12 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2014-08-12 11:56:41 -0400, Robert Haas wrote:
 Yes.  Do you have a back-patchable solution for that?

 The easiest thing I can think of is sprinkling a few
 SetConfigOption('synchronous_commit', 'off',
 PGC_INTERNAL, PGC_S_OVERRIDE,
 GUC_ACTION_LOCAL, true, ERROR);

This still seems to me to be applying a band-aid that covers over some
symptoms; it's not dealing with the root cause that we've overloaded
the signal handling mechanism too much.   What reason is there to think
that there are no other symptoms of that?

regards, tom lane




Re: [HACKERS] PL/PgSQL: RAISE and the number of parameters

2014-08-12 Thread Fabien COELHO


one note: this patch can introduce compatibility issues - partially 
broken functions, where some badly written RAISE statements were never 
executed.



I am not against this patch, but it should probably be in extra checks?


I'm not sure about what you mean by "it should be in extra checks".


Or we have to document it as a potential compatibility issue.


Indeed, as a potential execution error is turned into a certain 
compilation error.


If this compatibility point is a blocker, the compilation error can be 
turned into a warning, but I would prefer to keep it an error: I'm quite 
sure I fell into that pit at least once or twice.


--
Fabien.




Re: [HACKERS] [RFC] Should smgrtruncate() avoid sending sinval message for temp relations

2014-08-12 Thread Andres Freund
On 2014-08-12 13:11:55 -0400, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On 2014-08-12 11:56:41 -0400, Robert Haas wrote:
  Yes.  Do you have a back-patchable solution for that?
 
  The easiest thing I can think of is sprinkling a few
  SetConfigOption('synchronous_commit', 'off',
  PGC_INTERNAL, PGC_S_OVERRIDE,
  GUC_ACTION_LOCAL, true, ERROR);
 
 This still seems to me to be applying a band-aid that covers over some
 symptoms; it's not dealing with the root cause that we've overloaded
 the signal handling mechanism too much.   What reason is there to think
 that there are no other symptoms of that?

I'm not arguing against fixing that. I think we need to do both,
although I am wary of fixing the signal handling in the back
branches. Fixing the signal handling won't get rid of the problem that
one e.g. might not be able to log in anymore if all synchronous standbys
are down and login caused hot pruning.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Hokey wrong versions of libpq in apt.postgresql.org

2014-08-12 Thread Steve Crawford

On 08/07/2014 04:30 PM, Joshua D. Drake wrote:


Hello,

I know this has been brought up before:

http://www.postgresql.org/message-id/20140724080902.ga28...@msg.df7cb.de


For reference, libpq and packaging issues discussed here as well:
http://www.postgresql.org/message-id/53a304bc.40...@pinpointresearch.com
http://www.postgresql.org/message-id/53989c91.6050...@pinpointresearch.com



But this is just plain wrong. I don't care that the FAQ (on the wiki) 
says we are doing it wrong for good reasons. When I (or anyone else) 
pulls postgresql-$version-dev, I want the libpq for my version. I do 
not want 9.3.


Yes, it should (because of protocol compatibility) work but it 
doesn't always (as stated in that email and in a similar problem we 
just ran into).


There can be unintended consequences on machines when you mix and 
match like that. Can we please do some proper packaging on this?


+1

Cheers,
Steve




Re: [HACKERS] PL/PgSQL: RAISE and the number of parameters

2014-08-12 Thread Pavel Stehule
2014-08-12 19:14 GMT+02:00 Fabien COELHO coe...@cri.ensmp.fr:


  one note: this patch can introduce compatibility issues - partially
 broken functions, where some badly written RAISE statements were never
 executed.


  I am not against this patch, but it should probably be in extra checks?


  I'm not sure about what you mean by "it should be in extra checks".

  Or we have to document it as a potential compatibility issue.


 Indeed, as a potential execution error is turned into a certain
 compilation error.

 If this compatibility point is a blocker, the compilation error can be
 turned into a warning, but I would prefer to keep it an error: I'm quite
 sure I fell into that pit at least once or twice.


I prefer an error, although there is the possibility to control the severity
of the exception - error or warning - via the extra plpgsql checks. This kind
of error is strange, and stored procedures can usually be fixed later because
the plpgsql source is available.

There was a precedent when the query syntax checks were made more precise.
It just should be well documented.

Regards

Pavel





 --
 Fabien.



Re: [HACKERS] Supporting Windows SChannel as OpenSSL replacement

2014-08-12 Thread Heikki Linnakangas

On 08/06/2014 08:37 PM, Jeff Janes wrote:

But now it looks like 0002 needs a rebase


I've committed the refactoring patch, and here's a rebased and improved 
version of the Windows SChannel implementation over that.


Server-side support is now implemented too, but it's all very crude and 
work-in-progress. CRLs are not supported, intermediate CAs are not 
supported, and probably many other bells and whistles are missing too. 
But the basics work, including cert authentication. Consider this a 
Proof of Concept.


One issue came up with managing private keys: In the server, it's 
necessary to import the private key into a permanent key container 
that's managed by the Windows Crypto API. That can be done 
programmatically (as I do in the patch), but the keys are permanently 
stored in the system (in the user's profile). They will be left behind 
even if you completely delete the data directory. That's not the end of 
the world, but it would be nicer if we could use some kind of a 
temporary key container that only lives in memory, but the Crypto API 
doesn't seem to have such a concept. You can acquire an ephemeral 
context by passing the CRYPT_VERIFYCONTEXT flag to CryptAcquireContext 
function, and that's exactly what I'm doing in the client, but that 
method doesn't seem to work when acting as an SSL server.
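
For the client side described above, a minimal Win32 Crypto API sketch (error handling omitted):

    #include <windows.h>
    #include <wincrypt.h>

    /*
     * Acquire an ephemeral, memory-only context with CRYPT_VERIFYCONTEXT,
     * i.e. no named key container is created or persisted.
     */
    static HCRYPTPROV
    acquire_ephemeral_context(void)
    {
        HCRYPTPROV  hProv = 0;

        if (!CryptAcquireContext(&hProv,
                                 NULL,          /* no key container name */
                                 NULL,          /* default provider */
                                 PROV_RSA_FULL,
                                 CRYPT_VERIFYCONTEXT))
            return 0;
        return hProv;
    }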


Also, the key container needs to be given a name, or we can use the 
default container, but either way all the keys are shared among all 
applications that use the same container. We'll have to figure out how 
to set that up so that there are no conflicts, if you try to use the 
same server certificate for two PostgreSQL instances running on the same 
host (useful while developing/testing replication).


This isn't a showstopper, but needs some thought. As the patch stands, 
it uses a single key container called "PostgreSQL server key container", 
and makes no attempt to delete the keys after they're no longer used. 
That works, but it leaves the key lying on the system.


- Heikki

diff --git a/configure b/configure
index 0f435b5..d00290a 100755
--- a/configure
+++ b/configure
@@ -707,6 +707,7 @@ XML2_CONFIG
 UUID_EXTRA_OBJS
 with_uuid
 with_selinux
+with_winschannel
 with_openssl
 krb_srvtab
 with_python
@@ -823,6 +824,7 @@ with_pam
 with_ldap
 with_bonjour
 with_openssl
+with_winschannel
 with_selinux
 with_readline
 with_libedit_preferred
@@ -1509,6 +1511,7 @@ Optional Packages:
   --with-ldap build with LDAP support
   --with-bonjour  build with Bonjour support
   --with-openssl  build with OpenSSL support
+  --with-winschannel  build with Windows SChannel support
   --with-selinux  build with SELinux support
   --without-readline  do not use GNU Readline nor BSD Libedit for editing
   --with-libedit-preferred
@@ -5514,6 +5517,46 @@ $as_echo $with_openssl 6; }
 
 
 #
+# Windows SChannel
+#
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking whether to build with native Windows SSL support" >&5
+$as_echo_n "checking whether to build with native Windows SSL support... " >&6; }
+
+
+
+# Check whether --with-winschannel was given.
+if test "${with_winschannel+set}" = set; then :
+  withval=$with_winschannel;
+  case $withval in
+    yes)
+
+$as_echo "#define USE_WINDOWS_SCHANNEL 1" >>confdefs.h
+
+      ;;
+    no)
+      :
+      ;;
+    *)
+      as_fn_error $? "no argument expected for --with-winschannel option" "$LINENO" 5
+      ;;
+  esac
+
+else
+  with_winschannel=no
+
+fi
+
+
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $with_winschannel" >&5
+$as_echo "$with_winschannel" >&6; }
+
+
+if test "$with_openssl" = yes -a "$with_winschannel" = yes ; then
+  as_fn_error $? "
+*** Cannot select both OpenSSL and Windows SChannel." "$LINENO" 5
+fi
+
+#
# SELinux
#
{ $as_echo "$as_me:${as_lineno-$LINENO}: checking whether to build with SELinux support" >&5
diff --git a/configure.in b/configure.in
index f8a4507..132fb0a 100644
--- a/configure.in
+++ b/configure.in
@@ -662,6 +662,20 @@ AC_MSG_RESULT([$with_openssl])
 AC_SUBST(with_openssl)
 
 #
+# Windows SChannel
+#
+AC_MSG_CHECKING([whether to build with native Windows SSL support])
+PGAC_ARG_BOOL(with, winschannel, no, [build with Windows SChannel support],
+  [AC_DEFINE([USE_WINDOWS_SCHANNEL], 1, [Define to build with Windows SChannel support. (--with-winschannel)])])
+AC_MSG_RESULT([$with_winschannel])
+AC_SUBST(with_winschannel)
+
+if test "$with_openssl" = yes -a "$with_winschannel" = yes ; then
+  AC_MSG_ERROR([
+*** Cannot select both OpenSSL and Windows SChannel.])
+fi
+
+#
 # SELinux
 #
 AC_MSG_CHECKING([whether to build with SELinux support])
diff --git a/src/backend/libpq/Makefile b/src/backend/libpq/Makefile
index 8be0572..88d0b8b 100644
--- a/src/backend/libpq/Makefile
+++ b/src/backend/libpq/Makefile
@@ -21,4 +21,8 @@ ifeq ($(with_openssl),yes)
 OBJS += be-secure-openssl.o
 endif
 
+ifeq ($(with_winschannel),yes)
+OBJS += be-secure-winschannel.o
+endif
+
 include $(top_srcdir)/src/backend/common.mk
diff 

[HACKERS] proposal for 9.5: monitoring lock time for slow queries

2014-08-12 Thread Pavel Stehule
Hello

I was asked about the possibility of showing the lock time of slow queries.

I proposed a design based on the log line prefix, but it was rejected.

Any idea how to show the lock time in some practical form together with the
logged slow query?


Regards

Pavel


[HACKERS] minor typo in pgbench doc (2)

2014-08-12 Thread Fabien COELHO


Yet another very minor typo in the pgbench doc.

I'm not sure of the best way to show a formula in the doc.

--
Fabien.diff --git a/doc/src/sgml/pgbench.sgml b/doc/src/sgml/pgbench.sgml
index 1551686..7d09a2d 100644
--- a/doc/src/sgml/pgbench.sgml
+++ b/doc/src/sgml/pgbench.sgml
@@ -782,7 +782,7 @@ pgbench optional replaceableoptions/ /optional replaceabledbname/
 (PHI(2.0 * threshold * (i - min - mu + 0.5) / (max - min + 1)) -
  PHI(2.0 * threshold * (i - min - mu - 0.5) / (max - min + 1))) /
  (2.0 * PHI(threshold) - 1.0)
-  /
+  /.
   Intuitively, the larger the replaceablethreshold/, the more
   frequently values close to the middle of the interval are drawn, and the
   less frequently values close to the replaceablemin/ and
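
For reference (not part of the patch), the probability in the hunk reads, in LaTeX notation, with \Phi the standard normal CDF, \theta the threshold and \mu the mean of the interval:

    f(i) = \frac{\Phi\!\left(\frac{2\theta\,(i - \min - \mu + 0.5)}{\max - \min + 1}\right)
               - \Phi\!\left(\frac{2\theta\,(i - \min - \mu - 0.5)}{\max - \min + 1}\right)}
                {2\,\Phi(\theta) - 1}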



Re: [HACKERS] replication commands and log_statements

2014-08-12 Thread Bruce Momjian
On Tue, Aug 12, 2014 at 10:07:34AM +0530, Amit Kapila wrote:
 On Tue, Aug 12, 2014 at 9:29 AM, Fujii Masao masao.fu...@gmail.com wrote:
  On Tue, Aug 12, 2014 at 2:34 AM, Robert Haas robertmh...@gmail.com wrote:
  
   If you have a user devoted to it, I suppose that's true.  I still
   think it shouldn't get munged together like that.
 
  Why do we need to treat only replication commands as special ones and
  add new parameter to display them?
 
 One difference is that replication commands are internally generated
 commands. Do we anywhere else log such internally generated
 commands with log_statement = all?

Good point --- we do not.  In fact, this is similar to how we don't log
SPI queries, e.g. SQL queries inside functions.  We might want to enable
that someday too.  Could we enable logging of both with a single GUC?

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Scaling shared buffer eviction

2014-08-12 Thread Andres Freund
On 2014-08-06 15:42:08 +0530, Amit Kapila wrote:
 On Tue, Aug 5, 2014 at 9:21 PM, Robert Haas robertmh...@gmail.com wrote:
  On Thu, Jun 5, 2014 at 4:43 AM, Amit Kapila amit.kapil...@gmail.com
 wrote:
  This essentially removes BgWriterDelay, but it's still mentioned in
 BgBufferSync().  Looking further, I see that with the patch applied,
 BgBufferSync() is still present in the source code but is no longer called
 from anywhere.  Please don't submit patches that render things unused
 without actually removing them; it makes it much harder to see what you've
 changed.  I realize you probably left it that way for testing purposes, but
 you need to clean such things up before submitting.  Likewise, if you've
 rendered GUCs or statistics counters removed, you need to rip them out, so
 that the scope of the changes you've made is clear to reviewers.

FWIW, I found this email almost unreadable because it is missing quoting
signs after line breaks in the quoted content.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] WAL format and API changes (9.5)

2014-08-12 Thread Andres Freund
On 2014-08-12 12:44:22 +0300, Heikki Linnakangas wrote:
 Here's a new patch with those little things fixed.

I've so far ignored this thread... I'm now taking a look. Unfortunately
I have to admit I'm not unequivocally happy.

* XLogRegisterData/XLogRegisterRecData imo don't really form a
  consistent API namewise. What does 'Rec' mean here?

* I'm distinctly not a fan of passing around both the LSN and the struct
  XLogRecord to functions. We should bite the bullet and use something
  encapsulating both.

* XLogReplayBuffer imo isn't a very good name - the buffer isn't
  replayed. If anything it's sometimes replaced by the backup block, but
  the real replay still happens outside of that function.

* Why do we need XLogBeginInsert()? Right now this leads to ugliness
  like duplicated if (RelationNeedsWAL()) WAL checks and
  XLogCancelInsert() of inserts that haven't been started. And if we
  have Begin, why do we also need Cancel?

* "XXX: do we need to do this for every page?" - yes, and it's every
  tuple, not every page... And it needs to be done *after* the tuple has
  been put on the page, not before. Otherwise the location will be wrong.

* The patch mixes the API changes around WAL records with changes of how
  individual actions are logged. That makes it rather hard to review -
  and it's a 500kb patch already.

  I realize it's hard to avoid because the new API changes which
  information about blocks is available, but it does make it really hard
  to see whether the old/new code is doing something
  equivalent. It's hard to find more critical code than this :/

Have you done any performance evaluation?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Minmax indexes

2014-08-12 Thread Alvaro Herrera
Heikki Linnakangas wrote:
 I couldn't resist starting to hack on this, and implemented the
 scheme I've been having in mind:
 
 1. MMTuple contains the block number of the heap page (range) that
 the tuple represents. Vacuum is no longer needed to clean up old
  tuples; when an index tuple is updated, the old tuple is deleted
 atomically with the insertion of a new tuple and updating the
 revmap, so no garbage is left behind.
 
 2. LockTuple is gone. When following the pointer from revmap to
 MMTuple, the block number is used to check that you land on the
 right tuple. If not, the search is started over, looking at the
 revmap again.

Thanks, looks good, yeah.  Did you just forget to attach the
access/rmgrdesc/minmaxdesc.c file, or did you ignore it altogether?
Anyway I hacked one up, and cleaned up some other things.

 I'm sure this still needs some cleanup, but here's the patch, based
 on your v14. Now that I know what this approach looks like, I still
 like it much better. The insert and update code is somewhat more
 complicated, because you have to be careful to lock the old page,
 new page, and revmap page in the right order. But it's not too bad,
 and it gets rid of all the complexity in vacuum.

It seems there is some issue here, because pageinspect tells me the
index is not growing properly for some reason.  minmax_revmap_data gives
me this array of TIDs after a bunch of insert/vacuum/delete etc.:

(2,1),(2,2),(2,3),(2,4),(2,5),(4,1),(5,1),(6,1),(7,1),(8,1),(9,1),(10,1),(11,1),(12,1),(13,1),(14,1),(15,1),(16,1),(17,1),(18,1),(19,1),(20,1),(21,1),(22,1),(23,1),(24,1),(25,1),(26,1),(27,1),(28,1),(29,1),(30,1),(31,1),(32,1),(33,1),(34,1),(35,1),(36,1),(37,1),(38,1),(39,1),(40,1),(41,1),(42,1),(43,1),(44,1),(45,1),(46,1),(47,1),(48,1),(49,1),(50,1),(51,1),(52,1),(53,1),(54,1),(55,1),(56,1),(57,1),(58,1),(59,1),(60,1),(61,1),(62,1),(63,1),(64,1),(65,1),(66,1),(67,1),(68,1),(69,1),(70,1),(71,1),(72,1),(73,1),(74,1),(75,1),(76,1),(77,1),(78,1),(79,1),(80,1),(81,1),(82,1),(83,1),(84,1),(85,1),(86,1),(87,1),(88,1),(89,1),(90,1),(91,1),(92,1),(93,1),(94,1),(95,1),(96,1),(97,1),(98,1),(99,1),(100,1),(101,1),(102,1),(103,1),(104,1),(105,1),(106,1),(107,1),(108,1),(109,1),(110,1),(111,1),(112,1),(113,1),(114,1),(115,1),(116,1),(117,1),(118,1),(119,1),(120,1),(121,1),(122,1),(123,1),(124,1),(125,1),(126,1),(127,1),(128,1),(129,1),(130,1),(131,1),(132,1),(133,1),(134,1)

There are some who would think that getting one item per page is
suboptimal.  (Maybe it's just a missing FSM update somewhere.)


I've been hacking away a bit more at this; will post updated patch
probably tomorrow (was about to post but just found a memory stomp in
pageinspect.)

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] replication commands and log_statements

2014-08-12 Thread Stephen Frost
* Bruce Momjian (br...@momjian.us) wrote:
 On Tue, Aug 12, 2014 at 10:07:34AM +0530, Amit Kapila wrote:
  On Tue, Aug 12, 2014 at 9:29 AM, Fujii Masao masao.fu...@gmail.com wrote:
   On Tue, Aug 12, 2014 at 2:34 AM, Robert Haas robertmh...@gmail.com 
   wrote:
   
If you have a user devoted to it, I suppose that's true.  I still
think it shouldn't get munged together like that.
  
   Why do we need to treat only replication commands as special ones and
   add new parameter to display them?
  
  One difference is that replication commands are internally generated
  commands. Do we anywhere else log such internally generated
  commands with log_statement = all?

Not entirely sure what you're referring to as 'internally generated'
here.  While they can come from another backend, they don't have to.

 Good point --- we do not.  In fact, this is similar to how we don't log
 SPI queries, e.g. SQL queries inside functions.  We might want to enable
 that someday too.  Could we enable logging of both with a single GUC?

I don't see those as the same at all.  I'd like to see improved
flexibility in this area, certainly, but don't want two independent
considerations like these tied to one GUC..

Thanks,

Stephen




Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-12 Thread Bruce Momjian
On Mon, Aug 11, 2014 at 01:44:05PM -0700, Peter Geoghegan wrote:
 On Mon, Aug 11, 2014 at 1:01 PM, Stephen Frost sfr...@snowman.net wrote:
  We've got a clear example of someone, quite reasonably, expecting their
  JSONB object to be compressed using the normal TOAST mechanism, and
  we're failing to do that in cases where it's actually a win to do so.
  That's the focus of this discussion and what needs to be addressed
  before 9.4 goes out.
 
 Sure. I'm not trying to minimize that. We should fix it, certainly.
 However, it does bear considering that JSON data, with each document
  stored in a row, is not an effective target for TOAST compression in
 general, even as text.

Seems we have two issues:

1)  the header makes testing for compression likely to fail
2)  use of pointers rather than offsets reduces compression potential

I understand we are focusing on #1, but how much does compression reduce
the storage size with and without #2?  Seems we need to know that answer
before deciding if it is worth reducing the ability to do fast lookups
with #2.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] [PATCH] PostgreSQL 9.4 mmap(2) performance regression on FreeBSD...

2014-08-12 Thread Robert Haas
On Tue, Aug 12, 2014 at 12:59 PM, Andres Freund and...@2ndquadrant.com wrote:
 On 2014-08-12 09:42:30 -0700, Sean Chittenden wrote:
 One of the patches that I've been sitting on and am derelict in punting
 upstream is the attached mmap(2) flags patch for the BSDs. Is there any
 chance this can be squeezed into the PostgreSQL 9.4 release?

 The patch is trivial in size and is used to add one flag to mmap(2) calls in
 dsm_impl.c.  Alan Cox (FreeBSD alc, not Linux) and I went back and forth
 regarding PostgreSQL's use of mmap(2) and determined that the following is
 correct and will prevent a likely performance regression in PostgreSQL 9.4.
 In PostgreSQL 9.3, all mmap(2) calls were called with the flags MAP_ANON |
 MAP_SHARED, whereas in PostgreSQL 9.4 this is not the case.

 The performance-wise important call to mmap will still use that set of
 flags, no? That's the one backing shared_buffers.

 The mmap backend for *dynamic* shared memory (aka dsm) is *NOT* supposed
 to be used on common platforms. Both posix and sysv shared memory will
 be used before falling back to the mmap() backend.

Hmm, yeah.  This might still be a good thing to do (because what do we
lose?) but it shouldn't really be an issue in practice.
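
(The patch itself is attached to the original mail rather than quoted
here; assuming the flag in question is the BSDs' MAP_NOSYNC -- an
assumption on my part, not confirmed by this excerpt -- the shape of
the change to the mmap() calls would be roughly this sketch:)

#include <sys/mman.h>
#include <stdio.h>

int
main(void)
{
	int			flags = MAP_SHARED | MAP_ANON;
	void	   *p;

#ifdef MAP_NOSYNC
	/*
	 * BSD-only: don't eagerly flush dirty pages to backing store; the
	 * shared memory contents never need to survive a crash anyway.
	 */
	flags |= MAP_NOSYNC;
#endif

	p = mmap(NULL, 1 << 20, PROT_READ | PROT_WRITE, flags, -1, 0);
	if (p == MAP_FAILED)
		perror("mmap");
	else
		munmap(p, 1 << 20);
	return 0;
}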

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Proposal: Incremental Backup

2014-08-12 Thread Stephen Frost
Claudio,

* Claudio Freire (klaussfre...@gmail.com) wrote:
 I'm not talking about malicious attacks, with big enough data sets,
 checksum collisions are much more likely to happen than with smaller
 ones, and incremental backups are supposed to work for the big sets.

This is an issue when you're talking about de-duplication, not when
you're talking about testing if two files are the same or not for
incremental backup purposes.  The size of the overall data set in this
case is not relevant as you're only ever looking at the same (at most
1G) specific file in the PostgreSQL data directory.  Were you able to
actually produce a file with a colliding checksum as an existing PG
file, the chance that you'd be able to construct one which *also* has
a valid page layout sufficient that it wouldn't be obviously massively
corrupted is very quickly approaching zero.
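
To put rough numbers on that distinction (assuming an ideal b-bit
checksum): de-duplication compares all pairs of n blocks, so it is a
birthday problem, while incremental backup compares each block only
against its own previous version:

    P(dedup collision)   ~ n(n-1) / 2^(b+1)
    P(backup collision)  ~ n / 2^b

With, say, n = 10^12 pairwise comparisons and a 128-bit checksum, the
second figure is around 10^12 / 2^128, i.e. about 3e-27 -- far below
the odds of plain undetected disk corruption.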

 You could use strong cryptographic checksums, but such strong
 checksums still aren't perfect, and even if you accept the slim chance
 of collision, they are quite expensive to compute, so it's bound to be
 a bottleneck with good I/O subsystems. Checking the LSN is much
 cheaper.

For my 2c on this- I'm actually behind the idea of using the LSN (though
I have not followed this thread in any detail), but there's plenty of
existing incremental backup solutions (PG specific and not) which work
just fine by doing checksums.  If you truly feel that this is a real
concern, I'd suggest you review the rsync binary diff protocol which is
used extensively around the world and show reports of it failing in the
field.

Thanks,

Stephen




Re: [HACKERS] WAL format and API changes (9.5)

2014-08-12 Thread Heikki Linnakangas

On 08/13/2014 01:04 AM, Andres Freund wrote:

On 2014-08-12 12:44:22 +0300, Heikki Linnakangas wrote:

Here's a new patch with those little things fixed.


I've so far ignored this thread... I'm now taking a look. Unfortunately
I have to admit I'm not unequivocally happy.

* XLogRegisterData/XLogRegisterRecData imo don't really form a
   consistent API namewise. What does 'Rec' mean here?


The latter function is actually called XLogRegisterBufData. The README 
was wrong, will fix.



* I'm distinctly not a fan of passing around both the LSN and the struct
   XLogRecord to functions. We should bite the bullet and use something
   encapsulating both.


You mean, the redo functions? Seems fine to me, and it's always been
like that...



* XLogReplayBuffer imo isn't a very good name - the buffer isn't
   replayed. If anything it's sometimes replaced by the backup block, but
   the real replay still happens outside of that function.


I agree, but haven't come up with a better name.


* Why do we need XLogBeginInsert()? Right now this leads to ugliness
   like duplicated if (RelationNeedsWAL()) WAL checks and
   XLogCancelInsert() of inserts that haven't been started.


I like clearly marking the beginning of the insert, with 
XLogBeginInsert(). We certainly could design it so that it's not needed, 
and you could just start registering stuff without calling 
XLogBeginInsert() first. But I believe it's more robust this way. For 
example, you will get an error at the next XLogBeginInsert(), if a 
previous operation had already registered some data without calling 
XLogInsert().
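
To illustrate the intended pattern (a sketch based on this thread's
description; the buffer-registration call and flag names here are
assumptions, not quotes from the patch):

	if (RelationNeedsWAL(rel))
	{
		XLogRecPtr	recptr;

		XLogBeginInsert();		/* errors out if a previous insertion
								 * was left unfinished */
		XLogRegisterData((char *) &xlrec, sizeof(xlrec));
		XLogRegisterBuffer(0, buf, REGBUF_STANDARD);

		recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_INSERT);
		PageSetLSN(BufferGetPage(buf), recptr);
	}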



And if we have Begin, why do we also need Cancel?


We could just automatically cancel any previous insertion when 
XLogBeginInsert() is called, but again it seems like bad form to leave 
references to buffers and data just lying there, if an operation bails 
out after registering some stuff and doesn't finish the insertion.



* XXX: do we need to do this for every page? - yes, and it's every
   tuple, not every page... And it needs to be done *after* the tuple has
   been put on the page, not before. Otherwise the location will be wrong.


That comment is in heap_multi_insert(). All the inserted tuples have the 
same command id, so it seems redundant to log multiple NEW_CID records for them.



* The patch mixes the API changes around WAL records with changes of how
   individual actions are logged. That makes it rather hard to review -
   and it's a 500kb patch already.

   I realize it's hard to avoid because the new API changes which
   information about blocks is available, but it does make it really hard
   to see whether the old/new code is doing something
   equivalent. It's hard to find more critical code than this :/


Yeah, I hear you. I considered doing this piecemeal, just adding the new 
functions first so that you could still use the old XLogRecData API, 
until all the functions have been converted. But in the end, I figured 
it's not worth it, as sooner or later we'd want to convert all the 
functions anyway.



Have you done any performance evaluation?


No, not yet.

- Heikki





Re: [HACKERS] Inconsistent use of --slot/-S in pg_receivexlog and pg_recvlogical

2014-08-12 Thread Fujii Masao
On Tue, Aug 12, 2014 at 7:27 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Tue, Aug 12, 2014 at 4:50 PM, Michael Paquier
 michael.paqu...@gmail.com wrote:
 Hi,

 I noticed that pg_receivexlog is able to use --slot but not -S, even
 if the code is written this way. Attached is a patch correcting that.
 This makes pg_receivexlog consistent with pg_recvlogical regarding the
 slot option.
 IMHO, this should be backpatched to REL9_4_STABLE.
 Regards,

 Looks good to me. And I could not find, in the past discussion, the
 reason why only -S was not added.

 Anyway, barring any objection, I will commit this.

Applied. Thanks!

Regards,

-- 
Fujii Masao




Re: [HACKERS] strncpy is not a safe version of strcpy

2014-08-12 Thread Noah Misch
On Sat, Nov 16, 2013 at 12:53:10PM +1300, David Rowley wrote:
 I went on a bit of a strncpy cleanup rampage this morning and ended up
 finding quite a few places where strncpy is used wrongly.
 I'm not quite sure if I have got them all in this patch, but I think I've
 got the obvious ones at least.
 
 For the hash_search in jsonfuncs.c after thinking about it a bit more...
 Can we not just pass the attname without making a copy of it? I see keyPtr
 in hash_search is const void * so it shouldn't get modified in there. I
 can't quite see the reason for making the copy.

+1 for the goal of this patch.  Another commit took care of your jsonfuncs.c
concerns, and the patch for CVE-2014-0065 fixed several of the others.  Plenty
remain, though.

 Attached is a patch with various cleanups where I didn't like the look of
 the strncpy. I didn't go overboard with this as I know making this sort of
 small changes all over can be a bit scary and I thought maybe it would get
 rejected on that basis.
 
 I also cleaned up things like strncpy(dest, src, strlen(src)); which just
 seems a bit weird and I'm failing to get my head around why it was done. I
 replaced these with memcpy instead, but they could perhaps be a plain old
 strcpy.

I suggest preparing one or more patches that focus on the cosmetic-only
changes, such as strncpy() -> memcpy() when strncpy() is guaranteed not to
reach a NUL byte.  With that noise out of the way, it will be easier to give
the rest the attention it deserves.
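
As a concrete example of the cosmetic class of change (illustrative
only, not taken from the patch):

#include <stdio.h>
#include <string.h>

int
main(void)
{
	const char *src = "some_name";
	char		dest[64];

	/*
	 * The pattern being cleaned up: strncpy() bounded by the *source*
	 * length copies exactly strlen(src) bytes and never writes a NUL,
	 * so it is just a confusing memcpy() plus a separate terminator.
	 */
	strncpy(dest, src, strlen(src));
	dest[strlen(src)] = '\0';

	/* Equivalent, clearer form: copy the bytes plus the trailing NUL. */
	memcpy(dest, src, strlen(src) + 1);

	printf("%s\n", dest);
	return 0;
}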

Thanks,
nm

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com




Re: [HACKERS] Production block comparison facility

2014-08-12 Thread Michael Paquier
On Wed, Jul 23, 2014 at 11:14 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 Things could be refactored and improved for sure, but this patch is already
 useful as-is so I am going to add it to the next commit fest.

After some more investigation, I am going to mark this patch as
"Returned with feedback" for the time being (mainly to let it show up
on the commit fest app and for the sake of the archives), for two
reasons:
- We can do better than what I sent: instead of checking whether the FPW
and the current page are somewhat consistent, we could actually check
whether the current page is equal to the FPW after applying WAL to it
(see the sketch below). In order to do that, we would need to bypass
the FPW replay and apply the WAL record to the current page (if the
page is already initialized), then extend RestoreBackupBlock (or its
equivalent) with an additional flag telling that the block is not
restored but can safely have WAL applied to it. Then a comparison with
the FPW contained in the WAL record can be made.
- Heikki's patch changing the WAL APIs to track block changes more
easily is going to make this implementation far easier. It also
improves the status checks on which block has been restored, so it is
more easily extensible for what could be done here.
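
The comparison step itself could be as simple as the following sketch
(mask_unlogged_bits() is a hypothetical helper: hint bits and the page
LSN are not fully WAL-logged, so both sides must be masked before
comparing):

static bool
page_matches_fpw(const char *current_page, const char *fpw_page)
{
	char		a[BLCKSZ];
	char		b[BLCKSZ];

	memcpy(a, current_page, BLCKSZ);
	memcpy(b, fpw_page, BLCKSZ);
	mask_unlogged_bits(a);		/* hypothetical: clear hint bits and pd_lsn */
	mask_unlogged_bits(b);
	return memcmp(a, b, BLCKSZ) == 0;
}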

Regards,
-- 
Michael




Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW

2014-08-12 Thread Etsuro Fujita

(2014/08/12 18:34), Shigeru Hanada wrote:

Issues addressed by Eitoku-san were fixed properly, but he found a bug
and a possible enhancement in the v2 patch.


Thank you for the review, Hanada-san and Eitoku-san!


* push-down check misses delete triggers
update_is_pushdown_safe() seems to have a bug: it misses the
existence of a row-level delete trigger.  A DELETE statement executed
against a foreign table which has a row-level delete trigger is pushed
down to the remote side, and consequently no row-level delete trigger is fired.


Ah, I noticed that the current code for that is not correct.  Will fix.


* further optimization
Is there any chance to consider further optimization by passing the
operation type (UPDATE|DELETE) of the statement being executed to
update_is_pushdown_safe()?  It seems safe to push down an UPDATE
statement when the target foreign table has no update trigger even if it
has a delete trigger (of course the opposite combination would also be
fine).


Good idea!  Will improve that too.


* Documentation
The requirement of pushing down UPDATE/DELETE statements would not be
easy to understand for non-expert users, so it seems that there is
room to enhance the documentation.  An idea is to define which expressions
are safe to send to the remote side first (it might need to mention the
difference in semantics), and refer to that definition from the places
describing the requirements of pushing down SELECT, UPDATE and
DELETE.


Yeah, I also think that it would not necessarily be easy for users to
understand which expression is safe to send.  So I agree with that 
enhancement, but ISTM that it would be better to do that as a separate 
patch.


Thanks,

Best regards,
Etsuro Fujita




Re: [HACKERS] [REVIEW] pg_last_xact_insert_timestamp

2014-08-12 Thread Fujii Masao
On Mon, Aug 11, 2014 at 8:27 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Mon, Aug 11, 2014 at 4:46 PM, Andres Freund and...@2ndquadrant.com wrote:
 Hi,

 On 2011-10-04 20:52:59 +0900, Fujii Masao wrote:
 *** a/src/backend/access/transam/xact.c
 --- b/src/backend/access/transam/xact.c
 ***
 *** 1066,1071  RecordTransactionCommit(void)
 --- 1066,1074 

   (void) XLogInsert(RM_XACT_ID, 
 XLOG_XACT_COMMIT_COMPACT, rdata);
   }
 +
 + /* Save timestamp of latest transaction commit record */
 + pgstat_report_xact_end_timestamp(xactStopTimestamp);
   }


 Perhaps that pgstat_report() should instead be combined with the
 pgstat_report_xact_timestamp(0) in CommitTransaction()? Then the number
 of changecount increases and cacheline references would stay the
 same. The only thing that'd change would be a single additional
 assignment.

 Sounds like a good suggestion.

I attached the updated version of the patch. I changed pgstat_report_xx
functions like Andres suggested.

 While reading the patch again, I found it didn't handle the COMMIT/ABORT
 PREPARED case properly. According to the commit e74e090, now
 pg_last_xact_replay_timestamp() returns the timestamp of COMMIT/ABORT 
 PREPARED.
 pg_last_xact_insert_timestamp() is mainly expected to be used to calculate
 the replication delay, so it also needs to return that timestamp. But the patch
 didn't change 2PC code at all. We need to add 
 pgstat_report_xact_end_timestamp()
 into FinishPreparedTransaction(), RecordTransactionCommitPrepared() or
 RecordTransactionAbortPrepared().

Done.

Regards,

-- 
Fujii Masao
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
***
*** 16116,16121  SELECT set_config('log_statement_stats', 'off', false);
--- 16116,16124 
     <primary>pg_current_xlog_location</primary>
    </indexterm>
    <indexterm>
+    <primary>pg_last_xact_insert_timestamp</primary>
+   </indexterm>
+   <indexterm>
     <primary>pg_start_backup</primary>
    </indexterm>
    <indexterm>
***
*** 16180,16185  SELECT set_config('log_statement_stats', 'off', false);
--- 16183,16195 
  </row>
  <row>
   <entry>
+    <literal><function>pg_last_xact_insert_timestamp()</function></literal>
+   </entry>
+   <entry><type>timestamp with time zone</type></entry>
+   <entry>Get last transaction log insert time stamp</entry>
+  </row>
+  <row>
+   <entry>
    <literal><function>pg_start_backup(<parameter>label</> <type>text</> <optional>, <parameter>fast</> <type>boolean</> </optional>)</function></literal>
   </entry>
   <entry><type>pg_lsn</type></entry>
***
*** 16334,16339  postgres=# SELECT * FROM pg_xlogfile_name_offset(pg_stop_backup());
--- 16344,16356 
    </para>
 
    <para>
+    <function>pg_last_xact_insert_timestamp</> displays the time stamp of the
+    last inserted transaction, i.e. the time at which the commit or abort WAL
+    record for that transaction was inserted. If no transaction has been
+    committed or aborted yet since the server started, this function
+    returns NULL.
+   </para>
+
+   <para>
    For details about proper usage of these functions, see
    <xref linkend="continuous-archiving">.
   </para>
*** a/doc/src/sgml/high-availability.sgml
--- b/doc/src/sgml/high-availability.sgml
***
*** 862,867  primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass'
--- 862,876 
   <command>ps</> command (see <xref linkend="monitoring-ps"> for details).
  </para>
  <para>
+  You can also calculate the lag in time stamp by comparing the last
+  WAL insert time stamp on the primary with the last WAL replay
+  time stamp on the standby. They can be retrieved using
+  <function>pg_last_xact_insert_timestamp</> on the primary and
+  <function>pg_last_xact_replay_timestamp</> on the standby,
+  respectively (see <xref linkend="functions-admin-backup-table"> and
+  <xref linkend="functions-recovery-info-table"> for details).
+ </para>
+ <para>
   You can retrieve a list of WAL sender processes via the
   <link linkend="monitoring-stats-views-table">
   <literal>pg_stat_replication</></link> view. Large differences between
*** a/src/backend/access/transam/twophase.c
--- b/src/backend/access/transam/twophase.c
***
*** 156,167  static void RecordTransactionCommitPrepared(TransactionId xid,
  RelFileNode *rels,
  int ninvalmsgs,
  SharedInvalidationMessage *invalmsgs,
! bool initfileinval);
  static void RecordTransactionAbortPrepared(TransactionId xid,
  			   int nchildren,
  			   TransactionId *children,
  			   int nrels,
! 			   RelFileNode *rels);
  static void ProcessRecords(char *bufptr, TransactionId xid,
  			   const TwoPhaseCallback callbacks[]);
  static void RemoveGXact(GlobalTransaction gxact);
--- 156,169 
  RelFileNode *rels,
  int ninvalmsgs,
  SharedInvalidationMessage 

Re: [HACKERS] replication commands and log_statements

2014-08-12 Thread Amit Kapila
On Wed, Aug 13, 2014 at 4:24 AM, Stephen Frost sfr...@snowman.net wrote:
 * Bruce Momjian (br...@momjian.us) wrote:
  On Tue, Aug 12, 2014 at 10:07:34AM +0530, Amit Kapila wrote:

   One difference is that replication commands are internally generated
   commands. Do we anywhere else log such internally generated
   commands with log_statement = all?

 Not entirely sure what you're referring to as 'internally generated'
 here..

Here 'internally generated' means that the user doesn't execute those
statements; rather, the replication/backup code forms these statements
(IDENTIFY_SYSTEM, TIMELINE_HISTORY, BASE_BACKUP, ...)
and sends them to the server to get the appropriate results.

While they can come from another backend, they don't have to.

  Good point --- we do not.  In fact, this is similar to how we don't log
  SPI queries, e.g. SQL queries inside functions.  We might want to enable
  that someday too.

Yes, a few days back I saw that a user expects
such functionality. Refer to the mail below:
http://www.postgresql.org/message-id/caa4ek1lvuirqnmjv1vtmrg_f+1f9e9-8rdgnwidscqvtps1...@mail.gmail.com

Could we enable logging of both with a single GUC?

 I don't see those as the same at all.  I'd like to see improved
 flexibility in this area, certainly, but don't want two independent
 considerations like these tied to one GUC..

Agreed, I also think both are different and shouldn't be logged
with one GUC.  Here the important thing to decide is the better
way to proceed for allowing logging of replication commands, such
that it allows us to extend it for more things.  We have discussed
the options below:

a. Make log_statement a list of comma separated options
b. Have a separate parameter something like log_replication*
c. Log it when user has mentioned option 'all' in log_statement.

From a future-extensibility pov, I think option (a) is a good
choice, or we could even invent a new scheme for logging
commands, something similar to
log_min_messages, where we can define levels:
level - 0 (none)
level - 1 (dml)
level - 2 (ddl)
level - 4 (replication)
level - 8 (all)

Here if the user specifies level = 2, then DDL commands will
get logged, and level = 3 would mean log DML and DDL commands.
For backward compatibility, we can keep the current parameter
as it is and introduce this as a new parameter.
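
A sketch of how such additive levels could be tested in the logging
path (all names here are invented for illustration; nothing like this
exists yet):

#define LOGSTMT_BIT_DML				(1 << 0)	/* level 1 */
#define LOGSTMT_BIT_DDL				(1 << 1)	/* level 2 */
#define LOGSTMT_BIT_REPLICATION		(1 << 2)	/* level 4 */
#define LOGSTMT_BIT_ALL				(1 << 3)	/* level 8 */

static bool
statement_should_be_logged(int log_level, int stmt_bit)
{
	/* level = 3 logs DML and DDL; "all" implies everything */
	if (log_level & LOGSTMT_BIT_ALL)
		return true;
	return (log_level & stmt_bit) != 0;
}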

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] failures on barnacle (CLOBBER_CACHE_RECURSIVELY) because of memory leaks

2014-08-12 Thread Tom Lane
I wrote:
 Tomas Vondra t...@fuzzy.cz writes:
 So after 83 days, the regression tests on barnacle completed,

 Hah, that's perseverance!

 and it
 smells like another memory leak in CacheMemoryContext, similar to those
 fixed in 078b2ed on May 18.

 Ugh, will look.

I've been experimenting by running the create_view test in isolation
(it's not quite self-contained, but close enough) and I find that there
are two memory leaks exposed here:

1. The relcache.c functions that provide on-demand caching, namely
RelationGetIndexList and RelationGetIndexAttrBitmap, are not careful
to free the old values (if any) of the relcache fields they fill.
I think we believed that the old values would surely always be null ...
but that's not necessarily the case when doing a recursive cache reload
of a system catalog, because we might've used/filled the fields while
fetching the data needed to fill them.  This results in a session-lifespan
leak of data in CacheMemoryContext, which is what Tomas saw.  (I'm not
real sure that this is a live issue for RelationGetIndexAttrBitmap, but
it demonstrably is for RelationGetIndexList.)

2. reloptions.c's parseRelOptions() leaks memory when disassembling a
reloptions array.  The leak is in the caller's memory context which
is typically query-lifespan, so under normal circumstances this is not
significant.  However, a CLOBBER_CACHE_RECURSIVELY build can call this
an awful lot of times within a single query.  The queries in create_view
that operate on views with security_barrier reloptions manage to eat
quite a lot of memory this way.

The attached patch fixes both of these things.  I'm inclined to think
it is not worth back-patching because neither effect could amount to
anything noticeable without CLOBBER_CACHE_RECURSIVELY.  Anyone think
otherwise?

regards, tom lane

diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index c7ad6f9..e5f0240 100644
*** a/src/backend/access/common/reloptions.c
--- b/src/backend/access/common/reloptions.c
*** parseRelOptions(Datum options, bool vali
*** 912,925 
  	/* Done if no options */
  	if (PointerIsValid(DatumGetPointer(options)))
  	{
! 		ArrayType  *array;
  		Datum	   *optiondatums;
  		int			noptions;
  
- 		array = DatumGetArrayTypeP(options);
- 
- 		Assert(ARR_ELEMTYPE(array) == TEXTOID);
- 
  		deconstruct_array(array, TEXTOID, -1, false, 'i',
  		  &optiondatums, NULL, &noptions);
  
--- 912,921 
  	/* Done if no options */
  	if (PointerIsValid(DatumGetPointer(options)))
  	{
! 		ArrayType  *array = DatumGetArrayTypeP(options);
  		Datum	   *optiondatums;
  		int			noptions;
  
  		deconstruct_array(array, TEXTOID, -1, false, 'i',
  		  &optiondatums, NULL, &noptions);
  
*** parseRelOptions(Datum options, bool vali
*** 959,964 
--- 955,965 
  		 errmsg("unrecognized parameter \"%s\"", s)));
  			}
  		}
+ 
+ 		/* It's worth avoiding memory leaks in this function */
+ 		pfree(optiondatums);
+ 		if (((void *) array) != DatumGetPointer(options))
+ 			pfree(array);
  	}
  
  	*numrelopts = numoptions;
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 10d300a..fd84853 100644
*** a/src/backend/utils/cache/relcache.c
--- b/src/backend/utils/cache/relcache.c
*** RelationGetIndexList(Relation relation)
*** 3646,3651 
--- 3646,3652 
  	ScanKeyData skey;
  	HeapTuple	htup;
  	List	   *result;
+ 	List	   *oldlist;
  	char		replident = relation->rd_rel->relreplident;
  	Oid			oidIndex = InvalidOid;
  	Oid			pkeyIndex = InvalidOid;
*** RelationGetIndexList(Relation relation)
*** 3737,3742 
--- 3738,3744 
  
  	/* Now save a copy of the completed list in the relcache entry. */
  	oldcxt = MemoryContextSwitchTo(CacheMemoryContext);
+ 	oldlist = relation->rd_indexlist;
  	relation->rd_indexlist = list_copy(result);
  	relation->rd_oidindex = oidIndex;
  	if (replident == REPLICA_IDENTITY_DEFAULT && OidIsValid(pkeyIndex))
*** RelationGetIndexList(Relation relation)
*** 3748,3753 
--- 3750,3758 
  	relation->rd_indexvalid = 1;
  	MemoryContextSwitchTo(oldcxt);
  
+ 	/* Don't leak the old list, if there is one */
+ 	list_free(oldlist);
+ 
  	return result;
  }
  
*** RelationGetIndexAttrBitmap(Relation rela
*** 4141,4146 
--- 4146,4159 
  
  	list_free(indexoidlist);
  
+ 	/* Don't leak any old values of these bitmaps */
+ 	bms_free(relation->rd_indexattr);
+ 	relation->rd_indexattr = NULL;
+ 	bms_free(relation->rd_keyattr);
+ 	relation->rd_keyattr = NULL;
+ 	bms_free(relation->rd_idattr);
+ 	relation->rd_idattr = NULL;
+ 
  	/*
  	 * Now save copies of the bitmaps in the relcache entry.  We intentionally
  	 * set rd_indexattr last, because that's the one that signals validity of



Re: [HACKERS] proposal for 9.5: monitoring lock time for slow queries

2014-08-12 Thread Michael Paquier
On Wed, Aug 13, 2014 at 4:59 AM, Pavel Stehule pavel.steh...@gmail.com wrote:
 Any idea how to show a lock time in some practical form together with logged
 slow query?

Doing a join on pg_stat_activity and pg_locks is not going to help
much as you could only get the moment when query has started or its
state has changed. Have you thought about the addition of a new column
in pg_locks containing the timestamp of the moment a lock has been
taken? I am sure that we are concerned about the performance impact
that extra calls to gettimeofday could have though...
Regards,
-- 
Michael




Re: [HACKERS] Scaling shared buffer eviction

2014-08-12 Thread Amit Kapila
On Wed, Aug 13, 2014 at 2:32 AM, Andres Freund and...@2ndquadrant.com
wrote:
 On 2014-08-06 15:42:08 +0530, Amit Kapila wrote:
  On Tue, Aug 5, 2014 at 9:21 PM, Robert Haas robertmh...@gmail.com
wrote:
   This essentially removes BgWriterDelay, but it's still mentioned in
  BgBufferSync().  Looking further, I see that with the patch applied,
  BgBufferSync() is still present in the source code but is no longer
called
  from anywhere.  Please don't submit patches that render things unused
  without actually removing them; it makes it much harder to see what
you've
  changed.  I realize you probably left it that way for testing purposes,
but
  you need to clean such things up before submitting.  Likewise, if you've
  rendered GUCs or statistics counters removed, you need to rip them out,
so
  that the scope of the changes you've made is clear to reviewers.

 FWIW, I found this email almost unreadable because it misses quoting
 signs after linebreaks in quoted content.

I think I have done something wrong while replying to Robert's
mail. The main point in that mail was trying to see if there is any
major problem in case we have a separate process (bgreclaim) to
populate the freelist.  One thing which I thought could be problematic
is to put a buffer in the freelist which has a usage_count of zero and
is *dirty*.  Please do let me know if you want clarification on
something in particular.


Overall, the main changes required in the patch as per the above
feedback are:
1. add an additional counter for the number of those
allocations not satisfied from the free list, with a
name like buffers_alloc_clocksweep.
2. Autotune the low and high threshold values for buffers
in freelist. In the patch, I have kept them as hard-coded
values.
3. For populating freelist, have a separate process (bgreclaim)
instead of doing it by bgwriter.

There are also other things which I need to take care of as per the
feedback, like some changes in the locking strategy and code.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-08-12 Thread Jeff Davis
On Tue, 2014-08-12 at 14:58 +0200, Tomas Vondra wrote:
 CREATE AGGREGATE myaggregate (
 ...
 SERIALIZE_FUNC = 'dump_data',
 DESERIALIZE_FUNC = 'read_data',
 ...
 );

Seems reasonable.

 I don't see why it should get messy? In the end, you have a chunk of
 data and a hash for it.

Perhaps it's fine; I'd have to see the approach.

 It just means you need to walk through the hash table, look at the
 hashes and dump ~50% of the groups to a file. 

If you have fixed-size states, why would you *want* to remove the group?
What is gained?

One thing I like about my simple approach is that it returns a good
number of groups after each pass, and then those are completely finished
(returned to the operator above, even). That's impossible with HashJoin
because the hashing all needs to be done before the probe phase begins.

The weakness of my approach is the array_agg case that you mention,
because this approach doesn't offer a way to dump out transition states.
It seems like that could be added later, but let me know if you see a
problem there.

 I think you're missing the point, here. You need to compute the hash in
 both cases. And then you either can do a lookup or just peek at the first
 few bits of the hash to see whether it's in the current batch or not.

I understood that. The point I was trying to make (which might or might
not be true) was that: (a) this only matters for a failed lookup,
because a successful lookup would just go in the hash table anyway; and
(b) a failed lookup probably doesn't cost much compared to all of the
other things that need to happen along that path.

I should have chosen a better example though. For instance: if the
lookup fails, we need to write the tuple, and writing the tuple is sure
to swamp the cost of a failed hash lookup.
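
For concreteness, the routing step is only a shift and a mask once the
hash is known (a sketch; nbatches is assumed to be a power of two, as
in the hashjoin code):

#include <stdint.h>

/*
 * Sketch: pick the batch for a tuple by peeking at a few hash bits.
 * A tuple whose batch != curbatch is written out for a later pass;
 * only tuples in the current batch pay for a hash table lookup.
 */
static inline int
tuple_batchno(uint32_t hash, int batch_shift, int nbatches)
{
	return (int) ((hash >> batch_shift) & (uint32_t) (nbatches - 1));
}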

 is much faster than a lookup. Also, as the hash table grows (beyond L3
 cache size, which is a few MBs today), it becomes much slower in my
 experience - that's one of the lessons I learnt while hacking on the
 hashjoin. And we're dealing with hashagg not fitting into work_mem, so
 this seems to be relevant.

Could be, but this is also the path that goes to disk, so I'm not sure
how significant it is.

  Because I suspect there are costs in having an extra file around that
  I'm not accounting for directly. We are implicitly assuming that the OS
  will keep around enough buffers for each BufFile to do sequential writes
  when needed. If we create a zillion partitions, then either we end up
  with random I/O or we push the memory burden into the OS buffer cache.
 
 Assuming I understand it correctly, I think this logic is broken. Are you
 saying "We'll try to do memory-bounded hashagg, but not for the really
 large datasets because of fear we might cause random I/O"?

No, the memory is still bounded even for very high cardinality inputs
(ignoring array_agg case for now). When a partition is processed later,
it also might exhaust work_mem, and need to write out tuples to its own
set of partitions. This allows memory-bounded execution to succeed even
if the number of partitions each iteration is one, though it will result
in repeated I/O for the same tuple.

 While I certainly understand your concerns about generating excessive
 amount of random I/O, I think the modern filesystem are handling that just
 fine (coalescing the writes into mostly sequential writes, etc.). Also,
 current hardware is really good at handling this (controllers with write
 cache, SSDs etc.).

All of that requires memory. We shouldn't dodge a work_mem limit by
using the kernel's memory, instead.

 Also, if hash-join does not worry about number of batches, why should
 hashagg worry about that? I expect the I/O patterns to be very similar.

One difference with HashJoin is that, to create a large number of
batches, the inner side must be huge, which is not the expected
operating mode for HashJoin[1]. Regardless, every partition that is
active *does* have a memory cost. HashJoin might ignore that cost, but
that doesn't make it right.

I think the right analogy here is to Sort's poly-phase merge -- it
doesn't merge all of the runs at once; see the comments at the top of
tuplesort.c.

In other words, sometimes it's better to have fewer partitions (for
hashing) or merge fewer runs at once (for sorting). It does more
repeated I/O, but the I/O is more sequential.

 In any case, trying to fix this by limiting number of partitions seems
 like a bad approach. I think factoring those concerns into a costing
 model is more appropriate.

Fair enough. I haven't modeled the cost yet; and I agree that an upper
limit is quite crude.

(a) COUNT(DISTINCT) - this is solved by a custom aggregate

Is there a reason we can't offer a hash-based strategy for this one? It
would have to be separate hash tables for different aggregates, but it
seems like it could work.

(b) bad estimate of required memory - this is common for aggregates
passing 'internal' state (planner uses some 

Re: [HACKERS] Support for N synchronous standby servers

2014-08-12 Thread Fujii Masao
On Mon, Aug 11, 2014 at 4:38 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 On Mon, Aug 11, 2014 at 4:26 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Mon, Aug 11, 2014 at 2:10 PM, Michael Paquier
 michael.paqu...@gmail.com wrote:
 Oh, that worked in my machine, too, this time... I did something wrong.
 Sorry for the noise.
 No problem, thanks for spending time testing.

Probably I hit a similar but different problem. I set synchronous_standby_num
to 2 and started up two synchronous standbys. When I ran write transactions,
they were successfully completed. That's OK.

I sent the SIGSTOP signal to the walreceiver process in one of the sync standbys,
and then ran write transactions again. In this case, they must not be completed,
because their WAL cannot be replicated to the standby whose walreceiver
was stopped. But they were successfully completed anyway.

Regards,

-- 
Fujii Masao




Re: [HACKERS] proposal for 9.5: monitoring lock time for slow queries

2014-08-12 Thread Tom Lane
Michael Paquier michael.paqu...@gmail.com writes:
 Doing a join on pg_stat_activity and pg_locks is not going to help
 much as you could only get the moment when query has started or its
 state has changed. Have you thought about the addition of a new column
 in pg_locks containing the timestamp of the moment a lock has been
 taken? I am sure that we are concerned about the performance impact
 that extra calls to gettimeofday could have though...

In theory this could be driven off the same gettimeofday needed to
start the deadlock_timeout timer.  Not sure how messy that'd be.
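
(A minimal sketch of that idea, reusing the timestamp taken anyway when
the wait begins; the accumulating counter is invented for illustration:)

	TimestampTz	wait_start = GetCurrentTimestamp();	/* already needed for
													 * the deadlock_timeout
													 * timer */

	/* ... sleep until the lock is granted ... */

	{
		long		secs;
		int			usecs;

		TimestampDifference(wait_start, GetCurrentTimestamp(),
							&secs, &usecs);
		/* hypothetical per-query counter, logged with the slow query */
		total_lock_wait_usecs += secs * 1000000L + usecs;
	}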

regards, tom lane




Re: [HACKERS] proposal for 9.5: monitoring lock time for slow queries

2014-08-12 Thread Pavel Stehule
Hi


2014-08-13 6:18 GMT+02:00 Michael Paquier michael.paqu...@gmail.com:

 On Wed, Aug 13, 2014 at 4:59 AM, Pavel Stehule pavel.steh...@gmail.com
 wrote:
  Any idea how to show a lock time in some practical form together with
 logged
  slow query?

 Doing a join on pg_stat_activity and pg_locks is not going to help
 much as you could only get the moment when query has started or its
 state has changed. Have you thought about the addition of a new column
 in pg_locks containing the timestamp of the moment a lock has been
 taken? I am sure that we are concerned about the performance impact
 that extra calls to gettimeofday could have though...
 Regards,


There are two relatively independent tasks:

a) monitor and show total lock time of living queries

b) monitor and log total lock time of executed queries.

I am interested in (b) now. When we work with the slow query log, we
want to identify the reason for the long duration. Locks are an
important source of such slow queries on some systems.

As far as I know, we already monitor these times for the deadlock
detection timeout and for log_lock_waits - so there should not be any
new pressure on performance.

Regards

Pavel


 --
 Michael



Re: [HACKERS] proposal for 9.5: monitoring lock time for slow queries

2014-08-12 Thread Pavel Stehule
2014-08-13 7:19 GMT+02:00 Tom Lane t...@sss.pgh.pa.us:

 Michael Paquier michael.paqu...@gmail.com writes:
  Doing a join on pg_stat_activity and pg_locks is not going to help
  much as you could only get the moment when query has started or its
  state has changed. Have you thought about the addition of a new column
  in pg_locks containing the timestamp of the moment a lock has been
  taken? I am sure that we are concerned about the performance impact
  that extra calls to gettimeofday could have though...

 In theory this could be driven off the same gettimeofday needed to
 start the deadlock_timeout timer.  Not sure how messy that'd be.


we use it in our custom patch without problems

Pavel



 regards, tom lane