Re: pgindent && weirdness

2020-05-15 Thread Tom Lane
Thomas Munro  writes:
> On Sat, May 16, 2020 at 10:15 AM Tom Lane  wrote:
>> +1.  I think the repo will let you in, but if not, I can do it.

> It seems I cannot.  Please go ahead.

[ yawn... ]  It's about bedtime here, but I'll take care of it in the
morning.

Off the critical path, we oughta figure out why the repo wouldn't
let you commit.  What I was told was it was set up to be writable
by all PG committers.

> I'll eventually see if I can get this into FreeBSD's usr.bin/indent.

+1 to that, too.

regards, tom lane




Re: Index Skip Scan

2020-05-15 Thread Dilip Kumar
On Fri, 15 May 2020 at 6:06 PM, Dmitry Dolgov <9erthali...@gmail.com> wrote:

> > On Wed, May 13, 2020 at 02:37:21PM +0530, Dilip Kumar wrote:
> >
> > + if (_bt_scankey_within_page(scan, so->skipScanKey, so->currPos.buf,
> dir))
> > + {
> >
> > Here we expect whether the "next" unique key can be found on this page
> > or not, but we are using the function which suggested whether the
> > "current" key can be found on this page or not.  I think in boundary
> > cases where the high key is equal to the current key, this function
> > will return true (which is expected from this function), and based on
> > that we will simply scan the current page and IMHO that cost could be
> > avoided no?
>
> Yes, looks like you're right, there is indeed an unnecessary extra scan
> happening. To avoid that we can see the key->nextkey and adjust higher
> boundary correspondingly. Will also add this into the next rebased
> patch, thanks!


Great thanks!

> --
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Re: calling procedures is slow and consumes extra much memory against calling function

2020-05-15 Thread Pavel Stehule
On Sat, May 16, 2020 at 5:55, Ranier Vilela  wrote:

> On Sat, May 16, 2020 at 00:07, Pavel Stehule <
> pavel.steh...@gmail.com> wrote:
>
>>
>>
>> On Sat, May 16, 2020 at 0:34, Ranier Vilela 
>> wrote:
>>
>>> On Sun, May 10, 2020 at 17:21, Pavel Stehule <
>>> pavel.steh...@gmail.com> wrote:
>>>
 Hi

 I try to use procedures in Orafce package, and I did some easy
 performance tests. I found some hard problems:

 1. test case

 create or replace procedure p1(inout r int, inout v int) as $$
 begin v := random() * r; end
 $$ language plpgsql;

 This command requires

 do $$
 declare r int default 100; x int;
 begin
   for i in 1..30 loop
  call p1(r, x);
   end loop;
 end;
 $$;

 about 2.2GB RAM and 10 sec.

>>> I am having a consistent result of 3 secs, with a modified version
>>> (exec_stmt_call) of your patch.
>>> But my notebook is (Core 5, 8GB and SSD), could it be a difference in
>>> the testing hardware?
>>>
>>
>> My notebook is an old T520, and moreover I have Postgres configured with
>> the --enable-cassert option.
>>
> The hardware is definitely making a difference, but if you have time and
> don't mind testing it,
> I can send you a patch, not that the modifications are a big deal, but
> maybe they'll help.
>

send me a patch, please

Pavel


>
> regards,
> Ranier Vilela
>


Re: calling procedures is slow and consumes extra much memory against calling function

2020-05-15 Thread Ranier Vilela
On Sat, May 16, 2020 at 00:07, Pavel Stehule 
wrote:

>
>
> On Sat, May 16, 2020 at 0:34, Ranier Vilela 
> wrote:
>
>> On Sun, May 10, 2020 at 17:21, Pavel Stehule <
>> pavel.steh...@gmail.com> wrote:
>>
>>> Hi
>>>
>>> I try to use procedures in Orafce package, and I did some easy
>>> performance tests. I found some hard problems:
>>>
>>> 1. test case
>>>
>>> create or replace procedure p1(inout r int, inout v int) as $$
>>> begin v := random() * r; end
>>> $$ language plpgsql;
>>>
>>> This command requires
>>>
>>> do $$
>>> declare r int default 100; x int;
>>> begin
>>>   for i in 1..30 loop
>>>  call p1(r, x);
>>>   end loop;
>>> end;
>>> $$;
>>>
>>> about 2.2GB RAM and 10 sec.
>>>
>> I am having a consistent result of 3 secs, with a modified version
>> (exec_stmt_call) of your patch.
>> But my notebook is (Core 5, 8GB and SSD), could it be a difference in the
>> testing hardware?
>>
>
> My notebook is an old T520, and moreover I have Postgres configured with
> the --enable-cassert option.
>
The hardware is definitely making a difference, but if you have time and
don't mind testing it,
I can send you a patch, not that the modifications are a big deal, but
maybe they'll help.

regards,
Ranier Vilela


Re: pgindent && weirdness

2020-05-15 Thread Thomas Munro
On Sat, May 16, 2020 at 10:15 AM Tom Lane  wrote:
> Thomas Munro  writes:
> > Here's the patch I propose to commit to pg_bsd_indent, if the repo
> > lets me, and here's the result of running it on the PG tree today.
>
> +1.  I think the repo will let you in, but if not, I can do it.

It seems I cannot.  Please go ahead.

I'll eventually see if I can get this into FreeBSD's usr.bin/indent.
It's possible that that process results in a request to make it
optional (some project with a lot of STACK_OF- and no IsA-style macros
wouldn't like it), but I don't think it hurts to commit it to our copy
like this in the meantime to fix our weird formatting problem.




Re: calling procedures is slow and consumes extra much memory against calling function

2020-05-15 Thread Pavel Stehule
On Sat, May 16, 2020 at 0:34, Ranier Vilela  wrote:

> On Sun, May 10, 2020 at 17:21, Pavel Stehule <
> pavel.steh...@gmail.com> wrote:
>
>> Hi
>>
>> I try to use procedures in Orafce package, and I did some easy
>> performance tests. I found some hard problems:
>>
>> 1. test case
>>
>> create or replace procedure p1(inout r int, inout v int) as $$
>> begin v := random() * r; end
>> $$ language plpgsql;
>>
>> This command requires
>>
>> do $$
>> declare r int default 100; x int;
>> begin
>>   for i in 1..30 loop
>>  call p1(r, x);
>>   end loop;
>> end;
>> $$;
>>
>> about 2.2GB RAM and 10 sec.
>>
> I am having a consistent result of 3 secs, with a modified version
> (exec_stmt_call) of your patch.
> But my notebook is (Core 5, 8GB and SSD), could it be a difference in the
> testing hardware?
>

My notebook is an old T520, and moreover I have Postgres configured with
the --enable-cassert option.

regards

Pavel


> regards,
> Ranier Vilela
>


Re: Potentially misleading name of libpq pass phrase hook

2020-05-15 Thread Michael Paquier
On Fri, May 15, 2020 at 09:21:52PM -0400, Jonathan S. Katz wrote:
> +1 on all of the above.
> 
> I noticed this has been added to Open Items; I added a note that the
> plan is to fix before the Beta 1 wrap.

+1.  Thanks.

Agreed.  PQsslKeyPassHook__type sounds fine to me as
convention.  Wouldn't we want to also rename PQsetSSLKeyPassHook and
PQgetSSLKeyPassHook, appending an "_OpenSSL" to both?
--
Michael


signature.asc
Description: PGP signature


Re: PostgreSQL 13 Beta 1 Release: 2020-05-21

2020-05-15 Thread Tom Lane
"Jonathan S. Katz"  writes:
> Just a reminder that the Beta 1 release is this upcoming Thursday. The
> Open Items list is pretty small at this point, but if you are planning
> to commit anything before the release, please be sure to do so over the
> weekend so we can wrap on Monday.

Pursuant to that, I'm hoping to be done with the wait-events naming issues
by the end of the weekend.  The hardest parts of that (the event types
that are connected to other things) are all done as of a few minutes ago.

regards, tom lane




Re: PostgreSQL 13 Beta 1 Release: 2020-05-21

2020-05-15 Thread Jonathan S. Katz
On 4/24/20 12:27 PM, Jonathan S. Katz wrote:
> Hi,
> 
> The PostgreSQL 13 Release Management Team is pleased to announce that
> the release date for PostgreSQL 13 Beta 1 is set to be 2020-05-21. The
> Open Items page is updated to reflect this.
> 
> We’re excited to make the Beta available for testing and receive some
> early feedback around the latest major release of PostgreSQL.

Just a reminder that the Beta 1 release is this upcoming Thursday. The
Open Items list is pretty small at this point, but if you are planning
to commit anything before the release, please be sure to do so over the
weekend so we can wrap on Monday.

Thanks!

Jonathan



signature.asc
Description: OpenPGP digital signature


Re: Potentially misleading name of libpq pass phrase hook

2020-05-15 Thread Jonathan S. Katz
On 5/15/20 8:21 PM, Tom Lane wrote:
> Magnus Hagander  writes:
>> On Thu, May 14, 2020 at 3:03 PM Daniel Gustafsson  wrote:
>>> Since we haven't shipped this there is still time to rename, which IMO
>>> is the right way forward.  PQsslKeyPassHook__type would be one
>>> option, but perhaps there are better alternatives?
> 
>> ISTM this should be renamed yeah -- and it should probably go on the open
>> item lists, and with the schedule for the beta perhaps dealt with rather
>> urgently?
> 
> +1.  Once beta1 is out the cost to change the name goes up noticeably.
> Not that we *couldn't* do it later, but it'd be better to have it be
> right in beta1.

+1 on all of the above.

I noticed this has been added to Open Items; I added a note that the
plan is to fix before the Beta 1 wrap.

Jonathan



signature.asc
Description: OpenPGP digital signature


Re: pg_stat_wal_receiver and flushedUpto/writtenUpto

2020-05-15 Thread Michael Paquier
On Fri, May 15, 2020 at 07:34:46PM -0400, Alvaro Herrera wrote:
> IIRC the only reason to put the written LSN where it is is so that it's
> below the mutex, to indicate it's not protected by it.  Conceptually,
> the written LSN is "before" the flushed LSN, which is why I propose to
> put it ahead of it.

Sure.  My point was mainly that it is easier to compare if we are
missing any fields in the view and the order is respected.  But it
makes also sense to do things your way, so let's do that.

> Yeah, that seems good (I didn't verify the boilerplate in
> pg_stat_get_wal_receiver or pg_proc.dat).  I propose
> 
> +   Last write-ahead log location already received and written to
> +   disk, but not flushed.  This should not be used for data
> +   integrity checks.

Thanks.  If there are no objections, I'll revisit that tomorrow and
apply it with those changes, just in time for beta1.
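
As a quick sanity check once this lands, the renamed columns can be read
back with a trivial libpq client.  This is only a sketch: written_lsn and
flushed_lsn are the column names proposed in the patch, and the empty
conninfo simply relies on the usual PG* environment variables.

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
	PGconn	   *conn = PQconnectdb("");	/* rely on PG* environment variables */
	PGresult   *res;

	if (PQstatus(conn) != CONNECTION_OK)
	{
		fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
		PQfinish(conn);
		return 1;
	}

	/* written_lsn/flushed_lsn are the names proposed in this thread */
	res = PQexec(conn,
				 "SELECT written_lsn, flushed_lsn FROM pg_stat_wal_receiver");
	if (PQresultStatus(res) == PGRES_TUPLES_OK && PQntuples(res) == 1)
		printf("written = %s, flushed = %s\n",
			   PQgetvalue(res, 0, 0), PQgetvalue(res, 0, 1));
	else
		fprintf(stderr, "no WAL receiver row (or query failed): %s",
				PQerrorMessage(conn));

	PQclear(res);
	PQfinish(conn);
	return 0;
}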
--
Michael


signature.asc
Description: PGP signature


Re: Potentially misleading name of libpq pass phrase hook

2020-05-15 Thread Tom Lane
Magnus Hagander  writes:
> On Thu, May 14, 2020 at 3:03 PM Daniel Gustafsson  wrote:
>> Since we haven't shipped this there is still time to rename, which IMO
>> is the right way forward.  PQsslKeyPassHook__type would be one
>> option, but perhaps there are better alternatives?

> ISTM this should be renamed yeah -- and it should probably go on the open
> item lists, and with the schedule for the beta perhaps dealt with rather
> urgently?

+1.  Once beta1 is out the cost to change the name goes up noticeably.
Not that we *couldn't* do it later, but it'd be better to have it be
right in beta1.

regards, tom lane




Re: [PATCH] Fix pg_dump --no-tablespaces for the custom format

2020-05-15 Thread Tom Lane
Christopher Baines  writes:
> So I'm new to poking around in the PostgreSQL code, so this is a bit of
> a shot in the dark. I'm having some problems with pg_dump, and a
> database with tablespaces. A couple of the tables are not in the default
> tablespace, and I want to ignore this for the dump.

I think you've misunderstood how the pieces fit together.  A lot of
the detail-filtering switches, including --no-tablespaces, work on
the output side of the "archive" format.  While you can't really tell
the difference in pg_dump text mode, the implication for custom-format
output is that the info is always there in the archive file, and you
give the switch to pg_restore if you don't want to see the info.
This is more flexible since you aren't compelled to make the decision
up-front, and it doesn't really cost anything to include such info in
the archive.  (Obviously, table-filtering switches don't work that
way, since with those there can be a really large cost in file size
to include unwanted data.)

So from my perspective, things are working fine and this patch would
break it.

If you actually want to suppress this info from getting into the
archive file, you'd have to give a very compelling reason for
breaking this behavior for other people.

regards, tom lane




Re: Potentially misleading name of libpq pass phrase hook

2020-05-15 Thread Alvaro Herrera
On 2020-May-15, Magnus Hagander wrote:

> On Thu, May 14, 2020 at 3:03 PM Daniel Gustafsson  wrote:

> > Since we haven't shipped this there is still time to rename, which
> > IMO is the right way forward.  PQsslKeyPassHook__type would
> > be one option, but perhaps there are better alternatives?
>
> ISTM this should be renamed yeah -- and it should probably go on the open
> item lists, and with the schedule for the beta perhaps dealt with rather
> urgently?

Seems easy enough to do!  +1 on Daniel's suggested renaming.

CCing Andrew as committer.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: pg_stat_wal_receiver and flushedUpto/writtenUpto

2020-05-15 Thread Alvaro Herrera
On 2020-May-16, Michael Paquier wrote:

> On Fri, May 15, 2020 at 01:43:11PM -0400, Alvaro Herrera wrote:
> > Why do you put the column at the end?  I would put written_lsn before
> > flushed_lsn.
> 
> Fine by me.  I was thinking yesterday about putting the written
> position after the flushed one, and finished with something that maps
> with the structure.

IIRC the only reason to put the written LSN where it is is so that it's
below the mutex, to indicate it's not protected by it.  Conceptually,
the written LSN is "before" the flushed LSN, which is why I propose to
put it ahead of it.

> > Since this requires a catversion bump, I think it'd be best to do it
> > before beta1 next week.
> 
> Yes.  What do you think of the attached?

Yeah, that seems good (I didn't verify the boilerplate in
pg_stat_get_wal_receiver or pg_proc.dat).  I propose

+   Last write-ahead log location already received and written to
+   disk, but not flushed.  This should not be used for data
+   integrity checks.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: pg13: xlogreader API adjust

2020-05-15 Thread Alvaro Herrera
On 2020-May-15, Michael Paquier wrote:

> On Thu, May 14, 2020 at 02:12:25PM +0900, Kyotaro Horiguchi wrote:
> > Good catch!  That's not only for CreateDecodingContet. That happens
> > everywhere in the query loop in PostgresMain() until logreader is
> > initialized.  So that also happens, for example, by starting logical
> > replication using invalidated slot. Checking xlogreader != NULL in
> > WalSndErrorCleanup is sufficient.  It doesn't make actual difference,
> > but the attached explicitly initialize the pointer with NULL.
> 
> Alvaro, are you planning to look at that?  Should we have an open item
> for this matter?

On it now.  I'm trying to add a test for this (needs a small change to
PostgresNode->psql), but I'm probably doing something stupid in the Perl
side, because it doesn't detect things as well as I'd like.  Still
trying, but I may be asked to evict the office soon ...

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index 3f3a1d81f6..0645e07789 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -1386,6 +1386,12 @@ the B<timed_out> parameter is also given.
 If B<timeout> is set and this parameter is given, the scalar it references
 is set to true if the psql call times out.
 
+=item replication => B<value>
+
+If set, pass the value as B<replication> in the conninfo string.
+Passing literal B<database> results in a logical replication connection.
+Passing literal B<1> results in a physical replication connection.
+
 =item extra_params => ['--single-transaction']
 
 If given, it must be an array reference containing additional parameters to B<psql>.
@@ -1414,10 +1420,14 @@ sub psql
 
 	my $stdout= $params{stdout};
 	my $stderr= $params{stderr};
+	my $replication   = $params{replication};
 	my $timeout   = undef;
 	my $timeout_exception = 'psql timed out';
 	my @psql_params =
-	  ('psql', '-XAtq', '-d', $self->connstr($dbname), '-f', '-');
+	  ('psql', '-XAtq', '-d',
+	   $self->connstr($dbname) .
+	   (defined $replication ? " replication=$replication" : ""),
+	   '-f', '-');
 
 	# If the caller wants an array and hasn't passed stdout/stderr
 	# references, allocate temporary ones to capture them so we
diff --git a/src/test/recovery/t/006_logical_decoding.pl b/src/test/recovery/t/006_logical_decoding.pl
index d40a500ed4..5c7e137e97 100644
--- a/src/test/recovery/t/006_logical_decoding.pl
+++ b/src/test/recovery/t/006_logical_decoding.pl
@@ -7,7 +7,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 12;
+use Test::More tests => 13;
 use Config;
 
 # Initialize master node
@@ -27,12 +27,20 @@ $node_master->safe_psql('postgres',
 	qq[SELECT pg_create_logical_replication_slot('test_slot', 'test_decoding');]
 );
 
+# Cover walsender error shutdown code
+my ($result, $stdout, $stderr) = $node_master->psql('template1',
+	qq[START_REPLICATION SLOT test_slot LOGICAL 0/0],
+	replication => 'database');
+ok(($result == 3) and
+   ($stderr =~ 'replication slot "test_slot" was not created in this database'),
+   "Logical decoding correctly fails to start");
+
 $node_master->safe_psql('postgres',
 	qq[INSERT INTO decoding_test(x,y) SELECT s, s::text FROM generate_series(1,10) s;]
 );
 
 # Basic decoding works
-my ($result) = $node_master->safe_psql('postgres',
+$result = $node_master->safe_psql('postgres',
 	qq[SELECT pg_logical_slot_get_changes('test_slot', NULL, NULL);]);
 is(scalar(my @foobar = split /^/m, $result),
 	12, 'Decoding produced 12 rows inc BEGIN/COMMIT');


Re: pg_stat_wal_receiver and flushedUpto/writtenUpto

2020-05-15 Thread Michael Paquier
On Fri, May 15, 2020 at 01:43:11PM -0400, Alvaro Herrera wrote:
> Why do you put the column at the end?  I would put written_lsn before
> flushed_lsn.

Fine by me.  I was thinking yesterday about putting the written
position after the flushed one, and finished with something that maps
with the structure.

> Since this requires a catversion bump, I think it'd be best to do it
> before beta1 next week.

Yes.  What do you think of the attached?
--
Michael
diff --git a/src/include/catalog/catversion.h b/src/include/catalog/catversion.h
index 5a771e7591..ca0d88e423 100644
--- a/src/include/catalog/catversion.h
+++ b/src/include/catalog/catversion.h
@@ -53,6 +53,6 @@
  */
 
 /*			mmddN */
-#define CATALOG_VERSION_NO	202005121
+#define CATALOG_VERSION_NO	202005161
 
 #endif
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 9edae40ed8..61f2c2f5b4 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5244,9 +5244,9 @@
 { oid => '3317', descr => 'statistics: information about WAL receiver',
   proname => 'pg_stat_get_wal_receiver', proisstrict => 'f', provolatile => 's',
   proparallel => 'r', prorettype => 'record', proargtypes => '',
-  proallargtypes => '{int4,text,pg_lsn,int4,pg_lsn,int4,timestamptz,timestamptz,pg_lsn,timestamptz,text,text,int4,text}',
-  proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
-  proargnames => '{pid,status,receive_start_lsn,receive_start_tli,received_lsn,received_tli,last_msg_send_time,last_msg_receipt_time,latest_end_lsn,latest_end_time,slot_name,sender_host,sender_port,conninfo}',
+  proallargtypes => '{int4,text,pg_lsn,int4,pg_lsn,pg_lsn,int4,timestamptz,timestamptz,pg_lsn,timestamptz,text,text,int4,text}',
+  proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{pid,status,receive_start_lsn,receive_start_tli,written_lsn,flushed_lsn,received_tli,last_msg_send_time,last_msg_receipt_time,latest_end_lsn,latest_end_time,slot_name,sender_host,sender_port,conninfo}',
   prosrc => 'pg_stat_get_wal_receiver' },
 { oid => '6118', descr => 'statistics: information about subscription',
   proname => 'pg_stat_get_subscription', proisstrict => 'f', provolatile => 's',
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 2bd5f5ea14..56420bbc9d 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -812,7 +812,8 @@ CREATE VIEW pg_stat_wal_receiver AS
 s.status,
 s.receive_start_lsn,
 s.receive_start_tli,
-s.received_lsn,
+s.written_lsn,
+s.flushed_lsn,
 s.received_tli,
 s.last_msg_send_time,
 s.last_msg_receipt_time,
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index d69fb90132..4f585eaa19 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -1348,7 +1348,8 @@ pg_stat_get_wal_receiver(PG_FUNCTION_ARGS)
 	WalRcvState state;
 	XLogRecPtr	receive_start_lsn;
 	TimeLineID	receive_start_tli;
-	XLogRecPtr	received_lsn;
+	XLogRecPtr	flushed_lsn;
+	XLogRecPtr	written_lsn;
 	TimeLineID	received_tli;
 	TimestampTz last_send_time;
 	TimestampTz last_receipt_time;
@@ -1366,7 +1367,8 @@ pg_stat_get_wal_receiver(PG_FUNCTION_ARGS)
 	state = WalRcv->walRcvState;
 	receive_start_lsn = WalRcv->receiveStart;
 	receive_start_tli = WalRcv->receiveStartTLI;
-	received_lsn = WalRcv->flushedUpto;
+	flushed_lsn = WalRcv->flushedUpto;
+	written_lsn = pg_atomic_read_u64(&WalRcv->writtenUpto);
 	received_tli = WalRcv->receivedTLI;
 	last_send_time = WalRcv->lastMsgSendTime;
 	last_receipt_time = WalRcv->lastMsgReceiptTime;
@@ -1413,43 +1415,47 @@ pg_stat_get_wal_receiver(PG_FUNCTION_ARGS)
 		else
 			values[2] = LSNGetDatum(receive_start_lsn);
 		values[3] = Int32GetDatum(receive_start_tli);
-		if (XLogRecPtrIsInvalid(received_lsn))
+		if (XLogRecPtrIsInvalid(written_lsn))
 			nulls[4] = true;
 		else
-			values[4] = LSNGetDatum(received_lsn);
-		values[5] = Int32GetDatum(received_tli);
-		if (last_send_time == 0)
-			nulls[6] = true;
+			values[4] = LSNGetDatum(written_lsn);
+		if (XLogRecPtrIsInvalid(flushed_lsn))
+			nulls[5] = true;
 		else
-			values[6] = TimestampTzGetDatum(last_send_time);
-		if (last_receipt_time == 0)
+			values[5] = LSNGetDatum(flushed_lsn);
+		values[6] = Int32GetDatum(received_tli);
+		if (last_send_time == 0)
 			nulls[7] = true;
 		else
-			values[7] = TimestampTzGetDatum(last_receipt_time);
-		if (XLogRecPtrIsInvalid(latest_end_lsn))
+			values[7] = TimestampTzGetDatum(last_send_time);
+		if (last_receipt_time == 0)
 			nulls[8] = true;
 		else
-			values[8] = LSNGetDatum(latest_end_lsn);
-		if (latest_end_time == 0)
+			values[8] = TimestampTzGetDatum(last_receipt_time);
+		if (XLogRecPtrIsInvalid(latest_end_lsn))
 			nulls[9] = true;
 		else
-			values[9] = TimestampTzGetDatum(latest_end_time);
-		if (*slotname == '\0')
+			

Re: calling procedures is slow and consumes extra much memory against calling function

2020-05-15 Thread Ranier Vilela
On Sun, May 10, 2020 at 17:21, Pavel Stehule 
wrote:

> Hi
>
> I try to use procedures in Orafce package, and I did some easy performance
> tests. I found some hard problems:
>
> 1. test case
>
> create or replace procedure p1(inout r int, inout v int) as $$
> begin v := random() * r; end
> $$ language plpgsql;
>
> This command requires
>
> do $$
> declare r int default 100; x int;
> begin
>   for i in 1..30 loop
>  call p1(r, x);
>   end loop;
> end;
> $$;
>
> about 2.2GB RAM and 10 sec.
>
I am having a consistent result of 3 secs, with a modified version
(exec_stmt_call) of your patch.
But my notebook is (Core 5, 8GB and SSD), could it be a difference in the
testing hardware?

regards,
Ranier Vilela


Re: pgindent && weirdness

2020-05-15 Thread Tom Lane
Alvaro Herrera  writes:
> On 2020-May-16, Thomas Munro wrote:
>> Here's the patch I propose to commit to pg_bsd_indent, if the repo
>> lets me, and here's the result of running it on the PG tree today.

> Looks good.  Of all these changes in PG, only two are of the STACK_OK()
> nature, and there are 38 places that get better.

It should also be noted that there are a lot of places where we've
programmed around this silliness by artificially breaking conditions
using IsA() into multiple lines.  So the "38 places" is a lowball
estimate of how much of a problem this has been.

regards, tom lane




Re: pgindent && weirdness

2020-05-15 Thread Alvaro Herrera
On 2020-May-16, Thomas Munro wrote:

> Here's the patch I propose to commit to pg_bsd_indent, if the repo
> lets me, and here's the result of running it on the PG tree today.

Looks good.  Of all these changes in PG, only two are of the STACK_OK()
nature, and there are 38 places that get better.

> I suppose the principled way to fix that problem with STACK_OF(x)
> would be to have a user-supplied list of function-like-macros that
> expand to a type name, but I'm not planning to waste time on that.

+1 on just ignoring that problem as insignificant.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: pgindent && weirdness

2020-05-15 Thread Tom Lane
Thomas Munro  writes:
> Here's the patch I propose to commit to pg_bsd_indent, if the repo
> lets me, and here's the result of running it on the PG tree today.

+1.  I think the repo will let you in, but if not, I can do it.

regards, tom lane




Re: pgindent && weirdness

2020-05-15 Thread Thomas Munro
On Tue, Feb 18, 2020 at 12:42 PM Tom Lane  wrote:
> Thomas Munro  writes:
> > Another problem is that there is one thing in our tree that looks like
> > a non-cast under the new rule, but it actually expands to a type name,
> > so now we get that wrong!  (I mean, unpatched indent doesn't really
> > understand it either, it thinks it's a cast, but at least it knows the
> > following * is not a binary operator):
>
> > -   STACK_OF(X509_NAME) *root_cert_list = NULL;
> > +   STACK_OF(X509_NAME) * root_cert_list = NULL;
>
> > That's a macro from an OpenSSL header.  Not sure what to do about that.
>
> If we get that wrong, but a hundred other places look better,
> I'm not too fussed about it.

Here's the patch I propose to commit to pg_bsd_indent, if the repo
lets me, and here's the result of running it on the PG tree today.

I suppose the principled way to fix that problem with STACK_OF(x)
would be to have a user-supplied list of function-like-macros that
expand to a type name, but I'm not planning to waste time on that.
From d3c9c8aa8f785266dd4302f8ab980c9a6ce99b5b Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Sat, 16 May 2020 09:26:10 +1200
Subject: [PATCH] Fix formatting of macros that take types.

Previously, we would assume that a macro like IsA() in the following
example was a cast just because it mentions a known type between parens,
and that messed up the formatting of any binary operator that followed:

   if (IsA(outer_path, UniquePath) ||path->skip_mark_restore)

This change errs on the side of assuming that function-like macros are
similar to sizeof() and offsetof(), so that operators are formatted
correctly:

   if (IsA(outer_path, UniquePath) || path->skip_mark_restore)

Discussion: https://postgr.es/m/20200114221814.GA19630%40alvherre.pgsql
---
 indent.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/indent.c b/indent.c
index 9faf57a..29b3fa4 100644
--- a/indent.c
+++ b/indent.c
@@ -570,8 +570,13 @@ check_type:
 		ps.in_or_st = false;	/* turn off flag for structure decl or
 	 * initialization */
 	}
-	/* parenthesized type following sizeof or offsetof is not a cast */
-	if (ps.keyword == 1 || ps.keyword == 2)
+	/*
+	 * parenthesized type following sizeof or offsetof is not a cast,
+	 * and we assume the same for any other non-keyword identifier,
+	 * to support macros that take types
+	 */
+	if (ps.last_token == ident &&
+		(ps.keyword == 0 || ps.keyword == 1 || ps.keyword == 2))
 		ps.not_cast_mask |= 1 << ps.p_l_follow;
 	break;
 
-- 
2.20.1

From bd2751742a47c4796c6bb73442a6109985e03bba Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Sat, 16 May 2020 09:44:43 +1200
Subject: [PATCH] Fix formatting of IsA() and similar macros.

Fix the formatting of many places where macros and sometimes even
function calls confused pg_bsd_indent by mentioning type names in
between parentheses.  We've now patched pg_bsd_indent to be less
confused about that.

This leads to a couple of places where OpenSSL's STACK_OF() macro is not
formatted correctly, but that seems like a good trade-off.

Discussion: https://postgr.es/m/20200114221814.GA19630%40alvherre.pgsql
---
 src/backend/access/nbtree/nbtutils.c |  6 +++---
 src/backend/commands/explain.c   |  4 ++--
 src/backend/commands/tablecmds.c |  2 +-
 src/backend/commands/typecmds.c  |  2 +-
 src/backend/executor/nodeAgg.c   |  2 +-
 src/backend/executor/nodeProjectSet.c|  4 ++--
 src/backend/libpq/be-secure-openssl.c|  2 +-
 src/backend/nodes/outfuncs.c |  2 +-
 src/backend/optimizer/path/costsize.c|  2 +-
 src/backend/optimizer/plan/setrefs.c |  2 +-
 src/backend/optimizer/util/pathnode.c|  2 +-
 src/backend/parser/analyze.c |  4 ++--
 src/backend/parser/parse_agg.c   |  4 ++--
 src/backend/parser/parse_clause.c|  2 +-
 src/backend/parser/parse_expr.c  |  2 +-
 src/backend/statistics/extended_stats.c  |  4 ++--
 src/backend/utils/adt/datetime.c |  2 +-
 src/backend/utils/adt/formatting.c   |  2 +-
 src/backend/utils/adt/numeric.c  |  2 +-
 src/backend/utils/adt/ruleutils.c|  6 +++---
 src/backend/utils/adt/timestamp.c|  2 +-
 src/backend/utils/adt/varbit.c   |  2 +-
 src/backend/utils/adt/varchar.c  |  2 +-
 src/bin/pg_dump/pg_backup_directory.c|  2 +-
 src/bin/psql/tab-complete.c  |  2 +-
 src/common/jsonapi.c |  2 +-
 src/interfaces/ecpg/ecpglib/execute.c|  2 +-
 src/interfaces/ecpg/ecpglib/prepare.c|  2 +-
 src/interfaces/libpq/fe-secure-openssl.c |  2 +-
 src/pl/plpgsql/src/pl_exec.c |  4 ++--
 src/timezone/localtime.c | 10 +-
 src/timezone/strftime.c  |  2 +-
 src/timezone/zic.c   |  4 ++--
 33 files changed, 48 insertions(+), 48 deletions(-)

diff --git a/src/backend/access/nbtree/nbtutils.c 

Re: Multiple FPI_FOR_HINT for the same block during killing btree index items

2020-05-15 Thread Alvaro Herrera
On 2020-Apr-10, Masahiko Sawada wrote:

> Okay. I think only adding the check would also help with reducing the
> likelihood. How about the changes for the current HEAD I've attached?

Pushed this to all branches.  (Branches 12 and older obviously needed an
adjustment.)  Thanks!

> Related to this behavior on btree indexes, this can happen even on
> heaps during searching heap tuples. To reduce the likelihood of that
> more generally I wonder if we can acquire a lock on buffer descriptor
> right before XLogSaveBufferForHint() and set a flag to the buffer
> descriptor that indicates that we're about to log FPI for hint bit so
> that concurrent process can be aware of that.

I'm not sure how that helps; the other process would have to go back and
redo their whole operation from scratch in order to find out whether
there's still something alive that needs killing.

I think you need to acquire the exclusive lock sooner: if, when scanning
the page, you find a killable item, *then* upgrade the lock to exclusive
and restart the scan.  This means that we'll have to wait for any other
process that's doing the scan, and they will all give up their share
lock to wait for the exclusive lock they need.  So the one that gets it
first will do all the killing, log the page, then release the lock.  At
that point the other processes will wake up and see that items have been
killed, so they will return having done nothing.

Like the attached.  I didn't verify that it works well or that it
actually improves performance ...

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/access/nbtree/nbtutils.c b/src/backend/access/nbtree/nbtutils.c
index 1429ac8b63..95ba3cc401 100644
--- a/src/backend/access/nbtree/nbtutils.c
+++ b/src/backend/access/nbtree/nbtutils.c
@@ -1726,6 +1726,7 @@ _bt_killitems(IndexScanDesc scan)
 	int			numKilled = so->numKilled;
 	bool		killedsomething = false;
 	bool		droppedpin PG_USED_FOR_ASSERTS_ONLY;
+	bool		have_exclusive = false;
 
 	Assert(BTScanPosIsValid(so->currPos));
 
@@ -1767,6 +1768,7 @@ _bt_killitems(IndexScanDesc scan)
 		}
 	}
 
+restart:
 	opaque = (BTPageOpaque) PageGetSpecialPointer(page);
 	minoff = P_FIRSTDATAKEY(opaque);
 	maxoff = PageGetMaxOffsetNumber(page);
@@ -1864,6 +1866,19 @@ _bt_killitems(IndexScanDesc scan)
 			 */
 			if (killtuple && !ItemIdIsDead(iid))
 			{
+				if (!have_exclusive && XLogHintBitIsNeeded())
+				{
+					/*
+					 * If wal_log_hints is enabled, we only want to do this
+					 * with an exclusive lock, so exchange our lock and
+					 * restart from the top.
+					 */
+					LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+					LockBuffer(so->currPos.buf, BT_WRITE);
+					have_exclusive = true;
+					goto restart;
+				}
+
 				/* found the item/all posting list items */
 				ItemIdMarkDead(iid);
 				killedsomething = true;


Re: Potentially misleading name of libpq pass phrase hook

2020-05-15 Thread Magnus Hagander
On Thu, May 14, 2020 at 3:03 PM Daniel Gustafsson  wrote:

> Reviewing TLS changes for v13 I came across one change which I think might
> be
> better off with a library qualified name.  The libpq frontend sslpassword
> hook
> added in 4dc63552109f65 is OpenSSL specific, but it has a very generic
> name:
>
> PQsetSSLKeyPassHook(PQsslKeyPassHook_type hook);
>
> This IMO has potential for confusion if we ever commit another TLS backend,
> since the above hook wont work for any other library (except maybe OpenSSL
> derivatives like LibreSSL et.al).  The backends will always have
> differently
> named hooks, as the signatures will be different, but having one with a
> generic
> name and another with a library qualified name doesn't seem too friendly to
> anyone implementing with libpq.
>
> As a point of reference; in the backend we added a TLS init hook in commit
> 896fcdb230e72 which also is specific to OpenSSL, but the name is library
> qualified making the purpose and usecase perfectly clear:
> openssl_tls_init_hook.
>
> Since we haven't shipped this there is still time to rename, which IMO is
> the
> right way forward.  PQsslKeyPassHook__type would be one option,
> but
> perhaps there are better alternatives?
>
> Thoughts?
>
>
ISTM this should be renamed yeah -- and it should probably go on the open
item lists, and with the schedule for the beta perhaps dealt with rather
urgently?
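
For reference, here is a minimal sketch of what caller code would look like
under the proposed library-qualified spelling.  The _OpenSSL-suffixed names
below are the renaming being discussed in this thread, not what is currently
committed (today they are spelled PQsslKeyPassHook_type and
PQsetSSLKeyPassHook), and the callback body is only a placeholder.

#include <stdlib.h>
#include <string.h>
#include <libpq-fe.h>

/*
 * Called by libpq when the client certificate key file needs a passphrase.
 * Copy the passphrase into buf (at most size bytes) and return its length.
 */
static int
my_ssl_passphrase_hook(char *buf, int size, PGconn *conn)
{
	const char *secret = getenv("MY_SSLKEY_PASSPHRASE");	/* placeholder source */

	if (secret == NULL)
		return 0;
	strncpy(buf, secret, size);
	buf[size - 1] = '\0';
	return (int) strlen(buf);
}

void
install_passphrase_hook(void)
{
	/* proposed library-qualified name; the current spelling lacks _OpenSSL */
	PQsetSSLKeyPassHook_OpenSSL(my_ssl_passphrase_hook);
}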

//Magnus


[PATCH] Fix pg_dump --no-tablespaces for the custom format

2020-05-15 Thread Christopher Baines
Hey,

So I'm new to poking around in the PostgreSQL code, so this is a bit of
a shot in the dark. I'm having some problems with pg_dump, and a
database with tablespaces. A couple of the tables are not in the default
tablespace, and I want to ignore this for the dump.

Looking at the pg_dump --help, there seems to be a perfect option for
this:

  --no-tablespaces do not dump tablespace assignments

This seems to work fine when using the plain text format, but I'm using
the custom format, and that seems to break the effect of
--no-tablespaces.

Looking at the code, I think I've managed to determine a place where
this behaviour can be changed, and so I've attached a draft patch [1].

Is this an actual problem, and if so, am I anywhere near the right place
in the code in terms of addressing it?

Thanks,

Chris

1:
From 124c10e3eb9f530a42bf294c95ee14b723ca2878 Mon Sep 17 00:00:00 2001
From: Christopher Baines 
Date: Fri, 15 May 2020 20:57:26 +0100
Subject: [PATCH] Fix pg_dump --no-tablespaces for the custom format

---
 src/bin/pg_dump/pg_dump.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index f33c2463a7..ea35984a94 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -16412,7 +16412,8 @@ dumpTableSchema(Archive *fout, TableInfo *tbinfo)
 	 ARCHIVE_OPTS(.tag = tbinfo->dobj.name,
   .namespace = tbinfo->dobj.namespace->dobj.name,
   .tablespace = (tbinfo->relkind == RELKIND_VIEW) ?
-  NULL : tbinfo->reltablespace,
+  NULL : (dopt->outputNoTablespaces ?
+		  NULL : tbinfo->reltablespace),
   .tableam = tableam,
   .owner = tbinfo->rolname,
   .description = reltypename,
-- 
2.26.2



signature.asc
Description: PGP signature


Re: calling procedures is slow and consumes extra much memory against calling function

2020-05-15 Thread Pavel Stehule
Hi


> The problem is in the plpgsql implementation of the CALL statement
>>
>> In the non-atomic case - the case of using procedures from a DO block - the
>> expression plan is not cached, and the plan is regenerated every time. This
>> is the reason why it is slow.
>>
>> Unfortunately, generated plans are not released until SPI_finish.
>> The attached patch fixes this issue.
>>
>
> But now, recursive calling doesn't work :-(. So this patch is not enough
>

The attached patch is working - all tests passed.

It doesn't solve the performance problem, and doesn't solve all memory
problems, but it significantly reduces memory requirements from 5007 bytes
to 439 bytes per CALL.

Regards

Pavel


>
>
>> Regards
>>
>> Pavel
>>
>>
>>> Regards
>>>
>>> Pavel
>>>
>>>
diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index aeb6c8fefc..755387aab2 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -2143,6 +2143,7 @@ exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt)
 	LocalTransactionId after_lxid;
 	volatile bool pushed_active_snap = false;
 	volatile int rc;
+	bool plan_owner = false;
 
 	/* PG_TRY to ensure we clear the plan link, if needed, on failure */
 	PG_TRY();
@@ -2162,6 +2163,16 @@ exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt)
 			 */
 			exec_prepare_plan(estate, expr, 0, estate->atomic);
 
+			/*
+			 * A plan that is not saved should be released explicitly; otherwise
+			 * any non-recursive use of a CALL statement leaks the plan in SPI
+			 * memory.  The created plan can be reused when the procedure is
+			 * called recursively, so the plan can only be released in the root
+			 * call of the recursion, i.e. the call in which the expression had
+			 * no plan assigned yet and the plan was created.
+			 */
+			plan_owner = true;
+
 			/*
 			 * The procedure call could end transactions, which would upset
 			 * the snapshot management in SPI_execute*, so don't let it do it.
@@ -2315,7 +2326,11 @@ exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt)
 	PG_END_TRY();
 
 	if (expr->plan && !expr->plan->saved)
+	{
+		if (plan_owner)
+			SPI_freeplan(expr->plan);
 		expr->plan = NULL;
+	}
 
 	if (rc < 0)
 		elog(ERROR, "SPI_execute_plan_with_paramlist failed executing query \"%s\": %s",


Spawned Background Process Knows the Exit of Client Process?

2020-05-15 Thread Shichao Jin
Hi Postgres Hackers,

Is there any elegant way for a self-spawned background process (forked by
us) to get notified when a regular client-connected process exits from the
current database (switches databases or even terminates)?

The background is that we are integrating a thread-model-based storage
engine into Postgres via a foreign data wrapper. The engine does not allow
multiple processes to access it, so we have to spawn a background process
to access the engine, while the client process communicates with the
spawned process via shared memory. In order to let the engine recognize the
data types in Postgres, the spawned process has to access catalogs such as
the relcache, so it must connect to the target database via
BackgroundWorkerInitializeConnection to get that information.
Unfortunately, a background process cannot switch databases, so it has to
be notified when the client process switches databases or terminates, at
which point we can close the spawned process accordingly. Please advise us
if there are alternative approaches.
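
For anyone less familiar with this machinery, here is a rough, hypothetical
sketch of the setup described above (the names my_fdw and engine_worker_main
are placeholders).  It illustrates why the database binding is fixed - the
single BackgroundWorkerInitializeConnection() call - and it does not by
itself answer the notification question.

#include "postgres.h"
#include "miscadmin.h"
#include "postmaster/bgworker.h"

PGDLLEXPORT void engine_worker_main(Datum main_arg);

void
engine_worker_main(Datum main_arg)
{
	BackgroundWorkerUnblockSignals();

	/* Bind this process to one database; this cannot be changed later. */
	BackgroundWorkerInitializeConnection("postgres", NULL, 0);

	for (;;)
	{
		CHECK_FOR_INTERRUPTS();
		/* serve requests arriving through shared memory; a real worker
		 * would wait on its latch here rather than busy-loop */
	}
}

static void
spawn_engine_worker(void)
{
	BackgroundWorker worker;
	BackgroundWorkerHandle *handle;

	memset(&worker, 0, sizeof(worker));
	worker.bgw_flags = BGWORKER_SHMEM_ACCESS |
		BGWORKER_BACKEND_DATABASE_CONNECTION;
	worker.bgw_start_time = BgWorkerStart_RecoveryFinished;
	worker.bgw_restart_time = BGW_NEVER_RESTART;
	snprintf(worker.bgw_name, BGW_MAXLEN, "engine worker");
	snprintf(worker.bgw_type, BGW_MAXLEN, "engine worker");
	snprintf(worker.bgw_library_name, BGW_MAXLEN, "my_fdw");
	snprintf(worker.bgw_function_name, BGW_MAXLEN, "engine_worker_main");
	worker.bgw_notify_pid = MyProcPid;	/* get startup notification */

	if (!RegisterDynamicBackgroundWorker(&worker, &handle))
		elog(WARNING, "could not register engine worker");
}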

Best,
Shichao


Re: pg_stat_wal_receiver and flushedUpto/writtenUpto

2020-05-15 Thread Alvaro Herrera
On 2020-May-15, Michael Paquier wrote:

> As discussed in the thread that introduced d140f2f3 to rename
> receivedUpto to flushedUpto and add writtenUpto to the WAL receiver's
> shared memory information, the equivalent columns in
> pg_stat_wal_receiver have not been renamed:

> When I have implemented this system view, the idea was to keep a
> one-one mapping between the SQL interface and the shmem info even if
> we are not compatible with past versions, hence I think that before
> beta1 we had better fix that and:
> - rename received_lsn to flushed_lsn.
> - add one column for writtenUpto.

Why do you put the column at the end?  I would put written_lsn before
flushed_lsn.

Since this requires a catversion bump, I think it'd be best to do it
before beta1 next week.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Add A Glossary

2020-05-15 Thread Alvaro Herrera
Applied all these suggestions, and made a few additional very small
edits, and pushed -- better to ship what we have now in beta1, but
further edits are still possible.

Other possible terms to define, including those from the tweet I linked
to and a couple more:

archive
availability
backup
composite type
common table expression
data type
domain
dump
export
fault tolerance
GUC
high availability
hot standby
LSN
restore
secondary server (?)
snapshot
transactions per second

Anybody want to try their hand at a tentative definition?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From 6cdbfd1d9f2d6c7fa815aab853a51f86b3650e11 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Fri, 15 May 2020 13:04:11 -0400
Subject: [PATCH v3] Review of the glossary
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Add some more terms, clarify some definitions, remove redundant terms,
move a couple of terms to keep alphabetical order.

Co-authored-by: Jürgen Purtz 
Co-authored-by: Erik Rijkers 
Co-authored-by: Laurenz Albe 
Discussion: https://postgr.es/m/7b9b469e804777ac9df4d37716db9...@xs4all.nl
---
 doc/src/sgml/acronyms.sgml |   2 +-
 doc/src/sgml/glossary.sgml | 536 ++---
 2 files changed, 316 insertions(+), 222 deletions(-)

diff --git a/doc/src/sgml/acronyms.sgml b/doc/src/sgml/acronyms.sgml
index f638665dc9..b05c065546 100644
--- a/doc/src/sgml/acronyms.sgml
+++ b/doc/src/sgml/acronyms.sgml
@@ -766,7 +766,7 @@
 XID
 
  
-  Transaction Identifier
+  Transaction identifier
  
 

diff --git a/doc/src/sgml/glossary.sgml b/doc/src/sgml/glossary.sgml
index 8c6cb6e942..a1e8a595c8 100644
--- a/doc/src/sgml/glossary.sgml
+++ b/doc/src/sgml/glossary.sgml
@@ -7,8 +7,23 @@
  
 
  
+  
+   ACID
+   
+
+ Atomicity,
+ Consistency,
+ Isolation, and
+ Durability.
+ This set of properties of database transactions is intended to
+ guarantee validity in concurrent operation and even in event of
+ errors, power failures, etc.
+ 
+   
+  
+
   
-   Aggregate Function
+   Aggregate function

 
  A function that
@@ -35,11 +50,15 @@
  to make decisions about how to execute
  queries.
 
+
+ (Don't confuse this term with the ANALYZE option
+ to the  command.)
+

   
 
   
-   Analytic Function
+   Analytic function

   
 
@@ -106,8 +125,8 @@

 
  Process of an instance
- which act on behalf of client sessions
- and handle their requests.
+ which acts on behalf of a client session
+ and handles its requests.
 
 
  (Don't confuse this term with the similar terms
@@ -118,7 +137,7 @@
   
 
   
-   Background Worker (process)
+   Background worker (process)

 
  Process within an instance,
@@ -138,10 +157,11 @@
   
 
   
-   Background Writer (process)
+   Background writer (process)

 
- A process that continuously writes dirty pages from
+ A process that writes dirty
+ data pages from
  shared memory to
  the file system.  It wakes up periodically, but works only for a short
  period in order to distribute its expensive I/O
@@ -155,6 +175,16 @@

   
 
+  
+   Bloat
+   
+
+ Space in data pages which does not contain current row versions,
+ such as unused (free) space or outdated row versions.
+
+   
+  
+
   
Cast

@@ -190,7 +220,7 @@
   
 
   
-   Check Constraint
+   Check constraint

 
  A type of constraint
@@ -208,15 +238,6 @@

   
 
-  
-   Checkpointer (process)
-   
-
- A specialized process responsible for executing checkpoints.
-
-   
-  
-
   
Checkpoint

@@ -244,6 +265,15 @@

   
 
+  
+   Checkpointer (process)
+   
+
+ A specialized process responsible for executing checkpoints.
+
+   
+  
+
   
Class (archaic)

@@ -262,27 +292,6 @@

   
 
-  
-   Cluster
-   
-
- A group of databases plus their
- global SQL objects. The
- cluster is managed by exactly one
- instance. A newly created
- Cluster will have three databases created automatically. They are
- template0, template1, and
- postgres. It is expected that an application will
- create one or more additional database aside from these three.
-
-
- (Don't confuse the PostgreSQL-specific term
- Cluster with the SQL
- command CLUSTER).
-
-   
-  
-
   
Column

@@ -363,7 +372,10 @@

 
  A restriction on the values of data allowed within a
- Table.
+ table,
+ or in attributes of a
+ 
+ domain.
 
 
  For more information, see
@@ -373,19 +385,19 @@
   
 
   
-   Data Area
+   Data area

   
 
   
-   Data Directory
+   Data directory

 
  The base directory on the filesystem of a
  server that contains all
-

Re: Transactions involving multiple postgres foreign servers, take 2

2020-05-15 Thread Muhammad Usama
On Fri, May 15, 2020 at 7:52 PM Masahiko Sawada <
masahiko.saw...@2ndquadrant.com> wrote:

> On Fri, 15 May 2020 at 19:06, Muhammad Usama  wrote:
> >
> >
> >
> > On Fri, May 15, 2020 at 9:59 AM Masahiko Sawada <
> masahiko.saw...@2ndquadrant.com> wrote:
> >>
> >> On Fri, 15 May 2020 at 13:26, Muhammad Usama  wrote:
> >> >
> >> >
> >> >
> >> > On Fri, May 15, 2020 at 7:20 AM Masahiko Sawada <
> masahiko.saw...@2ndquadrant.com> wrote:
> >> >>
> >> >> On Fri, 15 May 2020 at 03:08, Muhammad Usama 
> wrote:
> >> >> >
> >> >> >
> >> >> > Hi Sawada,
> >> >> >
> >> >> > I have just done some review and testing of the patches and have
> >> >> > a couple of comments.
> >> >>
> >> >> Thank you for reviewing!
> >> >>
> >> >> >
> >> >> > 1- IMHO the PREPARE TRANSACTION should always use 2PC even
> >> >> > when the transaction has operated on a single foreign server
> regardless
> >> >> > of foreign_twophase_commit setting, and throw an error otherwise
> when
> >> >> > 2PC is not available on any of the data-modified servers.
> >> >> >
> >> >> > For example, consider the case
> >> >> >
> >> >> > BEGIN;
> >> >> > INSERT INTO ft_2pc_1 VALUES(1);
> >> >> > PREPARE TRANSACTION 'global_x1';
> >> >> >
> >> >> > Here since we are preparing the local transaction so we should
> also prepare
> >> >> > the transaction on the foreign server even if the transaction has
> modified only
> >> >> > one foreign table.
> >> >> >
> >> >> > What do you think?
> >> >>
> >> >> Good catch and I agree with you. The transaction should fail if it
> >> >> opened a transaction on a 2pc-no-support server regardless of
> >> >> foreign_twophase_commit. And I think we should prepare a transaction
> >> >> on a foreign server even if it didn't modify any data on that.
> >> >>
> >> >> >
> >> >> > Also without this change, the above test case produces an
> assertion failure
> >> >> > with your patches.
> >> >> >
> >> >> > 2- when deciding if the two-phase commit is required or not in
> >> >> > FOREIGN_TWOPHASE_COMMIT_PREFER mode we should use
> >> >> > 2PC when we have at least one server capable of doing that.
> >> >> >
> >> >> > i.e
> >> >> >
> >> >> > For FOREIGN_TWOPHASE_COMMIT_PREFER case in
> >> >> > checkForeignTwophaseCommitRequired() function I think
> >> >> > the condition should be
> >> >> >
> >> >> > need_twophase_commit = (nserverstwophase >= 1);
> >> >> > instead of
> >> >> > need_twophase_commit = (nserverstwophase >= 2);
> >> >> >
> >> >>
> >> >> Hmm I might be missing your point but it seems to me that you want to
> >> >> use two-phase commit even in the case where a transaction modified
> >> >> data on only one server. Can't we commit distributed transaction
> >> >> atomically even using one-phase commit in that case?
> >> >>
> >> >
> >> > I think you are confusing between nserverstwophase and
> nserverswritten.
> >> >
> >> > need_twophase_commit = (nserverstwophase >= 1)  would mean
> >> > use two-phase commit if at least one server exists in the list that is
> >> > capable of doing 2PC
> >> >
> >> > For the case when the transaction modified data on only one server we
> >> > already exits the function indicating no two-phase required
> >> >
> >> > if (nserverswritten <= 1)
> >> >   return false;
> >> >
> >>
> >> Thank you for your explanation. If the transaction modified two
> >> servers that don't' support 2pc and one server that supports 2pc I
> >> think we don't want to use 2pc even in 'prefer' case. Because even if
> >> we use 2pc in that case, it's still possible to have the atomic commit
> >> problem. For example, if we failed to commit a transaction after
> >> committing other transactions on the server that doesn't support 2pc
> >> we cannot rollback the already-committed transaction.
> >
> >
> > Yes, that is true, And I think the 'prefer' mode will always have a
> corner case
> > no matter what. But the thing is we can reduce the probability of hitting
> > an atomic commit problem by ensuring to use 2PC whenever possible.
> >
> > For instance as in your example scenario where a transaction modified
> > two servers that don't support 2PC and one server that supports it. let
> us
> > analyze both scenarios.
> >
> > If we use 2PC on the server that supports it then the probability of
> hitting
> > a problem would be 1/3 = 0.33. because there is only one corner case
> > scenario in that case. which would be if we fail to commit the third
> server
> > As the first server (2PC supported one) would be using prepared
> > transactions so no problem there. The second server (NON-2PC support)
> > if failed to commit then, still no problem as we can rollback the
> prepared
> > transaction on the first server. The only issue would happen when we fail
> > to commit on the third server because we have already committed
> > on the second server and there is no way to undo that.
> >
> >
> > Now consider the other possibility if we do not use the 2PC in that
> > case (as you mentioned), then the probability of hitting the problem
> > 

Re: documenting the backup manifest file format

2020-05-15 Thread David Steele

On 5/15/20 10:17 AM, Tom Lane wrote:
> David Steele  writes:
>> On 5/15/20 9:34 AM, Tom Lane wrote:
>>> I vote for following the backup_label precedent; that's stood for quite
>>> some years now.
>
>> Of course, my actual preference is to use epoch time which is easy to
>> work with and eliminates the possibility of conversion errors. It is
>> also compact.
>
> Well, if we did that then it'd be sufficiently different from the backup
> label as to remove any risk of confusion.  But "easy to work with" is in
> the eye of the beholder; do we really want a format that's basically
> unreadable to the naked eye?


Well, I lost this argument before so it seems I'm in the minority on 
easy-to-use. We use epoch time in the pgBackRest manifests which has 
been easy to deal with in both C and Perl, so experience tells me it 
really is easy, at least for programs.
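
To illustrate that point, converting an epoch value to a readable GMT string
in plain C is only a couple of standard-library calls (the value below is an
arbitrary example, not taken from any real manifest):

#include <stdio.h>
#include <time.h>

int
main(void)
{
	time_t		epoch = 1589587200;	/* arbitrary example value */
	struct tm	tm;
	char		buf[64];

	gmtime_r(&epoch, &tm);		/* thread-safe variant of gmtime() */
	strftime(buf, sizeof(buf), "%Y-%m-%d %H:%M:%S GMT", &tm);
	printf("%ld -> %s\n", (long) epoch, buf);
	return 0;
}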


The manifest (to me, at least) is generally intended to be 
machine-processed. For instance, it contains checksums which are not all 
that useful unless they are checked programmatically -- they can't just 
be eye-balled.


Regards,
--
-David
da...@pgmasters.net




Re: Transactions involving multiple postgres foreign servers, take 2

2020-05-15 Thread Masahiko Sawada
On Fri, 15 May 2020 at 19:06, Muhammad Usama  wrote:
>
>
>
> On Fri, May 15, 2020 at 9:59 AM Masahiko Sawada 
>  wrote:
>>
>> On Fri, 15 May 2020 at 13:26, Muhammad Usama  wrote:
>> >
>> >
>> >
>> > On Fri, May 15, 2020 at 7:20 AM Masahiko Sawada 
>> >  wrote:
>> >>
>> >> On Fri, 15 May 2020 at 03:08, Muhammad Usama  wrote:
>> >> >
>> >> >
>> >> > Hi Sawada,
>> >> >
>> >> > I have just done some review and testing of the patches and have
>> >> > a couple of comments.
>> >>
>> >> Thank you for reviewing!
>> >>
>> >> >
>> >> > 1- IMHO the PREPARE TRANSACTION should always use 2PC even
>> >> > when the transaction has operated on a single foreign server regardless
>> >> > of foreign_twophase_commit setting, and throw an error otherwise when
>> >> > 2PC is not available on any of the data-modified servers.
>> >> >
>> >> > For example, consider the case
>> >> >
>> >> > BEGIN;
>> >> > INSERT INTO ft_2pc_1 VALUES(1);
>> >> > PREPARE TRANSACTION 'global_x1';
>> >> >
>> >> > Here since we are preparing the local transaction so we should also 
>> >> > prepare
>> >> > the transaction on the foreign server even if the transaction has 
>> >> > modified only
>> >> > one foreign table.
>> >> >
>> >> > What do you think?
>> >>
>> >> Good catch and I agree with you. The transaction should fail if it
>> >> opened a transaction on a 2pc-no-support server regardless of
>> >> foreign_twophase_commit. And I think we should prepare a transaction
>> >> on a foreign server even if it didn't modify any data on that.
>> >>
>> >> >
>> >> > Also without this change, the above test case produces an assertion 
>> >> > failure
>> >> > with your patches.
>> >> >
>> >> > 2- when deciding if the two-phase commit is required or not in
>> >> > FOREIGN_TWOPHASE_COMMIT_PREFER mode we should use
>> >> > 2PC when we have at least one server capable of doing that.
>> >> >
>> >> > i.e
>> >> >
>> >> > For FOREIGN_TWOPHASE_COMMIT_PREFER case in
>> >> > checkForeignTwophaseCommitRequired() function I think
>> >> > the condition should be
>> >> >
>> >> > need_twophase_commit = (nserverstwophase >= 1);
>> >> > instead of
>> >> > need_twophase_commit = (nserverstwophase >= 2);
>> >> >
>> >>
>> >> Hmm I might be missing your point but it seems to me that you want to
>> >> use two-phase commit even in the case where a transaction modified
>> >> data on only one server. Can't we commit distributed transaction
>> >> atomically even using one-phase commit in that case?
>> >>
>> >
>> > I think you are confusing between nserverstwophase and nserverswritten.
>> >
>> > need_twophase_commit = (nserverstwophase >= 1)  would mean
>> > use two-phase commit if at least one server exists in the list that is
>> > capable of doing 2PC
>> >
>> > For the case when the transaction modified data on only one server we
>> > already exits the function indicating no two-phase required
>> >
>> > if (nserverswritten <= 1)
>> >   return false;
>> >
>>
>> Thank you for your explanation. If the transaction modified two
>> servers that don't' support 2pc and one server that supports 2pc I
>> think we don't want to use 2pc even in 'prefer' case. Because even if
>> we use 2pc in that case, it's still possible to have the atomic commit
>> problem. For example, if we failed to commit a transaction after
>> committing other transactions on the server that doesn't support 2pc
>> we cannot rollback the already-committed transaction.
>
>
> Yes, that is true, And I think the 'prefer' mode will always have a corner 
> case
> no matter what. But the thing is we can reduce the probability of hitting
> an atomic commit problem by ensuring to use 2PC whenever possible.
>
> For instance as in your example scenario where a transaction modified
> two servers that don't support 2PC and one server that supports it. let us
> analyze both scenarios.
>
> If we use 2PC on the server that supports it then the probability of hitting
> a problem would be 1/3 = 0.33. because there is only one corner case
> scenario in that case. which would be if we fail to commit the third server
> As the first server (2PC supported one) would be using prepared
> transactions so no problem there. The second server (NON-2PC support)
> if failed to commit then, still no problem as we can rollback the prepared
> transaction on the first server. The only issue would happen when we fail
> to commit on the third server because we have already committed
> on the second server and there is no way to undo that.
>
>
> Now consider the other possibility if we do not use the 2PC in that
> case (as you mentioned), then the probability of hitting the problem
> would be 2/3 = 0.66. because now commit failure on either second or
> third server will land us in an atomic-commit-problem.
>
> So, INMO using the 2PC whenever available with 'prefer' mode
> should be the way to go.

My understanding of 'prefer' mode is that even if a distributed
transaction modified data on several types of server we can ensure to
keep data consistent among 

Re: ldap tls test fails in some environments

2020-05-15 Thread Christoph Berg
Re: Tom Lane
> Somebody should get out the LDAP RFCs and decode the packet contents
> that this log helpfully provides, but I suspect that we're just looking
> at an authentication failure; there's still not much clue as to why.

The non-TLS tests work, so it's not a plain auth failure...

I'm attaching the full logs from that test, maybe someone with more
insight can compare the non-TLS with the TLS bits.

Would it help to re-run that with log_debug on the PG side?

Christoph


001_auth_node.log.gz
Description: application/gzip


regress_log_001_auth.gz
Description: application/gzip


slapd.log.gz
Description: application/gzip


Re: documenting the backup manifest file format

2020-05-15 Thread Tom Lane
David Steele  writes:
> On 5/15/20 9:34 AM, Tom Lane wrote:
>> I vote for following the backup_label precedent; that's stood for quite
>> some years now.

> Of course, my actual preference is to use epoch time which is easy to 
> work with and eliminates the possibility of conversion errors. It is 
> also compact.

Well, if we did that then it'd be sufficiently different from the backup
label as to remove any risk of confusion.  But "easy to work with" is in
the eye of the beholder; do we really want a format that's basically
unreadable to the naked eye?

regards, tom lane




Re: documenting the backup manifest file format

2020-05-15 Thread David Steele

On 5/15/20 9:34 AM, Tom Lane wrote:

Robert Haas  writes:

It's a good question. My inclination was to think that GMT would be
the clearest thing, but I also didn't realize that the result would
thus be inconsistent with backup_label. Not sure what's best here.


I vote for following the backup_label precedent; that's stood for quite
some years now.


I'd rather keep it GMT. The timestamps in the backup label are purely 
informational, but the timestamps in the manifest are useful, e.g. to 
set the mtime on a restore to the original value.


Forcing the user to do timezone conversions is prone to error. Some 
languages, like C, simply aren't good at it.
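
Just to illustrate (a rough sketch; the literal values are made up, and it
needs <stdlib.h>/<time.h>, plus _XOPEN_SOURCE for strptime()):

/* epoch form: trivial and unambiguous */
time_t t1 = (time_t) strtol("1589544378", NULL, 10);

/* "YYYY-MM-DD HH:MM:SS GMT" form: %Z handling in strptime() is
 * implementation-defined, and converting the struct tm as UTC needs the
 * non-standard timegm(), so portable callers end up doing it by hand. */
struct tm tm = {0};
strptime("2020-05-15 12:06:18 GMT", "%Y-%m-%d %H:%M:%S %Z", &tm);
time_t t2 = timegm(&tm);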


Of course, my actual preference is to use epoch time which is easy to 
work with and eliminates the possibility of conversion errors. It is 
also compact.


Regards,
--
-David
da...@pgmasters.net




Re: ldap tls test fails in some environments

2020-05-15 Thread Tom Lane
Christoph Berg  writes:
> The slapd debug log is mostly garbage to me, the error seems to be
> this:
> ldap_read: want=8 error=Resource temporarily unavailable

Hm, so EAGAIN (although that's a BSD-ish spelling of the strerror
string, which seems pretty odd in a Debian context).  I don't think
that's actually an error, it's just the daemon's data collection
logic trying to read data that isn't there.  It then goes on and
issues a response, so this must not indicate that the request is
incomplete --- it's just a useless speculative read.

Somebody should get out the LDAP RFCs and decode the packet contents
that this log helpfully provides, but I suspect that we're just looking
at an authentication failure; there's still not much clue as to why.

regards, tom lane




Re: documenting the backup manifest file format

2020-05-15 Thread Tom Lane
Robert Haas  writes:
> It's a good question. My inclination was to think that GMT would be
> the clearest thing, but I also didn't realize that the result would
> thus be inconsistent with backup_label. Not sure what's best here.

I vote for following the backup_label precedent; that's stood for quite
some years now.

regards, tom lane




Re: Parallel copy

2020-05-15 Thread Robert Haas
On Fri, May 15, 2020 at 12:19 AM Amit Kapila  wrote:
> > My sense is that it would be a lot more sensible to do it at the
> > *beginning* of the parallel operation. Once we do it once, we
> > shouldn't ever do it again; that's how it works now. Deferring it
> > until later seems much more likely to break things.
>
> AFAIU, we always increment the command counter after executing the
> command.  Why do we want to do it differently here?

Hmm, now I'm starting to think that I'm confused about what is under
discussion here. Which CommandCounterIncrement() are we talking about
here?

> First, let me clarify the CTID I have used in my email are for the
> table in which insertion is happening which means FK table.  So, in
> such a case, we can't have the same CTIDs queued for different
> workers.  Basically, we use CTID to fetch the row from FK table later
> and form a query to lock (in KEY SHARE mode) the corresponding tuple
> in PK table.  Now, it is possible that two different workers try to
> lock the same row of PK table.  I am not clear what problem group
> locking can have in this case because these are non-conflicting locks.
> Can you please elaborate a bit more?

I'm concerned about two workers trying to take the same lock at the
same time. If that's prevented by the buffer locking then I think it's
OK, but if it's prevented by a heavyweight lock then it's not going to
work in this case.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: documenting the backup manifest file format

2020-05-15 Thread Robert Haas
On Fri, May 15, 2020 at 2:10 AM Fujii Masao  wrote:
> I have one question related to this; Why don't we use log_timezone,
> like backup_label? log_timezone is used for "START TIME" field in
> backup_label. Sorry if this was already discussed.
>
> /* Use the log timezone here, not the session timezone */
> stamp_time = (pg_time_t) time(NULL);
> pg_strftime(strfbuf, sizeof(strfbuf),
> "%Y-%m-%d %H:%M:%S %Z",
> pg_localtime(&stamp_time, log_timezone));
>
> OTOH, *if* we want to use the same timezone for backup-related files because
> backup can be used in different environements and timezone setting
> may be different there or for other reasons, backup_label also should use
> GMT or something for the sake of consistency?

It's a good question. My inclination was to think that GMT would be
the clearest thing, but I also didn't realize that the result would
thus be inconsistent with backup_label. Not sure what's best here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: PG 13 release notes, first draft

2020-05-15 Thread Bruce Momjian
On Fri, May 15, 2020 at 09:54:55PM +0900, Fujii Masao wrote:
> > Actually, it should be:
> > 
> > 
> > 
> > because we are using the text from the link.
> 
> Yes, this works.
> 
> > See
> > doc/src/sgml/README.links for details on xref links.  Release notes
> > updated.
> 
> Thanks!
> 
> >   Odd I got no warning for this on 'make check'.
> 
> I'm not sure why, but btw I got the message when I compiled the document on 
> Mac.

I don't think I looked at the HTML build output, only the check one, so
that might be the cause.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: PG 13 release notes, first draft

2020-05-15 Thread Fujii Masao




On 2020/05/15 21:29, Bruce Momjian wrote:

On Fri, May 15, 2020 at 03:55:19PM +0900, Fujii Masao wrote:



On 2020/05/05 12:16, Bruce Momjian wrote:

I have committed the first draft of the PG 13 release notes.  You can
see them here:

https://momjian.us/pgsql_docs/release-13.html

It still needs markup, word wrap, and indenting.  The community doc
build should happen in a few hours.


Many thanks for working on this!

When I did "make html", I got the following message.

 Link element has no content and no Endterm. Nothing to show in the link to 
sepgsql

"Allow  to control access to the" in release-13.sgml
seems to have caused this. Also I found it's converted into "Allow ??? to
  control access to the", i.e., ??? was used.

- Allow  to control access to the
+ Allow sepgsql to control access to the

Shouldn't we change that as the above?


Actually, it should be:



because we are using the text from the link.


Yes, this works.


See
doc/src/sgml/README.links for details on xref links.  Release notes
updated.


Thanks!

  Odd I got no warning for this on 'make check'. 


I'm not sure why, but btw I got the message when I compiled the document on Mac.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Index Skip Scan

2020-05-15 Thread Dmitry Dolgov
> On Wed, May 13, 2020 at 02:37:21PM +0530, Dilip Kumar wrote:
>
> + if (_bt_scankey_within_page(scan, so->skipScanKey, so->currPos.buf, dir))
> + {
>
> Here we expect whether the "next" unique key can be found on this page
> or not, but we are using the function which suggested whether the
> "current" key can be found on this page or not.  I think in boundary
> cases where the high key is equal to the current key, this function
> will return true (which is expected from this function), and based on
> that we will simply scan the current page and IMHO that cost could be
> avoided no?

Yes, looks like you're right, there is indeed an unnecessary extra scan
happening. To avoid that we can check key->nextkey and adjust the higher
boundary correspondingly. Will also add this into the next rebased
patch, thanks!
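
The idea is roughly the following (only a sketch to show the intent, not
the actual patch code; it assumes a forward scan, ignores the
rightmost-page case, and the local variables are made up):

Page    page = BufferGetPage(so->currPos.buf);
int32   cmp;
bool    within_page;

/*
 * With nextkey set we want the first tuple strictly after the current
 * key, so a high key equal to the current key must not count as
 * "within this page".
 */
cmp = _bt_compare(scan->indexRelation, so->skipScanKey, page, P_HIKEY);
within_page = so->skipScanKey->nextkey ? (cmp < 0) : (cmp <= 0);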




Re: PG 13 release notes, first draft

2020-05-15 Thread Bruce Momjian
On Fri, May 15, 2020 at 03:55:19PM +0900, Fujii Masao wrote:
> 
> 
> On 2020/05/05 12:16, Bruce Momjian wrote:
> > I have committed the first draft of the PG 13 release notes.  You can
> > see them here:
> > 
> > https://momjian.us/pgsql_docs/release-13.html
> > 
> > It still needs markup, word wrap, and indenting.  The community doc
> > build should happen in a few hours.
> 
> Many thanks for working on this!
> 
> When I did "make html", I got the following message.
> 
> Link element has no content and no Endterm. Nothing to show in the link 
> to sepgsql
> 
> "Allow  to control access to the" in release-13.sgml
> seems to have caused this. Also I found it's converted into "Allow ??? to
>  control access to the", i.e., ??? was used.
> 
> - Allow  to control access to the
> + Allow sepgsql to control access to the
> 
> Shouldn't we change that as the above?

Actually, it should be:



because we are using the text from the link.  See
doc/src/sgml/README.links for details on xref links.  Release notes
updated.   Odd I got no warning for this on 'make check'.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: ldap tls test fails in some environments

2020-05-15 Thread Christoph Berg
> I'll see if I can catch a shell in the environment where it fails.

It failed right away when I tried on the buildd machine:

The slapd debug log is mostly garbage to me, the error seems to be
this:
ldap_read: want=8 error=Resource temporarily unavailable


src/test/ldap/t/001_auth.pl:

system_or_bail "sh", "-c", "$slapd -f $slapd_conf -h '$ldap_url $ldaps_url' -d 
255 &";

END
{
kill 'INT', `cat $slapd_pidfile` if -f $slapd_pidfile;
}


tmp_check/log/001_auth_node.log:

2020-05-15 14:06:18.915 CEST [30486] [unknown] LOG:  could not start LDAP TLS 
session: Connect error
2020-05-15 14:06:18.916 CEST [30486] [unknown] FATAL:  LDAP authentication 
failed for user "test1"
2020-05-15 14:06:18.916 CEST [30486] [unknown] DETAIL:  Connection matched 
pg_hba.conf line 1: "local all all ldap ldapserver=localhost ldapport=65510 
ldapbasedn="dc=example,dc=net" ldapsearchfilter="(uid=$username)" ldaptls=1"


tmp_check/log/regress_log_001_auth:

# TLS
### Restarting node "node"
# Running: pg_ctl -D 
/home/myon/postgresql-13-13~~devel~20200515.0434/build/src/test/ldap/tmp_ch
eck/t_001_auth_node_data/pgdata -l 
/home/myon/postgresql-13-13~~devel~20200515.0434/build/src/te
st/ldap/tmp_check/log/001_auth_node.log restart
waiting for server to shut down done
server stopped
waiting for server to start done
server started
# Postmaster PID for node "node" is 30477
5ebe85ba daemon: activity on 1 descriptor
5ebe85ba daemon: activity on:
5ebe85ba slap_listener_activate(6):
5ebe85ba daemon: epoll: listen=6 busy
5ebe85ba daemon: epoll: listen=7 active_threads=0 tvp=NULL
5ebe85ba daemon: epoll: listen=8 active_threads=0 tvp=NULL
5ebe85ba >>> slap_listener(ldap://localhost:65510)
5ebe85ba daemon: epoll: listen=9 active_threads=0 tvp=NULL
5ebe85ba daemon: accept() = 10
5ebe85ba daemon: listen=6, new connection on 10
5ebe85ba daemon: activity on 1 descriptor
5ebe85ba daemon: activity on:
5ebe85ba daemon: epoll: listen=6 active_threads=0 tvp=NULL
5ebe85ba daemon: epoll: listen=7 active_threads=0 tvp=NULL
5ebe85ba daemon: epoll: listen=8 active_threads=0 tvp=NULL
5ebe85ba daemon: epoll: listen=9 active_threads=0 tvp=NULL
5ebe85ba daemon: added 10r (active) listener=(nil)
5ebe85ba daemon: activity on 1 descriptor
5ebe85ba daemon: activity on: 10r
5ebe85ba daemon: read active on 10
5ebe85ba daemon: epoll: listen=6 active_threads=0 tvp=NULL
5ebe85ba connection_get(10)
5ebe85ba connection_get(10): got connid=1033
5ebe85ba daemon: epoll: listen=7 active_threads=0 tvp=NULL
5ebe85ba daemon: epoll: listen=8 active_threads=0 tvp=NULL
5ebe85ba daemon: epoll: listen=9 active_threads=0 tvp=NULL
5ebe85ba daemon: activity on 1 descriptor
5ebe85ba connection_read(10): checking for input on id=1033
ber_get_next
5ebe85ba daemon: activity on:
5ebe85ba daemon: epoll: listen=6 active_threads=0 tvp=NULL
ldap_read: want=8, got=8
  :  30 1d 02 01 01 77 18 800w..
ldap_read: want=23, got=23
5ebe85ba daemon: epoll: listen=7 active_threads=0 tvp=NULL
  :  16 31 2e 33 2e 36 2e 31  2e 34 2e 31 2e 31 34 36   .1.3.6.1.4.1.146
5ebe85ba daemon: epoll: listen=8 active_threads=0 tvp=NULL
5ebe85ba daemon: epoll: listen=9 active_threads=0 tvp=NULL
  0010:  36 2e 32 30 30 33 37   6.20037
ber_get_next: tag 0x30 len 29 contents:
ber_dump: buf=0x7fa8ec107910 ptr=0x7fa8ec107910 end=0x7fa8ec10792d len=29
  :  02 01 01 77 18 80 16 31  2e 33 2e 36 2e 31 2e 34   ...w...1.3.6.1.4
  0010:  2e 31 2e 31 34 36 36 2e  32 30 30 33 37.1.1466.20037
5ebe85ba op tag 0x77, time 1589544378
ber_get_next
ldap_read: want=8 error=Resource temporarily unavailable
5ebe85ba conn=1033 op=0 do_extended
ber_scanf fmt ({m) ber:
5ebe85ba daemon: activity on 1 descriptor
5ebe85ba daemon: activity on:ber_dump: buf=0x7fa8ec107910 ptr=0x7fa8ec107913 
end=0x7fa8ec10792d len=26
  :  77 18 80 16 31 2e 33 2e  36 2e 31 2e 34 2e 31 2e   w...1.3.6.1.4.1.
  0010:  31 34 36 36 2e 32 30 30  33 37 1466.20037

5ebe85ba daemon: epoll: listen=6 active_threads=0 tvp=NULL
5ebe85ba daemon: epoll: listen=7 active_threads=0 tvp=NULL
5ebe85ba do_extended: oid=1.3.6.1.4.1.1466.20037
5ebe85ba daemon: epoll: listen=8 active_threads=0 tvp=NULL
5ebe85ba daemon: epoll: listen=9 active_threads=0 tvp=NULL
5ebe85ba send_ldap_extended: err=0 oid= len=0
5ebe85ba send_ldap_response: msgid=1 tag=120 err=0
ber_flush2: 14 bytes to sd 10
  :  30 0c 02 01 01 78 07 0a  01 00 04 00 04 00 0x
ldap_write: want=14, written=14
  :  30 0c 02 01 01 78 07 0a  01 00 04 00 04 00 0x
5ebe85ba daemon: activity on 1 descriptor
5ebe85ba daemon: activity on: 10r
5ebe85ba daemon: read active on 10
5ebe85ba daemon: epoll: listen=6 active_threads=0 tvp=NULL
5ebe85ba daemon: epoll: listen=7 active_threads=0 tvp=NULL
psql:5ebe85ba connection_get(10)
 error: could not connect to server: FATAL:  LDAP authentication failed for 
user "test1"
5ebe85ba daemon: epoll: listen=8 active_threads=0 tvp=NULL

Re: pg_bsd_indent and -Wimplicit-fallthrough

2020-05-15 Thread Julien Rouhaud
On Fri, May 15, 2020 at 9:17 AM Daniel Gustafsson  wrote:
>
> > On 15 May 2020, at 08:28, Julien Rouhaud  wrote:
> > On Fri, May 15, 2020 at 8:03 AM Michael Paquier  wrote:
>
> >> Something like the attached is fine to take care of those warnings,
> >> but what's our current patching policy for this tool?
> >
> > The patch looks good to me.  It looks like we already have custom
> > patches, so +1 to applying it.
>
> Shouldn't we try and propose it to upstream first to minimize our diff?

Good point, adding Piotr.




Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2020-05-15 Thread Dilip Kumar
On Fri, May 15, 2020 at 4:35 PM Amit Kapila  wrote:
>
> On Fri, May 15, 2020 at 4:20 PM Dilip Kumar  wrote:
> >
> > On Fri, May 15, 2020 at 4:04 PM Amit Kapila  wrote:
> > >
> >
> > >
> > > > - In the example we can not show a real example, because of the
> > > > in-progress transaction to show the changes, we might have to
> > > > implement a lot of tuple.  I think we can show the partial output?
> > > >
> > >
> > > I think we can display what API will actually display, what is the
> > > confusion here.
> >
> > What, I meant is that even with the logical_decoding_work_mem=64kb, we
> > need to have quite a few changes in a transaction to stream it so the
> > example output will be quite big in size.  So I told we might not show
> > the real example instead we will just show a few lines and cut the
> > remaining.  But, I got your point we can just show how it will look
> > like.
> >
>
> Right.
>
> > >
> > > I have a few more comments on the previous version of patch
> > > v20-0005-Implement-streaming-mode-in-ReorderBuffer.  If you have fixed
> > > any, then leave those and fix others.
> > >
> > > Review comments:
> > > --
> > > 1.
> > > @@ -1762,10 +1952,16 @@ ReorderBufferCommit(ReorderBuffer *rb,
> > > TransactionId xid,
> > >   }
> > >
> > >   case REORDER_BUFFER_CHANGE_MESSAGE:
> > > - rb->message(rb, txn, change->lsn, true,
> > > - change->data.msg.prefix,
> > > - change->data.msg.message_size,
> > > - change->data.msg.message);
> > > + if (streaming)
> > > + rb->stream_message(rb, txn, change->lsn, true,
> > > +change->data.msg.prefix,
> > > +change->data.msg.message_size,
> > > +change->data.msg.message);
> > > + else
> > > + rb->message(rb, txn, change->lsn, true,
> > > +change->data.msg.prefix,
> > > +change->data.msg.message_size,
> > > +change->data.msg.message);
> > >
> > > Don't we need to set any_data_sent flag while streaming messages as we
> > > do for other types of changes?
> >
> > Actually, pgoutput plugin don't send any data on stream_message.  But,
> > I agree that how other plugin will handle.  I will analyze this part
> > again, maybe we have to such flag at the plugin level and whether stop
> > is sent to not can also be handled at the plugin level.
> >
>
> Okay, lets discuss this after your analysis.
>
> > > 2.
> > > + if (streaming)
> > > + {
> > > + /*
> > > + * Set the last of the stream as the final lsn before calling
> > > + * stream stop.
> > > + */
> > > + if (!XLogRecPtrIsInvalid(prev_lsn))
> > > + txn->final_lsn = prev_lsn;
> > > + rb->stream_stop(rb, txn);
> > > + }
> > >
> > > I am not sure if it is good to use final_lsn for this purpose.  See
> > > comments for this variable in reorderbuffer.h.  Basically, it is used
> > > for a specific purpose on different occasions.  Now, if we want to
> > > start using it for a new purpose, we need to study its interaction
> > > with all other places and update the comments as well.  Can we pass an
> > > additional parameter to stream_stop() instead?
> >
> > I think it was in sycn with the spill code right? I mean the last
> > change we spill is set as the final_lsn and same is done here.
> >
>
> But we use final_lsn in ReorderBufferRestoreCleanup() for serialized
> changes.  Now, in some case if we first do serialization, then perform
> streaming and then tried to call ReorderBufferRestoreCleanup(),it
> might not work as intended.  Now, this might not happen today but I
> don't think we have any protection to avoid that.

If streaming is complete then we will remove the serialized flag, so it
will not cause any issue.  However, we can avoid setting final_lsn here
and instead pass the last lsn of the stream to stream_stop as a
parameter.
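
Something along these lines, just as a sketch (whether it is done by
changing the callback signature or some other way is open, and the names
here are only illustrative):

/* let the callback know up to where this stream went */
typedef void (*ReorderBufferStreamStopCB) (ReorderBuffer *rb,
                                           ReorderBufferTXN *txn,
                                           XLogRecPtr last_lsn);

/* and at the call site, instead of touching txn->final_lsn */
rb->stream_stop(rb, txn, prev_lsn);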

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Problem with logical replication

2020-05-15 Thread Euler Taveira
On Fri, 15 May 2020 at 02:47, Michael Paquier  wrote:

>
> Agreed.  I don't think either that we need to update this comment.  I
> was playing with this patch and what you have here looks fine by me.
> Two nits: the extra parenthesis in the assert are not necessary, and
> the indentation had some diffs.  Tom has just reindented the whole
> tree, so let's keep things clean.
>
>
LGTM.


-- 
Euler Taveira http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: pg13: xlogreader API adjust

2020-05-15 Thread Michael Paquier
On Thu, May 14, 2020 at 02:12:25PM +0900, Kyotaro Horiguchi wrote:
> Good catch!  That's not only for CreateDecodingContet. That happens
> everywhere in the query loop in PostgresMain() until logreader is
> initialized.  So that also happens, for example, by starting logical
> replication using invalidated slot. Checking xlogreader != NULL in
> WalSndErrorCleanup is sufficient.  It doesn't make actual difference,
> but the attached explicitly initialize the pointer with NULL.

Alvaro, are you planning to look at that?  Should we have an open item
for this matter?
--
Michael




Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2020-05-15 Thread Amit Kapila
On Fri, May 15, 2020 at 4:20 PM Dilip Kumar  wrote:
>
> On Fri, May 15, 2020 at 4:04 PM Amit Kapila  wrote:
> >
>
> >
> > > - In the example we can not show a real example, because of the
> > > in-progress transaction to show the changes, we might have to
> > > implement a lot of tuple.  I think we can show the partial output?
> > >
> >
> > I think we can display what API will actually display, what is the
> > confusion here.
>
> What, I meant is that even with the logical_decoding_work_mem=64kb, we
> need to have quite a few changes in a transaction to stream it so the
> example output will be quite big in size.  So I told we might not show
> the real example instead we will just show a few lines and cut the
> remaining.  But, I got your point we can just show how it will look
> like.
>

Right.

> >
> > I have a few more comments on the previous version of patch
> > v20-0005-Implement-streaming-mode-in-ReorderBuffer.  If you have fixed
> > any, then leave those and fix others.
> >
> > Review comments:
> > --
> > 1.
> > @@ -1762,10 +1952,16 @@ ReorderBufferCommit(ReorderBuffer *rb,
> > TransactionId xid,
> >   }
> >
> >   case REORDER_BUFFER_CHANGE_MESSAGE:
> > - rb->message(rb, txn, change->lsn, true,
> > - change->data.msg.prefix,
> > - change->data.msg.message_size,
> > - change->data.msg.message);
> > + if (streaming)
> > + rb->stream_message(rb, txn, change->lsn, true,
> > +change->data.msg.prefix,
> > +change->data.msg.message_size,
> > +change->data.msg.message);
> > + else
> > + rb->message(rb, txn, change->lsn, true,
> > +change->data.msg.prefix,
> > +change->data.msg.message_size,
> > +change->data.msg.message);
> >
> > Don't we need to set any_data_sent flag while streaming messages as we
> > do for other types of changes?
>
> Actually, pgoutput plugin don't send any data on stream_message.  But,
> I agree that how other plugin will handle.  I will analyze this part
> again, maybe we have to such flag at the plugin level and whether stop
> is sent to not can also be handled at the plugin level.
>

Okay, lets discuss this after your analysis.

> > 2.
> > + if (streaming)
> > + {
> > + /*
> > + * Set the last of the stream as the final lsn before calling
> > + * stream stop.
> > + */
> > + if (!XLogRecPtrIsInvalid(prev_lsn))
> > + txn->final_lsn = prev_lsn;
> > + rb->stream_stop(rb, txn);
> > + }
> >
> > I am not sure if it is good to use final_lsn for this purpose.  See
> > comments for this variable in reorderbuffer.h.  Basically, it is used
> > for a specific purpose on different occasions.  Now, if we want to
> > start using it for a new purpose, we need to study its interaction
> > with all other places and update the comments as well.  Can we pass an
> > additional parameter to stream_stop() instead?
>
> I think it was in sycn with the spill code right? I mean the last
> change we spill is set as the final_lsn and same is done here.
>

But we use final_lsn in ReorderBufferRestoreCleanup() for serialized
changes.  Now, in some case if we first do serialization, then perform
streaming and then tried to call ReorderBufferRestoreCleanup(), it
might not work as intended.  Now, this might not happen today but I
don't think we have any protection to avoid that.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2020-05-15 Thread Dilip Kumar
On Fri, May 15, 2020 at 4:04 PM Amit Kapila  wrote:
>
> On Fri, May 15, 2020 at 2:47 PM Dilip Kumar  wrote:
> >
> > > 6. I think it will be good if we can provide an example of streaming
> > > changes via test_decoding at
> > > https://www.postgresql.org/docs/devel/test-decoding.html. I think we
> > > can also explain there why the user is not expected to see the actual
> > > data in the stream.
> >
> > I have a few problems to solve here.
> > -  With streaming transaction also shall we show the actual values or
> > we shall do like it is currently in the patch
> > (appendStringInfo(ctx->out, "streaming change for TXN %u",
> > txn->xid);).  I think we should show the actual values instead of what
> > we are doing now.
> >
>
> I think why we don't want to display the tuple at this stage is
> because it is not clear by this time if the transaction will commit or
> abort.  I am not sure if displaying the contents of aborted
> transactions is a good idea but if there is a reason for doing so, we
> can do it later as well.

Ok.

>
> > - In the example we can not show a real example, because of the
> > in-progress transaction to show the changes, we might have to
> > implement a lot of tuple.  I think we can show the partial output?
> >
>
> I think we can display what API will actually display, what is the
> confusion here.

What I meant is that even with logical_decoding_work_mem=64kb, we need
to have quite a few changes in a transaction to stream it, so the
example output will be quite big.  So I suggested we might not show the
real example and instead just show a few lines and cut the remaining.
But I got your point; we can just show how it will look.

>
> I have a few more comments on the previous version of patch
> v20-0005-Implement-streaming-mode-in-ReorderBuffer.  If you have fixed
> any, then leave those and fix others.
>
> Review comments:
> --
> 1.
> @@ -1762,10 +1952,16 @@ ReorderBufferCommit(ReorderBuffer *rb,
> TransactionId xid,
>   }
>
>   case REORDER_BUFFER_CHANGE_MESSAGE:
> - rb->message(rb, txn, change->lsn, true,
> - change->data.msg.prefix,
> - change->data.msg.message_size,
> - change->data.msg.message);
> + if (streaming)
> + rb->stream_message(rb, txn, change->lsn, true,
> +change->data.msg.prefix,
> +change->data.msg.message_size,
> +change->data.msg.message);
> + else
> + rb->message(rb, txn, change->lsn, true,
> +change->data.msg.prefix,
> +change->data.msg.message_size,
> +change->data.msg.message);
>
> Don't we need to set any_data_sent flag while streaming messages as we
> do for other types of changes?

Actually, the pgoutput plugin doesn't send any data on stream_message.
But I agree that other plugins may handle it differently.  I will
analyze this part again; maybe we need such a flag at the plugin level,
and whether a stop is sent or not can also be handled at the plugin
level.

> 2.
> + if (streaming)
> + {
> + /*
> + * Set the last of the stream as the final lsn before calling
> + * stream stop.
> + */
> + if (!XLogRecPtrIsInvalid(prev_lsn))
> + txn->final_lsn = prev_lsn;
> + rb->stream_stop(rb, txn);
> + }
>
> I am not sure if it is good to use final_lsn for this purpose.  See
> comments for this variable in reorderbuffer.h.  Basically, it is used
> for a specific purpose on different occasions.  Now, if we want to
> start using it for a new purpose, we need to study its interaction
> with all other places and update the comments as well.  Can we pass an
> additional parameter to stream_stop() instead?

I think it was in sync with the spill code, right?  I mean, the last
change we spill is set as the final_lsn and the same is done here.

Other comments look fine, so I will work on them and reply separately.


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2020-05-15 Thread Amit Kapila
On Fri, May 15, 2020 at 2:47 PM Dilip Kumar  wrote:
>
> > 6. I think it will be good if we can provide an example of streaming
> > changes via test_decoding at
> > https://www.postgresql.org/docs/devel/test-decoding.html. I think we
> > can also explain there why the user is not expected to see the actual
> > data in the stream.
>
> I have a few problems to solve here.
> -  With streaming transaction also shall we show the actual values or
> we shall do like it is currently in the patch
> (appendStringInfo(ctx->out, "streaming change for TXN %u",
> txn->xid);).  I think we should show the actual values instead of what
> we are doing now.
>

I think why we don't want to display the tuple at this stage is
because it is not clear by this time if the transaction will commit or
abort.  I am not sure if displaying the contents of aborted
transactions is a good idea but if there is a reason for doing so, we
can do it later as well.

> - In the example we can not show a real example, because of the
> in-progress transaction to show the changes, we might have to
> implement a lot of tuple.  I think we can show the partial output?
>

I think we can display what API will actually display, what is the
confusion here.

I have a few more comments on the previous version of patch
v20-0005-Implement-streaming-mode-in-ReorderBuffer.  If you have fixed
any, then leave those and fix others.

Review comments:
--
1.
@@ -1762,10 +1952,16 @@ ReorderBufferCommit(ReorderBuffer *rb,
TransactionId xid,
  }

  case REORDER_BUFFER_CHANGE_MESSAGE:
- rb->message(rb, txn, change->lsn, true,
- change->data.msg.prefix,
- change->data.msg.message_size,
- change->data.msg.message);
+ if (streaming)
+ rb->stream_message(rb, txn, change->lsn, true,
+change->data.msg.prefix,
+change->data.msg.message_size,
+change->data.msg.message);
+ else
+ rb->message(rb, txn, change->lsn, true,
+change->data.msg.prefix,
+change->data.msg.message_size,
+change->data.msg.message);

Don't we need to set any_data_sent flag while streaming messages as we
do for other types of changes?

2.
+ if (streaming)
+ {
+ /*
+ * Set the last of the stream as the final lsn before calling
+ * stream stop.
+ */
+ if (!XLogRecPtrIsInvalid(prev_lsn))
+ txn->final_lsn = prev_lsn;
+ rb->stream_stop(rb, txn);
+ }

I am not sure if it is good to use final_lsn for this purpose.  See
comments for this variable in reorderbuffer.h.  Basically, it is used
for a specific purpose on different occasions.  Now, if we want to
start using it for a new purpose, we need to study its interaction
with all other places and update the comments as well.  Can we pass an
additional parameter to stream_stop() instead?

3.
+ /* remember the command ID and snapshot for the streaming run */
+ txn->command_id = command_id;
+
+ /* Avoid copying if it's already copied. */
+ if (snapshot_now->copied)
+ txn->snapshot_now = snapshot_now;
+ else
+ txn->snapshot_now = ReorderBufferCopySnap(rb, snapshot_now,
+   txn, command_id);

This code is used at two different places, can we try to keep this in
a single function.
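
For instance, something like the below (untested sketch; the function
name is invented here) could be called from both places:

static void
ReorderBufferRememberStreamSnapshot(ReorderBuffer *rb, ReorderBufferTXN *txn,
                                    Snapshot snapshot_now, CommandId command_id)
{
    /* remember the command ID and snapshot for the streaming run */
    txn->command_id = command_id;

    /* Avoid copying if it's already copied. */
    if (snapshot_now->copied)
        txn->snapshot_now = snapshot_now;
    else
        txn->snapshot_now = ReorderBufferCopySnap(rb, snapshot_now,
                                                  txn, command_id);
}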

4.
In ReorderBufferProcessTXN(), the patch is calling stream_stop in both
the try and catch block.  If there is an error after calling it in a
try block, we might call it again via catch.  I think that will lead
to sending a stop message twice.  Won't that be a problem?  See the
usage of iterstate in the catch block, we have made it safe from a
similar problem.
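
Maybe a simple local flag is enough, something like (sketch only):

/* set to true right after rb->stream_start() is emitted */
bool    stream_started = false;

/* in the TRY block, after streaming the remaining changes */
rb->stream_stop(rb, txn);
stream_started = false;

/* in the CATCH block, close the stream only if it is still open */
if (stream_started)
    rb->stream_stop(rb, txn);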

5.
+ if (streaming)
+ {
+ /* Discard the changes that we just streamed. */
+ ReorderBufferTruncateTXN(rb, txn);

- PG_RE_THROW();
+ /* Re-throw only if it's not an abort. */
+ if (errdata->sqlerrcode != ERRCODE_TRANSACTION_ROLLBACK)
+ {
+ MemoryContextSwitchTo(ecxt);
+ PG_RE_THROW();
+ }
+ else
+ {
+ FlushErrorState();
+ FreeErrorData(errdata);
+ errdata = NULL;
+

I think here we can write a few comments on why we are doing error-code
specific handling; basically, explain a bit about concurrent abort
handling and/or refer to the part of the comments where it is explained.
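
E.g. a comment along these lines (the wording is only a suggestion):

/*
 * A streamed (in-progress) transaction can be aborted concurrently
 * while we are decoding it; catalog lookups done for decoding may then
 * fail with ERRCODE_TRANSACTION_ROLLBACK.  That is expected: we just
 * discard whatever has been streamed for this transaction and stop,
 * instead of re-throwing the error.
 */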

6.
PG_CATCH();
  {
+ MemoryContext ecxt = MemoryContextSwitchTo(ccxt);
+ ErrorData  *errdata = CopyErrorData();

I don't understand the usage of memory context in this part of the
code.  Basically, you are switching to CurrentMemoryContext here, do
some error handling and then again reset back to some random context
before rethrowing the error.  If there is some purpose for it, then it
might be better if you can write a few comments to explain the same.

7.
+ReorderBufferCommit()
{
..
+ /*
+ * If the transaction was (partially) streamed, we need to commit it in a
+ * 'streamed' way. That is, we first stream the remaining part of the
+ * transaction, and then invoke stream_commit message.
+ *
+ * XXX Called after everything (origin ID and LSN, ...) is stored in the
+ * transaction, so we don't pass that directly.
+ *
+ * XXX Somewhat hackish redirection, perhaps needs to be refactored?
+ */
+ if (rbtxn_is_streamed(txn))
+ {
+ 

Re: Transactions involving multiple postgres foreign servers, take 2

2020-05-15 Thread Muhammad Usama
On Fri, May 15, 2020 at 9:59 AM Masahiko Sawada <
masahiko.saw...@2ndquadrant.com> wrote:

> On Fri, 15 May 2020 at 13:26, Muhammad Usama  wrote:
> >
> >
> >
> > On Fri, May 15, 2020 at 7:20 AM Masahiko Sawada <
> masahiko.saw...@2ndquadrant.com> wrote:
> >>
> >> On Fri, 15 May 2020 at 03:08, Muhammad Usama  wrote:
> >> >
> >> >
> >> > Hi Sawada,
> >> >
> >> > I have just done some review and testing of the patches and have
> >> > a couple of comments.
> >>
> >> Thank you for reviewing!
> >>
> >> >
> >> > 1- IMHO the PREPARE TRANSACTION should always use 2PC even
> >> > when the transaction has operated on a single foreign server
> regardless
> >> > of foreign_twophase_commit setting, and throw an error otherwise when
> >> > 2PC is not available on any of the data-modified servers.
> >> >
> >> > For example, consider the case
> >> >
> >> > BEGIN;
> >> > INSERT INTO ft_2pc_1 VALUES(1);
> >> > PREPARE TRANSACTION 'global_x1';
> >> >
> >> > Here since we are preparing the local transaction so we should also
> prepare
> >> > the transaction on the foreign server even if the transaction has
> modified only
> >> > one foreign table.
> >> >
> >> > What do you think?
> >>
> >> Good catch and I agree with you. The transaction should fail if it
> >> opened a transaction on a 2pc-no-support server regardless of
> >> foreign_twophase_commit. And I think we should prepare a transaction
> >> on a foreign server even if it didn't modify any data on that.
> >>
> >> >
> >> > Also without this change, the above test case produces an assertion
> failure
> >> > with your patches.
> >> >
> >> > 2- when deciding if the two-phase commit is required or not in
> >> > FOREIGN_TWOPHASE_COMMIT_PREFER mode we should use
> >> > 2PC when we have at least one server capable of doing that.
> >> >
> >> > i.e
> >> >
> >> > For FOREIGN_TWOPHASE_COMMIT_PREFER case in
> >> > checkForeignTwophaseCommitRequired() function I think
> >> > the condition should be
> >> >
> >> > need_twophase_commit = (nserverstwophase >= 1);
> >> > instead of
> >> > need_twophase_commit = (nserverstwophase >= 2);
> >> >
> >>
> >> Hmm I might be missing your point but it seems to me that you want to
> >> use two-phase commit even in the case where a transaction modified
> >> data on only one server. Can't we commit distributed transaction
> >> atomically even using one-phase commit in that case?
> >>
> >
> > I think you are confusing between nserverstwophase and nserverswritten.
> >
> > need_twophase_commit = (nserverstwophase >= 1)  would mean
> > use two-phase commit if at least one server exists in the list that is
> > capable of doing 2PC
> >
> > For the case when the transaction modified data on only one server we
> > already exits the function indicating no two-phase required
> >
> > if (nserverswritten <= 1)
> >   return false;
> >
>
> Thank you for your explanation. If the transaction modified two
> servers that don't' support 2pc and one server that supports 2pc I
> think we don't want to use 2pc even in 'prefer' case. Because even if
> we use 2pc in that case, it's still possible to have the atomic commit
> problem. For example, if we failed to commit a transaction after
> committing other transactions on the server that doesn't support 2pc
> we cannot rollback the already-committed transaction.
>

Yes, that is true, and I think the 'prefer' mode will always have a
corner case no matter what. But the thing is, we can reduce the
probability of hitting an atomic commit problem by ensuring we use 2PC
whenever possible.

For instance, take your example scenario where a transaction modified
two servers that don't support 2PC and one server that supports it, and
let us analyze both choices.

If we use 2PC on the server that supports it, then the probability of
hitting a problem would be 1/3 = 0.33, because there is only one
corner-case scenario: failing to commit on the third server. The first
server (the 2PC-capable one) would be using a prepared transaction, so
no problem there. If the second server (no 2PC support) fails to
commit, there is still no problem, as we can roll back the prepared
transaction on the first server. The only issue arises when we fail to
commit on the third server, because we have already committed on the
second server and there is no way to undo that.

Now consider the other possibility: if we do not use 2PC in that case
(as you mentioned), then the probability of hitting the problem would
be 2/3 = 0.66, because now a commit failure on either the second or the
third server lands us in an atomic-commit problem.

So, IMO, using 2PC whenever available with 'prefer' mode should be the
way to go.
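
In code terms, what I am proposing for
checkForeignTwophaseCommitRequired() amounts to this (just a sketch; the
surrounding structure stays as in the patch):

/* a transaction that wrote to at most one server never needs 2PC */
if (nserverswritten <= 1)
    return false;

if (foreign_twophase_commit == FOREIGN_TWOPHASE_COMMIT_PREFER)
{
    /* use 2PC as soon as at least one participant is capable of it */
    need_twophase_commit = (nserverstwophase >= 1);
}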


> On the other hand, in 'prefer' case, if the transaction also modified
> the local data, we need to use 2pc even if it modified data on only
> one foreign server that supports 2pc. But the current code doesn't
> work fine in that case for now. Probably we also need the following
> change:
>
> @@ -540,7 +540,10 @@ 

Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2020-05-15 Thread Dilip Kumar
On Wed, May 13, 2020 at 4:50 PM Amit Kapila  wrote:
>
> On Wed, May 13, 2020 at 11:35 AM Dilip Kumar  wrote:
> >
> > On Tue, May 12, 2020 at 4:39 PM Amit Kapila  wrote:
> > >
> > >
> > > v20-0003-Extend-the-output-plugin-API-with-stream-methods
> > > 
> > > 1.
> > > +static void
> > > +pg_decode_stream_change(LogicalDecodingContext *ctx,
> > > + ReorderBufferTXN *txn,
> > > + Relation relation,
> > > + ReorderBufferChange *change)
> > > +{
> > > + OutputPluginPrepareWrite(ctx, true);
> > > + appendStringInfo(ctx->out, "streaming change for TXN %u", txn->xid);
> > > + OutputPluginWrite(ctx, true);
> > > +}
> > > +
> > > +static void
> > > +pg_decode_stream_truncate(LogicalDecodingContext *ctx, ReorderBufferTXN 
> > > *txn,
> > > +   int nrelations, Relation relations[],
> > > +   ReorderBufferChange *change)
> > > +{
> > > + OutputPluginPrepareWrite(ctx, true);
> > > + appendStringInfo(ctx->out, "streaming truncate for TXN %u", txn->xid);
> > > + OutputPluginWrite(ctx, true);
> > > +}
> > >
> > > In the above and similar APIs, there are parameters like relation
> > > which are not used.  I think you should add some comments atop these
> > > APIs to explain why it is so? I guess it is because we want to keep
> > > them similar to non-stream version of APIs and we can't display
> > > relation or other information as the transaction is still in-progress.
> >
> > I think because the interfaces are designed that way because other
> > decoding plugins might need it e.g. in pgoutput we need change and
> > relation but not here.  We have other similar examples also e.g.
> > pg_decode_message has the parameter txn but not used.  Do you think we
> > still need to add comments?
> >
>
> In that case, we can leave but lets ensure that we are not exposing
> any parameter which is not used and if there is any due to some
> reason, we should document it. I will also look into this.

Ok

> > > 4.
> > > +static void
> > > +stream_start_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn)
> > > +{
> > > + LogicalDecodingContext *ctx = cache->private_data;
> > > + LogicalErrorCallbackState state;
> > > + ErrorContextCallback errcallback;
> > > +
> > > + Assert(!ctx->fast_forward);
> > > +
> > > + /* We're only supposed to call this when streaming is supported. */
> > > + Assert(ctx->streaming);
> > > +
> > > + /* Push callback + info on the error context stack */
> > > + state.ctx = ctx;
> > > + state.callback_name = "stream_start";
> > > + /* state.report_location = apply_lsn; */
> > >
> > > Why can't we supply the report_location here?  I think here we need to
> > > report txn->first_lsn if this is the very first stream and
> > > txn->final_lsn if it is any consecutive one.
> >
> > I am not sure about this,  Because for the very first stream we will
> > report the location of the first lsn of the stream and for the
> > consecutive stream we will report the last lsn in the stream.
> >
>
> Yeah, that doesn't seem to be consistent.  How about if get it as an
> additional parameter?  The caller can pass the lsn of the very first
> change it is trying to decode in this stream.

Done

> > > 11.
> > > - * HeapTupleSatisfiesHistoricMVCC.
> > > + * tqual.c's HeapTupleSatisfiesHistoricMVCC.
> > > + *
> > > + * We do build the hash table even if there are no CIDs. That's
> > > + * because when streaming in-progress transactions we may run into
> > > + * tuples with the CID before actually decoding them. Think e.g. about
> > > + * INSERT followed by TRUNCATE, where the TRUNCATE may not be decoded
> > > + * yet when applying the INSERT. So we build a hash table so that
> > > + * ResolveCminCmaxDuringDecoding does not segfault in this case.
> > > + *
> > > + * XXX We might limit this behavior to streaming mode, and just bail
> > > + * out when decoding transaction at commit time (at which point it's
> > > + * guaranteed to see all CIDs).
> > >   */
> > >  static void
> > >  ReorderBufferBuildTupleCidHash(ReorderBuffer *rb, ReorderBufferTXN *txn)
> > > @@ -1350,9 +1498,6 @@ ReorderBufferBuildTupleCidHash(ReorderBuffer
> > > *rb, ReorderBufferTXN *txn)
> > >   dlist_iter iter;
> > >   HASHCTL hash_ctl;
> > >
> > > - if (!rbtxn_has_catalog_changes(txn) || dlist_is_empty(&txn->tuplecids))
> > > - return;
> > > -
> > >
> > > I don't understand this change.  Why would "INSERT followed by
> > > TRUNCATE" could lead to a tuple which can come for decode before its
> > > CID?
> >
> > Actually, even if we haven't decoded the DDL operation but in the
> > actual system table the tuple might have been deleted from the next
> > operation.  e.g. while we are streaming the INSERT it is possible that
> > the truncate has already deleted that tuple and set the max for the
> > tuple.  So before streaming patch, we were only streaming the INSERT
> > only on commit so by that time we had got all the operation which has
> > done DDL and we would have already prepared tuple 

pg_stat_wal_receiver and flushedUpto/writtenUpto

2020-05-15 Thread Michael Paquier
Hi,

As discussed in the thread that introduced d140f2f3 to rename
receivedUpto to flushedUpto and add writtenUpto to the WAL receiver's
shared memory information, the equivalent columns in
pg_stat_wal_receiver have not been renamed:
https://www.postgresql.org/message-id/ca+hukgj06d3h5jeotav4h52n0vg1jopzxqmcn5fysjquvza...@mail.gmail.com

When I implemented this system view, the idea was to keep a one-to-one
mapping between the SQL interface and the shmem info even if that is
not compatible with past versions, hence I think that before beta1 we
had better fix that and:
- rename received_lsn to flushed_lsn.
- add one column for writtenUpto.

Attached is a patch to do that.  This also needs a catalog version
bump, and I am adding an open item.
Thanks,
--
Michael
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 9edae40ed8..24d8128339 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5244,9 +5244,9 @@
 { oid => '3317', descr => 'statistics: information about WAL receiver',
   proname => 'pg_stat_get_wal_receiver', proisstrict => 'f', provolatile => 's',
   proparallel => 'r', prorettype => 'record', proargtypes => '',
-  proallargtypes => '{int4,text,pg_lsn,int4,pg_lsn,int4,timestamptz,timestamptz,pg_lsn,timestamptz,text,text,int4,text}',
-  proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
-  proargnames => '{pid,status,receive_start_lsn,receive_start_tli,received_lsn,received_tli,last_msg_send_time,last_msg_receipt_time,latest_end_lsn,latest_end_time,slot_name,sender_host,sender_port,conninfo}',
+  proallargtypes => '{int4,text,pg_lsn,int4,pg_lsn,int4,timestamptz,timestamptz,pg_lsn,timestamptz,text,text,int4,text,pg_lsn}',
+  proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{pid,status,receive_start_lsn,receive_start_tli,flushed_lsn,received_tli,last_msg_send_time,last_msg_receipt_time,latest_end_lsn,latest_end_time,slot_name,sender_host,sender_port,conninfo,written_lsn}',
   prosrc => 'pg_stat_get_wal_receiver' },
 { oid => '6118', descr => 'statistics: information about subscription',
   proname => 'pg_stat_get_subscription', proisstrict => 'f', provolatile => 's',
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 2bd5f5ea14..20669da784 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -812,7 +812,7 @@ CREATE VIEW pg_stat_wal_receiver AS
 s.status,
 s.receive_start_lsn,
 s.receive_start_tli,
-s.received_lsn,
+s.flushed_lsn,
 s.received_tli,
 s.last_msg_send_time,
 s.last_msg_receipt_time,
@@ -821,7 +821,8 @@ CREATE VIEW pg_stat_wal_receiver AS
 s.slot_name,
 s.sender_host,
 s.sender_port,
-s.conninfo
+s.conninfo,
+s.written_lsn
 FROM pg_stat_get_wal_receiver() s
 WHERE s.pid IS NOT NULL;
 
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index d69fb90132..e26394c7ef 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -1348,7 +1348,8 @@ pg_stat_get_wal_receiver(PG_FUNCTION_ARGS)
 	WalRcvState state;
 	XLogRecPtr	receive_start_lsn;
 	TimeLineID	receive_start_tli;
-	XLogRecPtr	received_lsn;
+	XLogRecPtr	flushed_lsn;
+	XLogRecPtr	written_lsn;
 	TimeLineID	received_tli;
 	TimestampTz last_send_time;
 	TimestampTz last_receipt_time;
@@ -1366,7 +1367,8 @@ pg_stat_get_wal_receiver(PG_FUNCTION_ARGS)
 	state = WalRcv->walRcvState;
 	receive_start_lsn = WalRcv->receiveStart;
 	receive_start_tli = WalRcv->receiveStartTLI;
-	received_lsn = WalRcv->flushedUpto;
+	flushed_lsn = WalRcv->flushedUpto;
+	written_lsn = pg_atomic_read_u64(&WalRcv->writtenUpto);
 	received_tli = WalRcv->receivedTLI;
 	last_send_time = WalRcv->lastMsgSendTime;
 	last_receipt_time = WalRcv->lastMsgReceiptTime;
@@ -1413,10 +1415,10 @@ pg_stat_get_wal_receiver(PG_FUNCTION_ARGS)
 		else
 			values[2] = LSNGetDatum(receive_start_lsn);
 		values[3] = Int32GetDatum(receive_start_tli);
-		if (XLogRecPtrIsInvalid(received_lsn))
+		if (XLogRecPtrIsInvalid(flushed_lsn))
 			nulls[4] = true;
 		else
-			values[4] = LSNGetDatum(received_lsn);
+			values[4] = LSNGetDatum(flushed_lsn);
 		values[5] = Int32GetDatum(received_tli);
 		if (last_send_time == 0)
 			nulls[6] = true;
@@ -1450,6 +1452,10 @@ pg_stat_get_wal_receiver(PG_FUNCTION_ARGS)
 			nulls[13] = true;
 		else
 			values[13] = CStringGetTextDatum(conninfo);
+		if (XLogRecPtrIsInvalid(written_lsn))
+			nulls[14] = true;
+		else
+			values[14] = LSNGetDatum(written_lsn);
 	}
 
 	/* Returns the record as Datum */
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 8876025aaa..db29058f47 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2124,7 +2124,7 @@ pg_stat_wal_receiver| SELECT 

Re: MultiXact\SLRU buffers configuration

2020-05-15 Thread Andrey M. Borodin



> On 15 May 2020, at 05:03, Kyotaro Horiguchi  wrote:
> 
> At Thu, 14 May 2020 11:44:01 +0500, "Andrey M. Borodin" 
>  wrote in 
>>> GetMultiXactIdMembers believes that 4 is successfully done if 2
>>> returned valid offset, but actually that is not obvious.
>>> 
>>> If we add a single giant lock just to isolate ,say,
>>> GetMultiXactIdMember and RecordNewMultiXact, it reduces concurrency
>>> unnecessarily.  Perhaps we need finer-grained locking-key for standby
>>> that works similary to buffer lock on primary, that doesn't cause
>>> confilicts between irrelevant mxids.
>>> 
>> We can just replay members before offsets. If offset is already there - 
>> members are there too.
>> But I'd be happy if we could mitigate those 1000us too - with a hint about 
>> last maixd state in a shared MX state, for example.
> 
> Generally in such cases, condition variables would work.  In the
> attached PoC, the reader side gets no penalty in the "likely" code
> path.  The writer side always calls ConditionVariableBroadcast but the
> waiter list is empty in almost all cases.  But I couldn't cause the
> situation where the sleep 1000u is reached.
Thanks! That really looks like a good solution without magic timeouts. 
Beautiful!
I think I can create a temporary extension which calls the MultiXact
API and tests edge cases like this 1000us wait.
This extension will also be useful for me to assess the impact of
bigger buffers, reduced read locking (as in my 2nd patch) and other
tweaks.
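
For the reader side the idea would look roughly like this (the CV field,
the helper and the wait event are made-up names; the actual PoC differs):

/* instead of the pg_usleep(1000L) retry when the next offset is not
 * filled in yet */
ConditionVariablePrepareToSleep(&MultiXactState->nextoffset_cv);
while ((nextOffset = ReadNextOffsetFor(multi)) == 0)
    ConditionVariableSleep(&MultiXactState->nextoffset_cv,
                           WAIT_EVENT_MULTIXACT_OFFSET_FILL);
ConditionVariableCancelSleep();

/* and the writer broadcasts once the offsets/members are filled in */
ConditionVariableBroadcast(&MultiXactState->nextoffset_cv);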

>> Actually, if we read empty mxid array instead of something that is replayed 
>> just yet - it's not a problem of inconsistency, because transaction in this 
>> mxid could not commit before we started. ISTM.
>> So instead of fix, we, probably, can just add a comment. If this reasoning 
>> is correct.
> 
> The step 4 of the reader side reads the members of the target mxid. It
> is already written if the offset of the *next* mxid is filled-in.
Most often, yes, but members are not guaranteed to be filled in order:
whoever wins MXMemberControlLock will write first.
But nobody can read the members of an MXID before it is returned, and
its members will be written before the MXID is returned.

Best regards, Andrey Borodin.



Re: Is it useful to record whether plans are generic or custom?

2020-05-15 Thread Atsushi Torikoshi
On Thu, May 14, 2020 at 2:28 AM legrand legrand 
wrote:

> Hello,
>
> yes this can be usefull, under the condition of differentiating all the
> counters
> for a queryid using a generic plan and the one using a custom one.
>
> For me one way to do that is adding a generic_plan column to
> pg_stat_statements key, someting like:
> - userid,
> - dbid,
> - queryid,
> - generic_plan
>

Thanks for your kind advice!


> I don't know if generic/custom plan types are available during planning
> and/or
> execution hooks ...


Yeah, that's what I'm concerned about.

As far as I can see, there are no variables tracking executed
plan types so we may need to add variables in
CachedPlanSource or somewhere.

CachedPlanSource.num_custom_plans looked like what is needed,
but it counts plannings, so it also increments when the planner
calculates both plans and then decides to take the generic plan.


To track executed plan types, I think execution layer hooks
are appropriate.
These hooks, however, take QueryDesc as a parameter, and it does
not include cached plan information.
Portal includes CachedPlanSource, but there seems to be no hook
that takes a Portal.

So I'm wondering whether it's necessary to add a hook to get the
Portal or CachedPlanSource.
Is that too much change just for getting the plan types?
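
To be a bit more concrete about the "add variables in CachedPlanSource"
direction, something like the following (the counter names are
hypothetical) could be bumped at the point GetCachedPlan() hands back
the plan that will actually be executed:

/* in GetCachedPlan(), once the choice has been made for this execution */
if (customplan)
    plansource->num_custom_exec++;      /* hypothetical new counter */
else
    plansource->num_generic_exec++;     /* hypothetical new counter */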


Regards,

--
Atsushi Torikoshi


Re: pg_regress cleans up tablespace twice.

2020-05-15 Thread Kyotaro Horiguchi
Thank you for looking at this!

At Fri, 15 May 2020 11:58:55 +0900, Michael Paquier  wrote 
in 
> On Mon, May 11, 2020 at 05:13:54PM +0900, Kyotaro Horiguchi wrote:
> > Tablespace directory cleanup is not done for all testing
> > targets. Actually it is not done for the tools under bin/ except
> > pg_upgrade.
> 
> Let's first take one problem at a time, as I can see that your patch
> 0002 is modifying a portion of what you added in 0001, and so let's
> try to remove this WIN32 stuff from pg_regress.c.

Yes, 0001 and 0001+0002 are alternatives.  They should be merged if we
are going to fix the pg_upgrade test.  I take it that we will go with
0001+0002.

> +sub CleanupTablespaceDirectory
> +{
> +   my $tablespace = 'testtablespace';
> +
> +   rmtree($tablespace) if (-e $tablespace);
> +   mkdir($tablespace);
> +}
> This check should use "-d" and not "-e" as it would be true for a file
> as well.  Also, in pg_regress.c, we remove the existing tablespace

That was intentional, so that a file with that name doesn't stop
testing. Actually, pg_regress checks only for a directory in another
place, and it's not that bad since no one can create a file with that
name while the test is running.  On the other hand, is there any reason
to refrain from removing it if it were a file rather than a directory?

Changed to -d in the attached.

> as well.  Also, in pg_regress.c, we remove the existing tablespace
> test directory in --outputdir, which is "." by default but it can be a
> custom one.  Shouldn't you do the same logic in this new routine?  So
> we should have an optional argument for the output directory that
> defaults to `pwd` if not defined, no?  This means passing down the
> argument only for upgradecheck() in vcregress.pl.

I thought of that but didn't do it in the patch.  I refrained from
doing that because the output directory is created dedicatedly at the
only place (the pg_upgrade test) where --outputdir is specified. (I
think I tend to do too much.)

It is easy in perl scripts, but rather complex for makefiles. The
attached uses a perl one-liner to extract outputdir from
EXTRA_REGRESS_OPTS. I don't like that, but I didn't come up with a
better alternative.  On the other hand, ActivePerl (with the default
installation) doesn't seem to know Getopt::Long::GetOptions and
friends.  In the attached, vcregress.pl parses --outputdir without
using Getopt::Long...

>  sub isolationcheck
>  {
>   chdir "../isolation";
> + CleanupTablespaceDirectory();
>   copy("../../../$Config/isolationtester/isolationtester.exe",
>   "../../../$Config/pg_isolation_regress");
>   my @args = (
> [...]
>   print "\n";
>   print "Checking $module\n";
> + CleanupTablespaceDirectory();
>   my @args = (
>   "$topdir/$Config/pg_regress/pg_regress",
>   "--bindir=${topdir}/${Config}/psql",
> I would put that just before the system() calls for consistency with
> the rest.

Right. That's just a mistake. Fixed along with subdircheck.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From db38bd5cdbea950cdeba8bd4745801e9c0a2824c Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Thu, 30 Apr 2020 14:06:51 +0900
Subject: [PATCH] Move tablespace cleanup out of pg_regress.

The tablespace directory is cleaned up in regress/GNUmakefile for all
platforms other than Windows. For Windows, pg_regress does that on
behalf of make.  That is not only ugly but also leads to doing the
cleanup twice at once.  Let's move the cleanup code out of pg_regress
into vcregress.pl, which is a more suitable place.

This also fixes a bug where the test for pg_upgrade mistakenly cleans
up the tablespace directory of the default tmp-install location,
instead of its own installation directory.
---
 src/test/regress/GNUmakefile  |  6 --
 src/test/regress/pg_regress.c | 22 --
 src/tools/msvc/vcregress.pl   | 25 +
 3 files changed, 29 insertions(+), 24 deletions(-)

diff --git a/src/test/regress/GNUmakefile b/src/test/regress/GNUmakefile
index 1a3164065f..815d87904b 100644
--- a/src/test/regress/GNUmakefile
+++ b/src/test/regress/GNUmakefile
@@ -118,8 +118,10 @@ submake-contrib-spi: | submake-libpgport submake-generated-headers
 
 .PHONY: tablespace-setup
 tablespace-setup:
-	rm -rf ./testtablespace
-	mkdir ./testtablespace
+# extract --outputdir optsion from EXTRA_REGRESS_OPTS
+	$(eval dir := $(shell perl -e 'use Getopt::Long qw(GetOptionsFromString Configure); Configure("pass_through", "gnu_getopt"); ($$r, $$x) = GetOptionsFromString($$ENV{"EXTRA_REGRESS_OPTS"}, "outputdir=s" => \$$dir); print ((defined $$dir ? $$dir : ".") . "/testtablespace");'))
+	rm -rf $(dir)
+	mkdir $(dir)
 
 
 ##
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index 38b2b1e8e1..c56bfaf7f5 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -491,28 +491,6 @@ convert_sourcefiles_in(const char 

Re: pg_bsd_indent and -Wimplicit-fallthrough

2020-05-15 Thread Daniel Gustafsson
> On 15 May 2020, at 08:28, Julien Rouhaud  wrote:
> On Fri, May 15, 2020 at 8:03 AM Michael Paquier  wrote:

>> Something like the attached is fine to take care of those warnings,
>> but what's our current patching policy for this tool?
> 
> The patch looks good to me.  It looks like we already have custom
> patches, so +1 to applying it.

Shouldn't we try and propose it to upstream first to minimize our diff?

cheers ./daniel



Re: PG 13 release notes, first draft

2020-05-15 Thread Fujii Masao




On 2020/05/05 12:16, Bruce Momjian wrote:

I have committed the first draft of the PG 13 release notes.  You can
see them here:

https://momjian.us/pgsql_docs/release-13.html

It still needs markup, word wrap, and indenting.  The community doc
build should happen in a few hours.


Many thanks for working on this!

When I did "make html", I got the following message.

Link element has no content and no Endterm. Nothing to show in the link to 
sepgsql

"Allow  to control access to the" in release-13.sgml
seems to have caused this. I also found that it's converted into "Allow ???
to control access to the", i.e., ??? was used.

- Allow  to control access to the
+ Allow sepgsql to control access to the

Shouldn't we change it as shown above?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: pg_bsd_indent and -Wimplicit-fallthrough

2020-05-15 Thread Julien Rouhaud
On Fri, May 15, 2020 at 8:03 AM Michael Paquier  wrote:
>
> Hi,
>
> I have just noticed that pg_bsd_indent complains since
> -Wimplicit-fallthrough=3 has been added to the default set of switches
> if available.

Oh Indeed.

> Something like the attached is fine to take care of those warnings,
> but what's our current patching policy for this tool?

The patch looks good to me.  It looks like we already have custom
patches, so +1 to applying it.




Re: documenting the backup manifest file format

2020-05-15 Thread Fujii Masao




On 2020/04/15 5:33, Andrew Dunstan wrote:


On 4/14/20 4:09 PM, Alvaro Herrera wrote:

On 2020-Apr-14, Andrew Dunstan wrote:


OK, but I think if we're putting a timestamp string in ISO-8601 format
in the manifest it should be in UTC / Zulu time, precisely to avoid
these issues. If that's too much trouble then yes an epoch time will
probably do.

The timestamp is always specified and always UTC (except the code calls
it GMT).

+   /*
+* Convert last modification time to a string and append it to the
+* manifest. Since it's not clear what time zone to use and since time
+* zone definitions can change, possibly causing confusion, use GMT
+* always.
+*/
+   appendStringInfoString(&buf, "\"Last-Modified\": \"");
+   enlargeStringInfo(&buf, 128);
+   buf.len += pg_strftime(&buf.data[buf.len], 128, "%Y-%m-%d %H:%M:%S %Z",
+  pg_gmtime(&mtime));
+   appendStringInfoString(&buf, "\"");

I was merely saying that it's trivial to make this iso-8601 compliant as

 buf.len += pg_strftime(&buf.data[buf.len], 128, "%Y-%m-%dT%H:%M:%SZ",

ie. omit the "GMT" string and replace it with a literal Z, and remove
the space and replace it with a T.


I have one question related to this: why don't we use log_timezone,
like backup_label does? log_timezone is used for the "START TIME" field
in backup_label. Sorry if this was already discussed.

/* Use the log timezone here, not the session timezone */
stamp_time = (pg_time_t) time(NULL);
pg_strftime(strfbuf, sizeof(strfbuf),
            "%Y-%m-%d %H:%M:%S %Z",
            pg_localtime(&stamp_time, log_timezone));

OTOH, *if* we want to use the same timezone for backup-related files
(because a backup can be used in different environments where the timezone
setting may differ, or for other reasons), shouldn't backup_label also use
GMT or something, for the sake of consistency?
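
To make the comparison concrete, here is a minimal, purely illustrative
sketch (not a patch; the wrapper function name and the labelfile parameter
are made up) of what writing backup_label's START TIME in GMT could look
like, reusing the pg_gmtime()/pg_strftime() pattern from the manifest code
quoted above instead of log_timezone:

#include "postgres.h"
#include "lib/stringinfo.h"
#include "pgtime.h"

/*
 * Hypothetical helper, for illustration only: append a START TIME line
 * in GMT so that the label does not depend on the server's log_timezone.
 */
static void
append_start_time_gmt(StringInfo labelfile)
{
	char		strfbuf[128];
	pg_time_t	stamp_time = (pg_time_t) time(NULL);

	/* Format in GMT, mirroring the manifest's "Last-Modified" field */
	pg_strftime(strfbuf, sizeof(strfbuf),
				"%Y-%m-%d %H:%M:%S %Z",
				pg_gmtime(&stamp_time));
	appendStringInfo(labelfile, "START TIME: %s\n", strfbuf);
}

Whether that is desirable is exactly the consistency question above; the
sketch only shows that the change would be mechanical.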

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




pg_bsd_indent and -Wimplicit-fallthrough

2020-05-15 Thread Michael Paquier
Hi,

I have just noticed that pg_bsd_indent complains since
-Wimplicit-fallthrough=3 has been added to the default set of switches
if available.

Something like the attached is fine to take care of those warnings,
but what's our current patching policy for this tool?

Thanks,
--
Michael
diff --git a/indent.c b/indent.c
index 9faf57a..5da3401 100644
--- a/indent.c
+++ b/indent.c
@@ -917,6 +917,7 @@ check_type:
 	case structure:
 	if (ps.p_l_follow > 0)
 		goto copy_id;
+		/* FALLTHROUGH */
 	case decl:		/* we have a declaration type (int, etc.) */
 	parse(decl);	/* let parser worry about indentation */
 	if (ps.last_token == rparen && ps.tos <= 1) {
diff --git a/parse.c b/parse.c
index a51eb8b..bf6b169 100644
--- a/parse.c
+++ b/parse.c
@@ -100,6 +100,7 @@ parse(int tk) /* tk: the code for the construct scanned */
 		 */
 		ps.i_l_follow = ps.il[ps.tos--];
 	/* the rest is the same as for dolit and forstmt */
+	/* FALLTHROUGH */
 case dolit:		/* 'do' */
 case forstmt:		/* for (...) */
 	ps.p_stack[++ps.tos] = tk;


signature.asc
Description: PGP signature
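
For reference, here is a minimal standalone example (mine, not code from
pg_bsd_indent; it compiles on its own) of the comment form that gcc accepts
at -Wimplicit-fallthrough=3, which is why the two hunks above add
/* FALLTHROUGH */ immediately before the next case label:

#include <stdio.h>

static int
classify(int c)
{
	int			score = 0;

	switch (c)
	{
		case 'a':
			/* without the comment below, -Wimplicit-fallthrough=3 warns here */
			score += 1;
			/* FALLTHROUGH */
		case 'b':
			score += 2;
			break;
		default:
			break;
	}
	return score;
}

int
main(void)
{
	/* prints 3: 'a' intentionally falls through into 'b' */
	printf("%d\n", classify('a'));
	return 0;
}

As far as I can tell, level 3 recognizes /* FALLTHROUGH */ (among a few
other spellings) only when it is the last thing before the case or default
label, which matches where the patch places the comments.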