Re: [HACKERS] Timing-sensitive case in src/test/recovery TAP tests

2017-06-25 Thread Michael Paquier
On Mon, Jun 26, 2017 at 12:12 PM, Craig Ringer  wrote:
> On 26 June 2017 at 11:06, Michael Paquier  wrote:
>
>> As long as we are on it, there is this code block in the test:
>> my ($xmin, $catalog_xmin) = get_slot_xmins($node_master, $slotname_1);
>> is($xmin, '', 'non-cascaded slot xmin null with no hs_feedback');
>> is($catalog_xmin, '', 'non-cascaded slot xmin null with no hs_feedback');
>>
>> ($xmin, $catalog_xmin) = get_slot_xmins($node_standby_1, $slotname_2);
>> is($xmin, '', 'cascaded slot xmin null with no hs_feedback');
>> is($catalog_xmin, '', 'cascaded slot xmin null with no hs_feedback');
>>
>> This should be more verbose as the 2nd and 4th test should say
>> "catalog xmin" instead of xmin.
>
> Agree, should. Mind posting a revision?

Sure.

>> Also, wouldn't it be better to poll as well node_standby_1's
>> pg_replication_slots on slotname_2? It would really seem better to make
>> the nullness check conditional in get_slot_xmins instead. Sorry for
>> changing opinion here.
>
> I'm not sure I understand this.

The attached patch may explain that better. Your patch added 3 poll
phases. I think that a 4th is needed. At the same time I have found it
cleaner to put the poll calls into a small wrapper.
-- 
Michael
diff --git a/src/test/recovery/t/001_stream_rep.pl b/src/test/recovery/t/001_stream_rep.pl
index 266d27c8a2..d97db659cb 100644
--- a/src/test/recovery/t/001_stream_rep.pl
+++ b/src/test/recovery/t/001_stream_rep.pl
@@ -146,6 +146,17 @@ $node_standby_2->append_conf('postgresql.conf',
 	"wal_receiver_status_interval = 1");
 $node_standby_2->restart;
 
+sub wait_slot_xmins_update
+{
+	my ($node, $slot_name, $check_expr) = @_;
+
+	$node->poll_query_until('postgres', qq[
+	SELECT $check_expr
+	FROM pg_replication_slots
+	WHERE slot_name = '$slot_name';
+]);
+}
+
 sub get_slot_xmins
 {
 	my ($node, $slotname) = @_;
@@ -156,12 +167,12 @@ sub get_slot_xmins
 # There's no hot standby feedback and there are no logical slots on either peer
 # so xmin and catalog_xmin should be null on both slots.
 my ($xmin, $catalog_xmin) = get_slot_xmins($node_master, $slotname_1);
-is($xmin, '', 'non-cascaded slot xmin null with no hs_feedback');
-is($catalog_xmin, '', 'non-cascaded slot xmin null with no hs_feedback');
+is($xmin, '', 'xmin of non-cascaded slot null with no hs_feedback');
+is($catalog_xmin, '', 'catalog xmin of non-cascaded slot null with no hs_feedback');
 
 ($xmin, $catalog_xmin) = get_slot_xmins($node_standby_1, $slotname_2);
-is($xmin, '', 'cascaded slot xmin null with no hs_feedback');
-is($catalog_xmin, '', 'cascaded slot xmin null with no hs_feedback');
+is($xmin, '', 'xmin of cascaded slot null with no hs_feedback');
+is($catalog_xmin, '', 'catalog xmin of cascaded slot null with no hs_feedback');
 
 # Replication still works?
 $node_master->safe_psql('postgres', 'CREATE TABLE replayed(val integer);');
@@ -196,15 +207,17 @@ $node_standby_2->safe_psql('postgres',
 	'ALTER SYSTEM SET hot_standby_feedback = on;');
 $node_standby_2->reload;
 replay_check();
-sleep(2);
+wait_slot_xmins_update($node_master, $slotname_1,
+	   "xmin IS NOT NULL AND catalog_xmin IS NULL");
 
 ($xmin, $catalog_xmin) = get_slot_xmins($node_master, $slotname_1);
-isnt($xmin, '', 'non-cascaded slot xmin non-null with hs feedback');
-is($catalog_xmin, '', 'non-cascaded slot xmin still null with hs_feedback');
+isnt($xmin, '', 'xmin of non-cascaded slot non-null with hs feedback');
+is($catalog_xmin, '',
+	'catalog xmin of non-cascaded slot still null with hs_feedback');
 
 ($xmin, $catalog_xmin) = get_slot_xmins($node_standby_1, $slotname_2);
-isnt($xmin, '', 'cascaded slot xmin non-null with hs feedback');
-is($catalog_xmin, '', 'cascaded slot xmin still null with hs_feedback');
+isnt($xmin, '', 'xmin of cascaded slot non-null with hs feedback');
+is($catalog_xmin, '', 'catalog xmin of cascaded slot still null with hs_feedback');
 
 note "doing some work to advance xmin";
 for my $i (1 .. 11000)
@@ -216,15 +229,15 @@ $node_master->safe_psql('postgres', 'CHECKPOINT;');
 
 my ($xmin2, $catalog_xmin2) = get_slot_xmins($node_master, $slotname_1);
 note "new xmin $xmin2, old xmin $xmin";
-isnt($xmin2, $xmin, 'non-cascaded slot xmin with hs feedback has changed');
+isnt($xmin2, $xmin, 'xmin of non-cascaded slot with hs feedback has changed');
 is($catalog_xmin2, '',
-	'non-cascaded slot xmin still null with hs_feedback unchanged');
+	'catalog xmin of non-cascaded slot still null with hs_feedback unchanged');
 
 ($xmin2, $catalog_xmin2) = get_slot_xmins($node_standby_1, $slotname_2);
 note "new xmin $xmin2, old xmin $xmin";
-isnt($xmin2, $xmin, 'cascaded slot xmin with hs feedback has changed');
+isnt($xmin2, $xmin, 'xmin of cascaded slot with hs feedback has changed');
 is($catalog_xmin2, '',
-	'cascaded slot xmin still null with hs_feedback unchanged');
+	'catalog xmin of cascaded slot still null with hs_feedback unchanged');

Re: [HACKERS] Code quality issues in ICU patch

2017-06-25 Thread Noah Misch
On Sat, Jun 24, 2017 at 10:03:25AM -0400, Peter Eisentraut wrote:
> On 6/23/17 12:31, Tom Lane wrote:
> > icu_to_uchar() and icu_from_uchar(), and perhaps other places, are
> > touchingly naive about integer overflow hazards in buffer size
> > calculations.  I call particular attention to this bit in
> > icu_from_uchar():
> > 
> > len_result = UCNV_GET_MAX_BYTES_FOR_STRING(len_uchar, 
> > ucnv_getMaxCharSize(icu_converter));
> > 
> > The ICU man pages say that that macro is defined as
> > 
> > #define UCNV_GET_MAX_BYTES_FOR_STRING(length, maxCharSize)  
> > (((int32_t)(length)+10)*(int32_t)(maxCharSize))
> > 
> > which means that getting this to overflow (resulting in
> > probably-exploitable memory overruns) would be about as hard as taking
> > candy from a baby.
> 
> Here is a patch that should address this.

[Action required within three days.  This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item.  Peter,
since you committed the patch believed to have created it, you own this open
item.  If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know.  Otherwise, please observe the policy on
open item ownership[1] and send a status update within three calendar days of
this message.  Include a date for your subsequent status update.  Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping v10.  Consequently, I will appreciate your efforts
toward speedy resolution.  Thanks.

[1] 
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] gen_random_uuid security not explicit in documentation

2017-06-25 Thread Noah Misch
On Fri, Jun 23, 2017 at 10:23:36AM +0900, Michael Paquier wrote:
> On Fri, Jun 23, 2017 at 3:02 AM, Heikki Linnakangas  wrote:
> > I'm inclined to change gen_random_uuid() to throw an error if the server is
> > built with --disable-strong-random, like gen_random_bytes() does. That way,
> > they would behave the same.
> 
> No objections to do that. I guess you don't need a patch. As this is
> new to 10, I have added an open item.

[Action required within three days.  This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item.  Heikki,
since you committed the patch believed to have created it, you own this open
item.  If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know.  Otherwise, please observe the policy on
open item ownership[1] and send a status update within three calendar days of
this message.  Include a date for your subsequent status update.  Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping v10.  Consequently, I will appreciate your efforts
toward speedy resolution.  Thanks.

[1] 
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com




Re: [HACKERS] Same expression more than once in partition key

2017-06-25 Thread Amit Langote
On 2017/06/24 5:04, Tom Lane wrote:
> Peter Eisentraut  writes:
>> We also allow the same column more than once in an index.  We probably
>> don't have to be more strict here.
> 
> There actually are valid uses for the same column more than once in
> an index, eg if you use a different operator class for each instance.
> I find it hard to envision a similar use-case in partitioning though.

So, does this mean we don't need to apply Nagata-san's patch for now?

As far as the partitioning internals are concerned, I see no harm in
allowing a column to appear more than once in the partition key.  OTOH, I
too don't see a partitioning use-case which would require it.

Thanks,
Amit





Re: [HACKERS] Setting pd_lower in GIN metapage

2017-06-25 Thread Masahiko Sawada
On Mon, Jun 26, 2017 at 10:54 AM, Michael Paquier
 wrote:
> On Fri, Jun 23, 2017 at 11:17 AM, Amit Langote
>  wrote:
>> That was it, thanks for the pointer.
>
> GinInitMetabuffer() sets up pd_lower and pd_upper anyway using
> PageInit so the check of PageIsVerified is guaranteed to work in any
> case. Upgraded pages will still have their pd_lower set to the
> previous values, and new pages will have the optimization. So this
> patch is actually harmless for past pages, while newer ones are seen
> as more compressible.
>
>> Attached updated patch, which I confirmed, passes wal_consistency_check = 
>> gin.
>
> I have spent some time looking at this patch, playing with pg_upgrade
> to check the state of the page upgraded. And this looks good to me.
> One thing that I noticed is that this optimization could as well
> happen for spgist meta pages. What do others think?

Good point. I think it could happen for brin meta page as well.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] Setting pd_lower in GIN metapage

2017-06-25 Thread Masahiko Sawada
On Sat, Jun 24, 2017 at 1:35 AM, Alvaro Herrera
 wrote:
> Masahiko Sawada wrote:
>> On Fri, Jun 23, 2017 at 3:31 PM, Michael Paquier
>>  wrote:
>> > On Fri, Jun 23, 2017 at 3:17 PM, Amit Langote
>> >  wrote:
>> >> Initially, I had naively set wal_consistency_check = all before running
>> >> make installcheck and then had to wait for a long time to confirm that WAL
>> >> generated by the gin test indeed caused consistency check failure on the
>> >> standby with the v1 patch.
>> >
>> > wal_consistency_check = gin would have saved you a lot of I/O.
>> >
>> >> But I can see Sawada-san's point that there should be some way for
>> >> developers writing code that better had gone through WAL consistency
>> >> checking facility to do it without much hassle.  But then again, it may
>> >> not be that frequent to need that.
>>
>> Yeah, it should be optional. I imagined providing such an option of
>> pg_regress or TAP test for the developers.
>
> As far as I know it is possible to have third-party modules that extend
> the buildfarm client script so that it runs additional tests beyond the
> standard ones.  You could have a custom module installed in some
> powerful machine of yours that runs the WAL consistency check and report
> the results to the buildfarm.  A single animal running that test should
> be enough, right?
>

Yes, thank you for the information. It's a good idea. I'll try it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] Get stuck when dropping a subscription during synchronizing table

2017-06-25 Thread Masahiko Sawada
On Sun, Jun 25, 2017 at 7:35 PM, Petr Jelinek
 wrote:
> (was away for a while, got some time now for this again)
>
> On 22/06/17 04:43, Peter Eisentraut wrote:
>> The alternative is that we use the LockSharedObject() approach that was
>> already alluded to, like in the attached patch.  Both approaches would
>> work equally fine AFAICT.
>
> I agree, but I think we need a bigger overhaul of the locking/management
> in general. So here is a patch which makes much more extensive changes.
>
> The patch does several important things (in no particular order):
> - Split SetSubscriptionRelState into AddSubscriptionRelState and
> UpdateSubscriptionRelState for the reasons stated upthread; it's cleaner,
> there is no half-broken upsert logic and it has proper error checking
> for each action.
>
> - Do LockSharedObject in ALTER SUBSCRIPTION, DROP SUBSCRIPTION (this one
> is preexisting but mentioning it for context), SetSubscriptionRelState,
> AddSubscriptionRelState, and in the logicalrep_worker_launch. This means
> we use granular per object locks to deal with concurrency.
>
> - Because of the above, the AccessExclusiveLock on pg_subscription is no
> longer needed; a normal RowExclusiveLock is used now.
>
> - logicalrep_worker_stop is also simplified due to the proper locking
>
> - There is new interface logicalrep_worker_stop_at_commit which is used
> by ALTER SUBSCRIPTION ... REFRESH PUBLICATION and by transactional
> variant of DROP SUBSCRIPTION to only kill workers at the end of transaction.
>
> - Locking/reading of subscription info is unified between DROP and ALTER
> SUBSCRIPTION commands.
>
> - DROP SUBSCRIPTION will kill all workers associated with subscription,
> not just apply.
>
> - The sync worker checks during startup if the relation is still subscribed.
>
> - The sync worker will exit while waiting for the apply worker if apply has shut down.
>
> - The log messages around workers and removed or disabled subscription
> are now more consistent between startup and normal runtime of the worker.
>
> - Some code deduplication and stylistic changes/simplification in
> related areas.
>
> - Fixed catcache's IndexScanOK() handling of the subscription catalog.
>
> It's bit bigger patch but solves issues from multiple threads around
> handling of ALTER/DROP subscription.
>
> A lot of the locking that I added is normally done transparently by
> dependency handling, but subscriptions and subscription relation status
> do not use that much as it was deemed to bloat pg_depend needlessly
> during the original patch review (it's also probably why this has
> slipped through).
>

Thank you for reworking this! I'll review this patch on Tuesday.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] Timing-sensitive case in src/test/recovery TAP tests

2017-06-25 Thread Craig Ringer
On 26 June 2017 at 11:06, Michael Paquier  wrote:

> As long as we are on it, there is this code block in the test:
> my ($xmin, $catalog_xmin) = get_slot_xmins($node_master, $slotname_1);
> is($xmin, '', 'non-cascaded slot xmin null with no hs_feedback');
> is($catalog_xmin, '', 'non-cascaded slot xmin null with no hs_feedback');
>
> ($xmin, $catalog_xmin) = get_slot_xmins($node_standby_1, $slotname_2);
> is($xmin, '', 'cascaded slot xmin null with no hs_feedback');
> is($catalog_xmin, '', 'cascaded slot xmin null with no hs_feedback');
>
> This should be more verbose as the 2nd and 4th test should say
> "catalog xmin" instead of xmin.

Agree, should. Mind posting a revision?

> Also, wouldn't it be better to poll as well node_standby_1's
> pg_replication_slots on slotname_2? It would really seem better to make
> the nullness check conditional in get_slot_xmins instead. Sorry for
> changing opinion here.

I'm not sure I understand this.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] UPDATE of partition key

2017-06-25 Thread Amit Khandekar
On 22 June 2017 at 01:41, Robert Haas  wrote:
>>> Second, it will amount to a functional bug if you get a
>>> different answer than the planner did.
>>
>> Actually, the per-leaf WCOs are meant to be executed on the
>> destination partitions where the tuple is moved, while the WCOs
>> belonging to the per-subplan resultRelInfo are meant for the
>> resultRelinfo used for the UPDATE plans. So actually it should not
>> matter whether they look same or different, because they are fired at
>> different objects. Now these objects can happen to be the same
>> relations though.
>>
>> But in any case, it's not clear to me how the mapped WCO and the
>> planner's WCO would yield a different answer if they are both the same
>> relation. I am possibly missing something. The planner has already
>> generated the withCheckOptions for each of the resultRelInfo. And then
>> we are using one of those to re-generate the WCO for a leaf partition
>> by only adjusting the attnos. If there is already a WCO generated in
>> the planner for that leaf partition (because that partition was
>> present in mtstate->resultRelInfo), then the re-built WCO should be
>> exactly look same as the earlier one, because they are the same
>> relations, and so the attnos generated in them would be same since the
>> Relation TupleDesc is the same.
>
> If the planner's WCOs and mapped WCOs are always the same, then I
> think we should try to avoid generating both.  If they can be
> different, but that's intentional and correct, then there's no
> substantive problem with the patch but the comments need to make it
> clear why we are generating both.
>
>> Actually I meant, "above works for only local updates. For
>> row-movement-updates, we need per-leaf partition WCOs, because when
>> the row is inserted into target partition, that partition may be not
>> be included in the above planner resultRelInfo, so we need WCOs for
>> all partitions". I think this said comment should be sufficient if I
>> add this in the code ?
>
> Let's not get too focused on updating the comment until we are in
> agreement about what the code ought to be doing.  I'm not clear
> whether you accept the point that the patch needs to be changed to
> avoid generating the same WCOs and returning lists in both the planner
> and the executor.

Yes, we can re-use the WCOs generated in the planner, as an
optimization, since those we re-generate for the same relations will
look exactly the same. The WCOs generated by the planner (in
inheritance_planner()) are created when, in adjust_appendrel_attrs(),
we change the attnos used in the query to refer to the child RTEs,
which also adjusts the attnos of the WCOs of the child RTEs. So the
WCOs of the subplan resultRelInfos are actually the parent table's
WCOs with only the attnos changed. In ExecInitModifyTable() we do the
same thing for the leaf partitions, although using a different
function, map_variable_attnos().

>
>>> Also, I feel like it's probably not correct to use the first result
>>> relation as the nominal relation for building WCOs and returning lists
>>> anyway.  I mean, if the first result relation has a different column
>>> order than the parent relation, isn't this just broken?  If it works
>>> for some reason, the comments don't explain what that reason is.

One thing I didn't mention earlier about the WCOs is that for child
rels, we don't use the WCOs defined for the child rels. We only
inherit the WCO expressions defined for the root rel. That's the
reason they are the same expressions, with only the attnos changed to
match the respective relation's tupledesc. If the WCOs of each of the
subplan resultRelInfos were different, it would not be possible to use
the first resultRelInfo to generate the other leaf partitions' WCOs,
because the WCO defined for relation A is independent of that defined
for relation B.

So, since the WCOs of all the relations are actually those of the
parent, we only need to adjust the attnos of any of these
resultRelInfos.

For e.g., if the root rel WCO is defined as "col > 5" where col is the
4th column, the expression will look like "var_1.attno_4 > 5". And the
WCO that is generated for a subplan resultRelInfo will look something
like "var_n.attno_2 > 5" if col is the 2nd column in this table.

All of the above logic assumes that we never use the WCO defined for
the child relation. At least that's how it looks from the way we
generate WCOs in ExecInitModifyTable() for INSERTs, as well as from
the code in inheritance_planner() for UPDATEs. At both these places,
we never use the WCOs defined for child tables.

So suppose we define the tables and their WCOs like this :

CREATE TABLE range_parted ( a text, b int, c int) partition by range (a, b);

ALTER TABLE range_parted ENABLE ROW LEVEL SECURITY;
GRANT ALL ON range_parted TO PUBLIC ;
create policy seeall ON range_parted as PERMISSIVE for SELECT using ( true);

create table part_b_10_b_20 partition of range_parted for values from
('b', 

Re: [HACKERS] Timing-sensitive case in src/test/recovery TAP tests

2017-06-25 Thread Michael Paquier
On Mon, Jun 26, 2017 at 11:44 AM, Craig Ringer  wrote:
> On 26 June 2017 at 10:09, Tom Lane  wrote:
>> Michael Paquier  writes:
>>> On Mon, Jun 26, 2017 at 10:48 AM, Craig Ringer  
>>> wrote:
 $node_standby_1->poll_query_until('postgres', q[SELECT xmin IS NULL
 from pg_replication_slots WHERE slot_name = '] . $slotname_2 . q[']);
>>
>>> +1 for avoiding a sleep call if it is not necessary. Fast platforms
>>> would always pay a cost on that, and slow platforms would wait 1s (or
>>> more!) when polling for the result.
>>
>>> Could it be possible to remove as well the second sleep(2) call in
>>> this test please?
>>
>> Yes, I'd like to see those fixed sleeps go away too.  Want to work
>> on a concrete patch?
>
>
> Attached.

Thanks for the patch.

As long as we are on it, there is this code block in the test:
my ($xmin, $catalog_xmin) = get_slot_xmins($node_master, $slotname_1);
is($xmin, '', 'non-cascaded slot xmin null with no hs_feedback');
is($catalog_xmin, '', 'non-cascaded slot xmin null with no hs_feedback');

($xmin, $catalog_xmin) = get_slot_xmins($node_standby_1, $slotname_2);
is($xmin, '', 'cascaded slot xmin null with no hs_feedback');
is($catalog_xmin, '', 'cascaded slot xmin null with no hs_feedback');

This should be more verbose as the 2nd and 4th test should say
"catalog xmin" instead of xmin.

Also, wouldn't it be better to poll as well node_standby_1's
pg_replication_slots on slotname_2? It would really seem better to make
the nullness check conditional in get_slot_xmins instead. Sorry for
changing opinion here.
-- 
Michael




Re: [HACKERS] Timing-sensitive case in src/test/recovery TAP tests

2017-06-25 Thread Craig Ringer
On 26 June 2017 at 10:09, Tom Lane  wrote:
> Michael Paquier  writes:
>> On Mon, Jun 26, 2017 at 10:48 AM, Craig Ringer  wrote:
>>> $node_standby_1->poll_query_until('postgres', q[SELECT xmin IS NULL
>>> from pg_replication_slots WHERE slot_name = '] . $slotname_2 . q[']);
>
>> +1 for avoiding a sleep call if it is not necessary. Fast platforms
>> would always pay a cost on that, and slow platforms would wait 1s (or
>> more!) when polling for the result.
>
>> Could it be possible to remove as well the second sleep(2) call in
>> this test please?
>
> Yes, I'd like to see those fixed sleeps go away too.  Want to work
> on a concrete patch?


Attached.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


fix-timing-in-tap-001-stream-rep.pl
Description: Perl program



Re: [HACKERS] Timing-sensitive case in src/test/recovery TAP tests

2017-06-25 Thread Tom Lane
Michael Paquier  writes:
> On Mon, Jun 26, 2017 at 10:48 AM, Craig Ringer  wrote:
>> $node_standby_1->poll_query_until('postgres', q[SELECT xmin IS NULL
>> from pg_replication_slots WHERE slot_name = '] . $slotname_2 . q[']);

> +1 for avoiding a sleep call if it is not necessary. Fast platforms
> would always pay a cost on that, and slow platforms would wait 1s (or
> more!) when polling for the result.

> Could it be possible to remove as well the second sleep(2) call in
> this test please?

Yes, I'd like to see those fixed sleeps go away too.  Want to work
on a concrete patch?

regards, tom lane




Re: [HACKERS] Timing-sensitive case in src/test/recovery TAP tests

2017-06-25 Thread Michael Paquier
On Mon, Jun 26, 2017 at 10:48 AM, Craig Ringer  wrote:
> $node_standby_1->poll_query_until('postgres', q[SELECT xmin IS NULL
> from pg_replication_slots WHERE slot_name = '] . $slotname_2 . q[']);

+1 for avoiding a sleep call if it is not necessary. Fast platforms
would always pay a cost on that, and slow platforms would wait 1s (or
more!) when polling for the result.

Could it be possible to remove as well the second sleep(2) call in
this test please? This could be replaced by a similar poll using the
query Craig has just given, as this needs to wait until the xmin is
cleared. Other tests expect a value, so this poll cannot happen in
get_slot_xmins.
-- 
Michael




[HACKERS] Corner-case error in pgstats file loading

2017-06-25 Thread Tom Lane
Using the patch I posted a little while ago to reduce pg_ctl's reaction
time, I noticed that there were semi-reproducible ten-second delays in
postmaster shutdown, in some cases in the recovery tests where we quickly
stop and restart and again stop the postmaster.  The cause turned out to
be that the autovacuum launcher is failing to exit until after it times
out trying to load the global stats file.  It goes like this:

* at startup, the stats collector reads and deletes the permanent
stats files, which have stats_timestamp equal to when they were written
during the prior shutdown.

* the autovac launcher tries to read the temp stats file.  There isn't
any, so it sends an inquiry message with cutoff_time equal to the current
time less PGSTAT_STAT_INTERVAL.

* the collector notes that the cutoff_time is before what it has recorded
as the stats_timestamp, and does nothing.

* the autovac launcher keeps waiting, and occasionally retransmitting
the inquiry message.  The collector keeps ignoring it.

* when the postmaster sends SIGTERM to the autovac launcher, that isn't
enough to break it out of the loop in backend_read_statsfile.  So only
after PGSTAT_MAX_WAIT_TIME (10s) does the loop give up, and only after
that does the autovac launcher realize it should exit.

We don't usually see this in the regression tests without the pg_ctl
patch, because without that, we typically spend enough time twiddling
our thumbs after the first postmaster shutdown for PGSTAT_STAT_INTERVAL
to elapse, so that the reloaded stats files appear stale already.

So this opens a number of issues:

* The stats collector, when reading in the permanent stats files,
ought not believe that their timestamps have anything to do with
when it needs to write out temp stats files.  This seems like an
ancient thinko dating back to when we introduced a separation
between temp and permanent stats files.  It seems easily fixable,
though, as in the attached patch.

* It's not great that the autovac launcher's SIGTERM handler doesn't
trigger query cancel handling; if it did, that would have allowed the
CHECK_FOR_INTERRUPTS in backend_read_statsfile to get us out of this
situation.  But I'm not (yet) sufficiently enthused about this to do
the research needed on whether it'd be safe.

* Perhaps it's not such a great idea that the loop in
backend_read_statsfile doesn't advance min_ts from its initial value.
If it had done so, that would've triggered a stats file update after at
most PGSTAT_STAT_INTERVAL, getting us out of this situation a lot sooner.
But that behavior is quite intentional right now, and again I'm lacking
the energy to investigate what would be the downsides of changing it.

Anyway, I present the attached proposed patch, which fixes this
problem by zeroing the stats_timestamp fields when the collector
reads them.  (The same code is also used in backends and the autovac
launcher to read temp stats files, and there we don't want to reset
the fields.)  I also noticed that the code wasn't being careful to
reset globalStats and archiverStats to zeroes if they were partially
filled from a corrupted stats file, so this deals with that too.

This seems like a backpatchable bug fix to me; any objections?

regards, tom lane

diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 65b7b32..a0b0eec 100644
*** a/src/backend/postmaster/pgstat.c
--- b/src/backend/postmaster/pgstat.c
*** pgstat_read_statsfiles(Oid onlydb, bool 
*** 4947,4962 
--- 4947,4974 
  	{
  		ereport(pgStatRunningInCollector ? LOG : WARNING,
  (errmsg("corrupted statistics file \"%s\"", statfile)));
+ 		memset(&globalStats, 0, sizeof(globalStats));
  		goto done;
  	}
  
  	/*
+ 	 * In the collector, disregard the timestamp we read from the permanent
+ 	 * stats file; we should be willing to write a temp stats file immediately
+ 	 * upon the first request from any backend.  This only matters if the old
+ 	 * file's timestamp is less than PGSTAT_STAT_INTERVAL ago, but that's not
+ 	 * an unusual scenario.
+ 	 */
+ 	if (pgStatRunningInCollector)
+ 		globalStats.stats_timestamp = 0;
+ 
+ 	/*
  	 * Read archiver stats struct
  	 */
  	if (fread(&archiverStats, 1, sizeof(archiverStats), fpin) != sizeof(archiverStats))
  	{
  		ereport(pgStatRunningInCollector ? LOG : WARNING,
  (errmsg("corrupted statistics file \"%s\"", statfile)));
+ 		memset(&archiverStats, 0, sizeof(archiverStats));
  		goto done;
  	}
  
*** pgstat_read_statsfiles(Oid onlydb, bool 
*** 5002,5007 
--- 5014,5028 
  dbentry->functions = NULL;
  
  /*
+  * In the collector, disregard the timestamp we read from the
+  * permanent stats file; we should be willing to write a temp
+  * stats file immediately upon the first request from any
+  * backend.
+  */
+ if (pgStatRunningInCollector)
+ 	dbentry->stats_timestamp = 0;
+ 
+ /*
   * Don't create tables/functions hashtables for uninteresting
   * databases.
  

Re: [HACKERS] Setting pd_lower in GIN metapage

2017-06-25 Thread Michael Paquier
On Fri, Jun 23, 2017 at 11:17 AM, Amit Langote
 wrote:
> That was it, thanks for the pointer.

GinInitMetabuffer() sets up pd_lower and pd_upper anyway using
PageInit so the check of PageIsVerified is guaranteed to work in any
case. Upgraded pages will still have their pd_lower set to the
previous values, and new pages will have the optimization. So this
patch is actually harmless for past pages, while newer ones are seen
as more compressible.

> Attached updated patch, which I confirmed, passes wal_consistency_check = gin.

I have spent some time looking at this patch, playing with pg_upgrade
to check the state of the page upgraded. And this looks good to me.
One thing that I noticed is that this optimization could as well
happen for spgist meta pages. What do others think?
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Timing-sensitive case in src/test/recovery TAP tests

2017-06-25 Thread Craig Ringer
On 26 June 2017 at 05:10, Tom Lane  wrote:
> I've been experimenting with a change to pg_ctl, which I'll post
> separately, to reduce its reaction time so that it reports success
> more quickly after a wait for postmaster start/stop.  I found one
> case in "make check-world" that got a failure when I reduced the
> reaction time to ~1ms.  That's the very last test in 001_stream_rep.pl,
> "cascaded slot xmin reset after startup with hs feedback reset", and
> the cause appears to be that it's not allowing any delay time for a
> replication slot's state to update after a postmaster restart.
>
> This seems worth fixing independently of any possible code changes,
> because it shows that this test could fail on a slow or overloaded
> machine.  I couldn't find any instances of such a failure in the
> buildfarm archives, but that may have a lot to do with the fact that
> owners of slow buildfarm animals are (mostly?) not running this test.
>
> Some experimentation says that the minimum delay needed to make it
> work reliably on my workstation is about 100ms, so a simple patch
> along the lines of the attached might be good enough.  I find this
> approach conceptually dissatisfying, though, since it's still
> potentially vulnerable to the failure under sufficient load.
> I wonder if there is an easy way to improve that ... maybe convert
> to something involving poll_query_until?

This should do the trick:

$node_standby_1->poll_query_until('postgres', q[SELECT xmin IS NULL
from pg_replication_slots WHERE slot_name = '] . $slotname_2 . q[']);




-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] AdvanceXLInsertBuffer vs. WAL segment compressibility

2017-06-25 Thread Chapman Flack
I notice CopyXLogRecordToWAL contains this loop (in the case where
the record being copied is a switch):

while (CurrPos < EndPos)
{
/* initialize the next page (if not initialized already) */
WALInsertLockUpdateInsertingAt(CurrPos);
AdvanceXLInsertBuffer(CurrPos, false);
CurrPos += XLOG_BLCKSZ;
}

in which it calls, one page at a time, AdvanceXLInsertBuffer, which contains
its own loop able to do a sequence of pages. A comment explains why:

/*
 * We do this one page at a time, to make sure we don't deadlock
 * against ourselves if wal_buffers < XLOG_SEG_SIZE.
 */

I want to make sure I understand what the deadlock potential is
in this case. AdvanceXLInsertBuffer will call WaitXLogInsertionsToFinish
before writing any dirty buffer, and we do hold insertion slot locks
(all of 'em, in the case of a log switch, because that makes
XlogInsertRecord call WALInsertLockAcquireExclusive instead of just
WALInsertLockAcquire for other record types).

Does not the fact we hold all the insertion slots exclude the possibility
that any dirty buffer (preceding the one we're touching) needs to be checked
for in-flight insertions?

I've been thinking along the lines of another parameter to
AdvanceXLInsertBuffer to indicate when the caller is exactly this loop
filling out the tail after a log switch (originally, to avoid filling
in page headers). It now seems to me that, if AdvanceXLInsertBuffer
has that information, it could also be safe for it to skip the
WaitXLogInsertionsToFinish in that case. Would that eliminate the
deadlock potential, and allow the loop in CopyXLogRecordToWAL to be
replaced with a single call to AdvanceXLInsertBuffer and a single
WALInsertLockUpdateInsertingAt ?

Or have I overlooked some other subtlety?

-Chap




Re: [HACKERS] GSoC 2017: Foreign Key Arrays

2017-06-25 Thread Mark Rofail
*What I did:*

   - Read into the old patch, but couldn't apply it since it's quite old. It
     needs to be rebased, and that's what I am working on. It's a lot of work.
   - An incomplete patch can be found attached here.

*Bugs*

   - problem with the @>(anyarray, anyelement) operator: if, for example,
   you apply the operator as follows  '{AA646}' @> 'AA646' it
   maps to @>(anyarray, anyarray) since 'AA646' is interpreted as
   char[] instead of text

*Suggestion:*

   - since I needed to check whether the Datum was null and determine its
   type, I had to do it in arraycontainselem and pass it as a parameter to the
   underlying function array_contains_elem. I'm proposing to introduce a new
   struct, ElementType, analogous to ArrayType, along with brand-new macros to
   make dealing with anyelement easier in any polymorphic context.


Best Regards,
Mark Rofail

On Tue, Jun 20, 2017 at 12:19 AM, Alvaro Herrera 
wrote:

> Mark Rofail wrote:
> > Okay, so major breakthrough.
> >
> > *Updates:*
> >
> >- The operator @>(anyarray, anyelement) is now functional
> >   - The segmentation fault was due to applying PG_FREE_IF_COPY on a
> >   datum when it should only be applied on TOASTed inputs
> >   - The only problem now is if for example you apply the operator as
> >   follows  '{AA646'}' @> 'AA646' it maps to
> @>(anyarray,
> >   anyarray) since 'AA646' is interpreted as char[] instead
> of Text
> >- Added some regression tests (src/test/regress/sql/arrays.sql) and
> >their results(src/test/regress/expected/arrays.out)
> >- worked on the new GIN strategy, I don't think it would vary much from
> >GinContainsStrategy.
>
> OK, that's great.
>
> > *What I plan to do:*
> >
> >- I need to start working on the Referential Integrity code but I
> >don't know where to start
>
> You need to study the old patch posted by Marco Nenciarini.
>
> --
> Álvaro Herrerahttps://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index ea655a10a8..712f631e88 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -2288,6 +2288,14 @@ SCRAM-SHA-256$iteration count:salt<
  
 
  
+  confiselement
+  bool
+  
+  If a foreign key, is it an array ELEMENT
+  foreign key?
+ 
+
+ 
   coninhcount
   int4
   
@@ -2324,6 +2332,18 @@ SCRAM-SHA-256$iteration count:salt<
  
 
  
+  confelement
+  bool[]
+  
+  
+ 	If a foreign key, list of booleans expressing which columns
+ 	are array ELEMENT columns; see
+ 	
+ 	for details
+  
+ 
+
+ 
   conpfeqop
   oid[]
   pg_operator.oid
diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index b05a9c2150..c1c847bc7e 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -881,7 +881,112 @@ CREATE TABLE order_items (
 .

   
-
+  
+   
+Array ELEMENT Foreign Keys
+ 
+
+ ELEMENT foreign key
+
+ 
+
+ constraint
+ Array ELEMENT foreign key
+
+ 
+
+ constraint
+ ELEMENT foreign key
+
+ 
+
+ referential integrity
+
+ 
+
+ Another option you have with foreign keys is to use a
+ referencing column which is an array of elements with
+ the same type (or a compatible one) as the referenced
+ column in the related table. This feature is called
+ array element foreign key and is implemented
+ in PostgreSQL with ELEMENT foreign key constraints,
+ as described in the following example:
+ 
+
+CREATE TABLE drivers (
+driver_id integer PRIMARY KEY,
+first_name text,
+last_name text,
+...
+);
+
+CREATE TABLE races (
+race_id integer PRIMARY KEY,
+title text,
+race_day DATE,
+...
+final_positions integer[] ELEMENT REFERENCES drivers
+);
+
+ 
+ The above example uses an array (final_positions)
+ to store the results of a race: for each of its elements
+ a referential integrity check is enforced on the
+ drivers table.
+ Note that ELEMENT REFERENCES is an extension
+ of PostgreSQL and it is not included in the SQL standard.
+
+ 
+
+ Even though the most common use case for array ELEMENT
+ foreign keys is on a single column key, you can define an array
+ ELEMENT foreign key constraint on a group
+ of columns. As the following example shows, it must be written in table
+ constraint form:
+ 
+
+CREATE TABLE available_moves (
+kind text,
+move text,
+description text,
+PRIMARY KEY (kind, move)
+);
+
+CREATE TABLE paths (
+description text,
+kind text,
+moves text[],
+FOREIGN KEY (kind, ELEMENT moves) REFERENCES available_moves 

[HACKERS] Reducing pg_ctl's reaction time

2017-06-25 Thread Tom Lane
I still have a bee in my bonnet about how slow the recovery TAP tests
are, and especially about how low the CPU usage is while they run,
suggesting that a lot of the wall clock time is being expended on
useless sleeps.  Some analysis I did today found some low-hanging fruit
there: a significant part of the time is being spent in pg_ctl waiting
for postmaster start/stop operations.  It waits in quanta of 1 second,
but the postmaster usually starts or stops in much less than that.
(In these tests, most of the shutdown checkpoints have little to do,
so that in many cases the postmaster stops in just a couple of msec.
Startup isn't very many msec either.)

The attached proposed patch adjusts pg_ctl to check every 100msec,
instead of every second, for the postmaster to be done starting or
stopping.  This cuts the runtime of the recovery TAP tests from around
4m30s to around 3m10s on my machine, a good 30% savings.  I experimented
with different check frequencies but there doesn't seem to be anything
to be gained by cutting the delay further (and presumably, it becomes
counterproductive at some point due to pg_ctl chewing too many cycles).

This change probably doesn't offer much real-world benefit, since few
people start/stop their postmasters often, and shutdown checkpoints are
seldom so cheap on production installations.  Still, it doesn't seem
like it could hurt.  The original choice to use once-per-second checks
was made for hardware a lot slower than what most of us use now; I do
not think it's been revisited since the first implementation of pg_ctl
in 1999 (cf 5b912b089).

Barring objections I'd like to push this soon.

regards, tom lane

diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
index 82ac62d..1e6cb69 100644
*** a/src/bin/pg_ctl/pg_ctl.c
--- b/src/bin/pg_ctl/pg_ctl.c
*** typedef enum
*** 68,73 
--- 68,75 
  
  #define DEFAULT_WAIT	60
  
+ #define WAITS_PER_SEC	10		/* should divide 1000000 evenly */
+ 
  static bool do_wait = true;
  static int	wait_seconds = DEFAULT_WAIT;
  static bool wait_seconds_arg = false;
*** test_postmaster_connection(pgpid_t pm_pi
*** 531,537 
  
  	connstr[0] = '\0';
  
! 	for (i = 0; i < wait_seconds; i++)
  	{
  		/* Do we need a connection string? */
  		if (connstr[0] == '\0')
--- 533,539 
  
  	connstr[0] = '\0';
  
! 	for (i = 0; i < wait_seconds * WAITS_PER_SEC; i++)
  	{
  		/* Do we need a connection string? */
  		if (connstr[0] == '\0')
*** test_postmaster_connection(pgpid_t pm_pi
*** 701,724 
  #endif
  
  		/* No response, or startup still in process; wait */
! #ifdef WIN32
! 		if (do_checkpoint)
  		{
! 			/*
! 			 * Increment the wait hint by 6 secs (connection timeout + sleep)
! 			 * We must do this to indicate to the SCM that our startup time is
! 			 * changing, otherwise it'll usually send a stop signal after 20
! 			 * seconds, despite incrementing the checkpoint counter.
! 			 */
! 			status.dwWaitHint += 6000;
! 			status.dwCheckPoint++;
! 			SetServiceStatus(hStatus, (LPSERVICE_STATUS) &status);
! 		}
! 		else
  #endif
! 			print_msg(".");
  
! 		pg_usleep(1000000);		/* 1 sec */
  	}
  
  	/* return result of last call to PQping */
--- 703,730 
  #endif
  
  		/* No response, or startup still in process; wait */
! 		if (i % WAITS_PER_SEC == 0)
  		{
! #ifdef WIN32
! 			if (do_checkpoint)
! 			{
! /*
!  * Increment the wait hint by 6 secs (connection timeout +
!  * sleep). We must do this to indicate to the SCM that our
!  * startup time is changing, otherwise it'll usually send a
!  * stop signal after 20 seconds, despite incrementing the
!  * checkpoint counter.
!  */
! status.dwWaitHint += 6000;
! status.dwCheckPoint++;
! SetServiceStatus(hStatus, (LPSERVICE_STATUS) &status);
! 			}
! 			else
  #endif
! print_msg(".");
! 		}
  
! 		pg_usleep(1000000 / WAITS_PER_SEC); /* 1/WAITS_PER_SEC sec */
  	}
  
  	/* return result of last call to PQping */
*** do_stop(void)
*** 998,1009 
  
  		print_msg(_("waiting for server to shut down..."));
  
! 		for (cnt = 0; cnt < wait_seconds; cnt++)
  		{
  			if ((pid = get_pgpid(false)) != 0)
  			{
! print_msg(".");
! pg_usleep(1000000); /* 1 sec */
  			}
  			else
  break;
--- 1004,1016 
  
  		print_msg(_("waiting for server to shut down..."));
  
! 		for (cnt = 0; cnt < wait_seconds * WAITS_PER_SEC; cnt++)
  		{
  			if ((pid = get_pgpid(false)) != 0)
  			{
! if (cnt % WAITS_PER_SEC == 0)
! 	print_msg(".");
! pg_usleep(1000000 / WAITS_PER_SEC); /* 1/WAITS_PER_SEC sec */
  			}
  			else
  break;
*** do_restart(void)
*** 1088,1099 
  
  		/* always wait for restart */
  
! 		for (cnt = 0; cnt < wait_seconds; cnt++)
  		{
  			if ((pid = get_pgpid(false)) != 0)
  			{
! print_msg(".");
! pg_usleep(1000000); /* 1 sec */
  			}
  			else
  break;
--- 1095,1107 
  
  		/* always wait for restart */
  
! 		

[HACKERS] Timing-sensitive case in src/test/recovery TAP tests

2017-06-25 Thread Tom Lane
I've been experimenting with a change to pg_ctl, which I'll post
separately, to reduce its reaction time so that it reports success
more quickly after a wait for postmaster start/stop.  I found one
case in "make check-world" that got a failure when I reduced the
reaction time to ~1ms.  That's the very last test in 001_stream_rep.pl,
"cascaded slot xmin reset after startup with hs feedback reset", and
the cause appears to be that it's not allowing any delay time for a
replication slot's state to update after a postmaster restart.

This seems worth fixing independently of any possible code changes,
because it shows that this test could fail on a slow or overloaded
machine.  I couldn't find any instances of such a failure in the
buildfarm archives, but that may have a lot to do with the fact that
owners of slow buildfarm animals are (mostly?) not running this test.

Some experimentation says that the minimum delay needed to make it
work reliably on my workstation is about 100ms, so a simple patch
along the lines of the attached might be good enough.  I find this
approach conceptually dissatisfying, though, since it's still
potentially vulnerable to the failure under sufficient load.
I wonder if there is an easy way to improve that ... maybe convert
to something involving poll_query_until?

regards, tom lane

diff --git a/src/test/recovery/t/001_stream_rep.pl b/src/test/recovery/t/001_stream_rep.pl
index 266d27c..8d6edd2 100644
*** a/src/test/recovery/t/001_stream_rep.pl
--- b/src/test/recovery/t/001_stream_rep.pl
*** isnt($xmin, '', 'cascaded slot xmin non-
*** 265,270 
--- 265,272 
  # Xmin from a previous run should be cleared on startup.
  $node_standby_2->start;
  
+ sleep(1);  # need some delay before interrogating slot xmin
+ 
  ($xmin, $catalog_xmin) = get_slot_xmins($node_standby_1, $slotname_2);
  is($xmin, '',
  	'cascaded slot xmin reset after startup with hs feedback reset');



Re: [HACKERS] pgbench tap tests & minor fixes

2017-06-25 Thread Fabien COELHO



may be this it because of "end_offset + 1" in expr_scanner_chomp_substring ?
Why there is + 1 there?


Thanks for the debugging! Here is a v9 which rebases after the 
reindentation and fixes the bad offset.


--
Fabien.

diff --git a/src/bin/pgbench/exprscan.l b/src/bin/pgbench/exprscan.l
index dc1367b..8255eff 100644
--- a/src/bin/pgbench/exprscan.l
+++ b/src/bin/pgbench/exprscan.l
@@ -360,6 +360,23 @@ expr_scanner_get_substring(PsqlScanState state,
 }
 
 /*
+ * get current expression line without one ending newline
+ */
+char *
+expr_scanner_chomp_substring(PsqlScanState state, int start_offset, int end_offset)
+{
+	const char *p = state->scanbuf;
+	/* chomp eol */
+	if (end_offset > start_offset && p[end_offset] == '\n')
+	{
+		end_offset--;
+		if (end_offset > start_offset && p[end_offset] == '\r')
+			end_offset--;
+	}
+	return expr_scanner_get_substring(state, start_offset, end_offset);
+}
+
+/*
  * Get the line number associated with the given string offset
  * (which must not be past the end of where we've lexed to).
  */
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 4d364a1..6bc6eb6 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -229,7 +229,7 @@ typedef struct SimpleStats
 typedef struct StatsData
 {
 	time_t		start_time;		/* interval start time, for aggregates */
-	int64		cnt;			/* number of transactions */
+	int64		cnt;			/* number of transactions, including skipped */
 	int64		skipped;		/* number of transactions skipped under --rate
  * and --latency-limit */
 	SimpleStats latency;
@@ -329,7 +329,7 @@ typedef struct
 	bool		prepared[MAX_SCRIPTS];	/* whether client prepared the script */
 
 	/* per client collected stats */
-	int64		cnt;			/* transaction count */
+	int64		cnt;			/* client transaction count, for -t */
 	int			ecnt;			/* error count */
 } CState;
 
@@ -2045,7 +2045,8 @@ doCustom(TState *thread, CState *st, StatsData *agg)
 	if (INSTR_TIME_IS_ZERO(now))
 		INSTR_TIME_SET_CURRENT(now);
 	now_us = INSTR_TIME_GET_MICROSEC(now);
-	while (thread->throttle_trigger < now_us - latency_limit)
+	while (thread->throttle_trigger < now_us - latency_limit &&
+		   (nxacts <= 0 || st->cnt < nxacts)) /* with -t, do not overshoot */
 	{
 		processXactStats(thread, st, &now, true, agg);
 		/* next rendez-vous */
@@ -2053,6 +2054,12 @@ doCustom(TState *thread, CState *st, StatsData *agg)
 		thread->throttle_trigger += wait;
 		st->txn_scheduled = thread->throttle_trigger;
 	}
+
+	if (nxacts > 0 && st->cnt >= nxacts)
+	{
+		st->state = CSTATE_FINISHED;
+		break;
+	}
 }
 
 st->state = CSTATE_THROTTLE;
@@ -2364,15 +2371,8 @@ doCustom(TState *thread, CState *st, StatsData *agg)
  */
 			case CSTATE_END_TX:
 
-/*
- * transaction finished: calculate latency and log the
- * transaction
- */
-if (progress || throttle_delay || latency_limit ||
-	per_script_stats || use_log)
-	processXactStats(thread, st, &now, false, agg);
-else
-	thread->stats.cnt++;
+/* transaction finished: calculate latency and do log */
+processXactStats(thread, st, &now, false, agg);
 
 if (is_connect)
 {
@@ -2381,7 +2381,6 @@ doCustom(TState *thread, CState *st, StatsData *agg)
 	INSTR_TIME_SET_ZERO(now);
 }
 
-++st->cnt;
 if ((st->cnt >= nxacts && duration <= 0) || timer_exceeded)
 {
 	/* exit success */
@@ -2519,17 +2518,20 @@ processXactStats(TState *thread, CState *st, instr_time *now,
 {
 	double		latency = 0.0,
 lag = 0.0;
+	bool		detailed = progress || throttle_delay || latency_limit ||
+		per_script_stats || use_log;
 
-	if ((!skipped) && INSTR_TIME_IS_ZERO(*now))
-		INSTR_TIME_SET_CURRENT(*now);
-
-	if (!skipped)
+	if (detailed && !skipped)
 	{
+		if (INSTR_TIME_IS_ZERO(*now))
+			INSTR_TIME_SET_CURRENT(*now);
+
 		/* compute latency & lag */
 		latency = INSTR_TIME_GET_MICROSEC(*now) - st->txn_scheduled;
 		lag = INSTR_TIME_GET_MICROSEC(st->txn_begin) - st->txn_scheduled;
 	}
 
+	/* detailed thread stats */
 	if (progress || throttle_delay || latency_limit)
 	{
 		accumStats(&thread->stats, skipped, latency, lag);
@@ -2539,7 +2541,13 @@ processXactStats(TState *thread, CState *st, instr_time *now,
 			thread->latency_late++;
 	}
 	else
+	{
+		/* no detailed stats, just count */
 		thread->stats.cnt++;
+	}
+
+	/* client stat is just counting */
+	st->cnt ++;
 
 	if (use_log)
 		doLog(thread, st, agg, skipped, latency, lag);
@@ -3030,8 +3038,7 @@ process_backslash_command(PsqlScanState sstate, const char *source)
 	PQExpBufferData word_buf;
 	int			word_offset;
 	int			offsets[MAX_ARGS];	/* offsets of argument words */
-	int			start_offset,
-end_offset;
+	int			start_offset;
 	int			lineno;
 	int			j;
 
@@ -3085,13 +3092,10 @@ process_backslash_command(PsqlScanState sstate, const char *source)
 
 		my_command->expr = expr_parse_result;
 
-		/* Get location of the ending newline */
-		end_offset = 

Re: [HACKERS] pgbench tap tests & minor fixes

2017-06-25 Thread Nikolay Shaplov
In a message of June 15, 2017 21:10:12, you wrote:
> > As for me, I would do expr_scanner_chomp_substring(PsqlScanState, int,
> > int&); that changes end_offset as desired...
> 
> Why not.
> 
> > And use it instead of end_offset = expr_scanner_offset(sstate) - 1;
> 
> I removed these?
> 
> > The second issue: you are removing all trailing \n and \r. I think you
> > should remove only one \n at the end of the string, and one \r before \n
> > if there was one.
> 
> So chomp one eol.
> 
> Is the attached version better to your test?

I expected expr_scanner_chomp_substring to decrement end_offset, so it 
would work more like Perl's chomp function, but the way you've done it is 
also good.

The sad thing is that in v7 and v8 the TAP tests fail. (In v6 it still works; I 
have local branches for all your patch versions.) I did not check it before 
in v7, just read the code. It was my mistake.

-- 
Do code for fun. Can do it for money (Perl & C/C++ ~10h/week)




Re: [HACKERS] pgbench tap tests & minor fixes

2017-06-25 Thread Nikolay Shaplov
In a message of June 25, 2017 19:03:54, Fabien COELHO wrote:
> Hello Nikolay,
> 
> >> Is the attached version better to your test?
> > 
> > I've expected from expr_scanner_chomp_substring to decrement end_offset,
> > so it would work more like perl chomp function, but the way you've done
> > is also good.
> 
> Ok.
> 
> > The sad think is that in v7 and v8 TAP tests fails. (in v6 it still works,
> > I have local branches for all your patches versions). I did not check it
> > bdefore in v7, just read the code. It was my mistake
> 
> Could you be more precise please? Which TAP tests are failing? Could you
> give the log showing the issues encountered?

I am building dev postgres with --enable-cassert

and get a lot of 

'pgbench: exprscan.l:354: expr_scanner_get_substring: Assertion `end_offset <= 
strlen(state->scanbuf)' failed.

may be this it because of "end_offset + 1" in expr_scanner_chomp_substring ?

Why there is + 1 there?

> 
> I did "make check" and "make check-world", both PASS.
> 
> ISTM that manually in pgbench right know with patch v8 I have:
> 
>   sh> make check
>   rm -rf '/home/fabien/DEV/GIT/postgresql'/tmp_install
>   /bin/mkdir -p '/home/fabien/DEV/GIT/postgresql'/tmp_install/log
>   make -C '../../..' DESTDIR='/home/fabien/DEV/GIT/postgresql'/tmp_install
>   install >'/home/fabien/DEV/GIT/postgresql'/tmp_install/log/install.log
> 2>&1 rm -rf /home/fabien/DEV/GIT/postgresql/src/bin/pgbench/tmp_check/log
> cd . && TESTDIR='/home/fabien/DEV/GIT/postgresql/src/bin/pgbench'
> PATH="/home/fabien/DEV/GIT/postgresql/tmp_install/usr/local/pgsql/bin:$PATH
> "
> LD_LIBRARY_PATH="/home/fabien/DEV/GIT/postgresql/tmp_install/usr/local/pgsq
> l/lib" PGPORT='65432'
> PG_REGRESS='/home/fabien/DEV/GIT/postgresql/src/bin/pgbench/../../../src/te
> st/regress/pg_regress' prove -I ../../../src/test/perl/ -I .  t/*.pl
> t/001_pgbench_with_server.pl .. ok
>   t/002_pgbench_no_server.pl  ok
>   All tests successful.
>   Files=2, Tests=360,  6 wallclock secs ( 0.04 usr  0.02 sys +  4.53 cusr
> 0.22 csys =  4.81 CPU) Result: PASS
> 
> Which looks ok.

-- 
Do code for fun. Can do it for money (Perl & C/C++ ~10h/week)




Re: [HACKERS] Is exec_simple_check_node still doing anything?

2017-06-25 Thread Tom Lane
I wrote:
> Robert Haas  writes:
>> I'm a little mystified by exec_simple_check_node().
>> ...
>> Did that, possibly, remove the last way in which a simple expression
>> could become non-simple?  If so, between that and the new
>> hasTargetSRFs test, it might now be impossible for
>> exec_simple_check_node() to fail.

> In fact, I suspect we could get rid of exec_simple_recheck_plan
> altogether.  It could use a bit more study, but the empty-rtable
> check plus the other checks in exec_simple_check_plan (particularly,
> hasAggs, hasWindowFuncs, hasTargetSRFs, hasSubLinks) seem like
> they are enough to guarantee that what comes out of the planner
> will be "simple".

I did some more studying and it definitely seems like
exec_simple_recheck_plan can never fail anymore.  As an experimental
check, converting everything it was testing into Asserts still gets
through check-world.  Accordingly, here's a proposed patch that gets
rid of exec_simple_check_node() and simplifies some of the rest of
the logic.

regards, tom lane

diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index c98492b..4eb2dd2 100644
*** a/src/pl/plpgsql/src/pl_exec.c
--- b/src/pl/plpgsql/src/pl_exec.c
*** static void exec_eval_cleanup(PLpgSQL_ex
*** 224,232 
  
  static void exec_prepare_plan(PLpgSQL_execstate *estate,
    PLpgSQL_expr *expr, int cursorOptions);
- static bool exec_simple_check_node(Node *node);
  static void exec_simple_check_plan(PLpgSQL_execstate *estate, PLpgSQL_expr *expr);
! static void exec_simple_recheck_plan(PLpgSQL_expr *expr, CachedPlan *cplan);
  static void exec_check_rw_parameter(PLpgSQL_expr *expr, int target_dno);
  static bool contains_target_param(Node *node, int *target_dno);
  static bool exec_eval_simple_expr(PLpgSQL_execstate *estate,
--- 224,231 
  
  static void exec_prepare_plan(PLpgSQL_execstate *estate,
    PLpgSQL_expr *expr, int cursorOptions);
  static void exec_simple_check_plan(PLpgSQL_execstate *estate, PLpgSQL_expr *expr);
! static void exec_save_simple_expr(PLpgSQL_expr *expr, CachedPlan *cplan);
  static void exec_check_rw_parameter(PLpgSQL_expr *expr, int target_dno);
  static bool contains_target_param(Node *node, int *target_dno);
  static bool exec_eval_simple_expr(PLpgSQL_execstate *estate,
*** loop_exit:
*** 5488,5500 
   * of the tree was aborted by an error: the tree may contain bogus state
   * so we dare not re-use it.
   *
!  * It is possible though unlikely for a simple expression to become non-simple
!  * (consider for example redefining a trivial view).  We must handle that for
!  * correctness; fortunately it's normally inexpensive to call
!  * SPI_plan_get_cached_plan for a simple expression.  We do not consider the
!  * other direction (non-simple expression becoming simple) because we'll still
!  * give correct results if that happens, and it's unlikely to be worth the
!  * cycles to check.
   *
   * Note: if pass-by-reference, the result is in the eval_mcontext.
   * It will be freed when exec_eval_cleanup is done.
--- 5487,5498 
   * of the tree was aborted by an error: the tree may contain bogus state
   * so we dare not re-use it.
   *
!  * It is possible that we'd need to replan a simple expression; for example,
!  * someone might redefine a SQL function that had been inlined into the simple
!  * expression.  That cannot cause a simple expression to become non-simple (or
!  * vice versa), but we do have to handle replacing the expression tree.
!  * Fortunately it's normally inexpensive to call SPI_plan_get_cached_plan for
!  * a simple expression.
   *
   * Note: if pass-by-reference, the result is in the eval_mcontext.
   * It will be freed when exec_eval_cleanup is done.
*** exec_eval_simple_expr(PLpgSQL_execstate 
*** 5543,5561 
  	 */
  	Assert(cplan != NULL);
  
  	if (cplan->generation != expr->expr_simple_generation)
  	{
! 		/* It got replanned ... is it still simple? */
! 		exec_simple_recheck_plan(expr, cplan);
! 		/* better recheck r/w safety, as well */
  		if (expr->rwparam >= 0)
  			exec_check_rw_parameter(expr, expr->rwparam);
- 		if (expr->expr_simple_expr == NULL)
- 		{
- 			/* Oops, release refcount and fail */
- 			ReleaseCachedPlan(cplan, true);
- 			return false;
- 		}
  	}
  
  	/*
--- 5541,5553 
  	 */
  	Assert(cplan != NULL);
  
+ 	/* If it got replanned, update our copy of the simple expression */
  	if (cplan->generation != expr->expr_simple_generation)
  	{
! 		exec_save_simple_expr(expr, cplan);
! 		/* better recheck r/w safety, as it could change due to inlining */
  		if (expr->rwparam >= 0)
  			exec_check_rw_parameter(expr, expr->rwparam);
  	}
  
  	/*
*** exec_eval_simple_expr(PLpgSQL_execstate 
*** 5567,5573 
  	/*
  	 * Prepare the expression for execution, if it's not been done already in
  	 * the current transaction.  (This will be forced to happen if we called
! 	 * 

[HACKERS] CREATE COLLATION definitional questions for ICU

2017-06-25 Thread Tom Lane
Reading over DefineCollation, I wondered:

* Should not the FROM code path copy the old collation's version?
It seems a little bit weird that "cloning" a collation takes the
liberty of installing a new version.

* Also (and this would be a pre-existing bug), why doesn't the FROM
path copy the old collation's encoding?  For example, if you attempted
to clone the "C" encoding, you wouldn't get a true clone but something
that's specific to the current DB's encoding.

* For an ICU collation, should we not insist that collcollate and
collctype be equal?  If not, what does it mean for them to be different?

* Now that it's possible for user-created collations to have encoding -1,
I do not think that the "shadowing" tests in CollationCreate and
IsThereCollationInNamespace are sufficient.  They don't prevent a new
collation with encoding -1 from shadowing an existing encoding-specific
collation that doesn't happen to match the current DB's encoding.
Now, you'd have to work at it for that to cause problems --- say,
create such a situation in template0 and then copy template0 specifying
that other encoding.  But none of that is forbidden.

regards, tom lane




Re: [HACKERS] Walsender timeouts and large transactions

2017-06-25 Thread Petr Jelinek
On 30/05/17 15:44, Petr Jelinek wrote:
> On 30/05/17 11:02, Kyotaro HORIGUCHI wrote:
>>
>> +TimestampTz now = GetCurrentTimestamp();
>>
>> I think it is not recommended to read the current time too
>> frequently, especially within a loop that hates slowness. (I
>> suppose that a loop that can fill up a send queue falls into that
> 
> Yeah that was my main worry for the patch as well, although given that
> the loop does tuple processing it might not be very noticeable.
> 

I realized we actually call GetCurrentTimestamp() there anyway (for the
pq_sendint64). So I just modified the patch to use the now variable
there instead which means there are no additional GetCurrentTimestamp()
calls compared to state before patch now.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
>From c6e9cdb81311371662c580db42a165ab0575d951 Mon Sep 17 00:00:00 2001
From: Petr Jelinek 
Date: Thu, 25 May 2017 17:35:24 +0200
Subject: [PATCH] Fix walsender timeouts when decoding large transaction

The logical slots have a fast code path for sending data in order to not
impose too high a per-message overhead. The fast path skips checks for
interrupts and timeouts. However, the fast path failed to consider the
fact that a transaction with a large number of changes may take a very
long time to be processed and sent to the client. This causes walsender
to ignore interrupts for a potentially long time, but more importantly
it will cause walsender to be killed due to timeout at the end of such a
transaction.

This commit changes the fast path to also check for interrupts and only
allows taking the fast path when the last keepalive check happened less
than half of the walsender timeout ago; otherwise the slower code path
is taken.
---
 src/backend/replication/walsender.c | 36 ++--
 1 file changed, 22 insertions(+), 14 deletions(-)

diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index f845180..278f882 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -1160,6 +1160,8 @@ static void
 WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid,
 bool last_write)
 {
+	TimestampTz	now = GetCurrentTimestamp();
+
 	/* output previously gathered data in a CopyData packet */
 	pq_putmessage_noblock('d', ctx->out->data, ctx->out->len);
 
@@ -1169,23 +1171,28 @@ WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid,
 	 * several releases by streaming physical replication.
 	 */
 	resetStringInfo(&tmpbuf);
-	pq_sendint64(&tmpbuf, GetCurrentTimestamp());
+	pq_sendint64(&tmpbuf, now);
 	memcpy(&ctx->out->data[1 + sizeof(int64) + sizeof(int64)],
 		   tmpbuf.data, sizeof(int64));
 
-	/* fast path */
-	/* Try to flush pending output to the client */
-	if (pq_flush_if_writable() != 0)
-		WalSndShutdown();
+	/* Try taking fast path unless we get too close to walsender timeout. */
+	if (now < TimestampTzPlusMilliseconds(last_reply_timestamp,
+		  wal_sender_timeout / 2))
+	{
+		CHECK_FOR_INTERRUPTS();
 
-	if (!pq_is_send_pending())
-		return;
+		/* Try to flush pending output to the client */
+		if (pq_flush_if_writable() != 0)
+			WalSndShutdown();
+
+		if (!pq_is_send_pending())
+			return;
+	}
 
 	for (;;)
 	{
 		int			wakeEvents;
 		long		sleeptime;
-		TimestampTz now;
 
 		/*
 		 * Emergency bailout if postmaster has died.  This is to avoid the
@@ -1214,18 +1221,16 @@ WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid,
 		if (pq_flush_if_writable() != 0)
 			WalSndShutdown();
 
-		/* If we finished clearing the buffered data, we're done here. */
-		if (!pq_is_send_pending())
-			break;
-
-		now = GetCurrentTimestamp();
-
 		/* die if timeout was reached */
 		WalSndCheckTimeOut(now);
 
 		/* Send keepalive if the time has come */
 		WalSndKeepaliveIfNecessary(now);
 
+		/* If we finished clearing the buffered data, we're done here. */
+		if (!pq_is_send_pending())
+			break;
+
 		sleeptime = WalSndComputeSleeptime(now);
 
 		wakeEvents = WL_LATCH_SET | WL_POSTMASTER_DEATH |
@@ -1235,6 +1240,9 @@ WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid,
 		WaitLatchOrSocket(MyLatch, wakeEvents,
 		  MyProcPort->sock, sleeptime,
 		  WAIT_EVENT_WAL_SENDER_WRITE_DATA);
+
+		/* Update our idea of current time for the next cycle. */
+		now = GetCurrentTimestamp();
 	}
 
 	/* reactivate latch so WalSndLoop knows to continue */
-- 
2.7.4




Re: [HACKERS] Logical replication: stuck spinlock at ReplicationSlotRelease

2017-06-25 Thread Petr Jelinek
On 24/06/17 04:50, Tom Lane wrote:
> Peter Eisentraut  writes:
>> Do you want to take a look at move those elog calls around a bit?  That
>> should do it.
> 
> It would be a good idea to have some clarity on *why* that should do it.
> 
> Looking at the original report's log, but without having actually
> reproduced the problem, I guess what is happening is this:
> 
> 1. Subscription worker process (23117) gets a duplicate key conflict while
> trying to apply an update, and in consequence it exits.  (Is that supposed
> to happen?)
> 
> 2. Publication server process (23124) doesn't notice client connection
> loss right away.  By chance, the next thing it tries to send to the client
> is the debug output from LogicalIncreaseRestartDecodingForSlot.  Then it
> detects loss of connection (at 2017-06-21 14:55:12.033) and FATAL's out.
> But since the spinlock stuff has no tracking infrastructure, we don't
> know we are still holding the replication slot mutex.
> 
> 3. Process exit cleanup does know that it's supposed to release the
> replication slot, so it tries to take the mutex spinlock ... again.
> Eventually that times out and we get the "stuck spinlock" panic.
> 
> All correct so far?

Sounds about right.

> So, okay, the proximate cause of the crash is a blatant violation of the
> rule that spinlocks may only be held across straight-line code segments.
> But I'm wondering about the client exit having occurred in the first
> place.  Why is that, and how would one ever recover?  It sure looks
> like this isn't the first subscription worker process that has tried
> and failed to apply the update.  If our attitude towards this situation is
> that it's okay to fork-bomb your server with worker processes continually
> respawning and making no progress, well, I don't think that's good enough.
> 

Well, we don't have conflict detection/handling in PG10 the way, for
example, pglogical does. Even once we have that, it probably will not be
able to resolve multiple unique index violations (there is no obvious
way to do that automatically). And we can't really make progress while
there is an unresolved constraint violation. To recover, one has to
either remove the conflicting row on the downstream or remove the
transaction from replication on the upstream by manually consuming it
with pg_logical_slot_get_binary_changes. That's arguably a somewhat ugly
interface; we might want to invent a nicer one even for PG10, but that
would mean a catalog bump, so it should happen rather soon if we go
there.

As for fork-bombing, it would be a very slow fork bomb (we rate-limit
worker starts), but it's not an ideal situation, I agree. I am open to
suggestions about what we can do there. If we had some kind of list of
non-recoverable errors, we could automatically disable the subscription
on them (although we would need to be able to modify the catalog for
that, which may not be possible after an unrecoverable error), but it's
not clear to me how to reasonably produce such a list.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] CREATE SUBSCRIPTION log noise

2017-06-25 Thread Petr Jelinek
On 21/06/17 23:32, Euler Taveira wrote:
> 2017-06-21 15:14 GMT-03:00 Peter Eisentraut
>  >:
> 
> > It doesn't appear to be contingent on anything other than the content of
> > the command you just gave it.  I don't think we need a NOTICE saying
> > that it did that thing I just told it to do--that should be implicit by
> > the lack of an ERROR.
> 
> I'm appreciative of this complaint.  The point is that we are allocating
> resources on a remote host that should not be forgotten.  I could go
> either way.
> 
> 
> It is documented that subscription can create a replication slot on
> remote host as mentioned in section "Replication Slot Management". If we
> want to maintain the message let's downgrade it to DEBUG1 (as we do with
> "create implicit index for table") but I prefer to rip it out.
> 

I agree with Peter here, the reason why we show this is that there is
implicit action which reserves resources on different server and is not
transactional. If we only did it when explicitly asked for, I would
agree it's not needed but given that it's default behavior I prefer to
inform the user.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Get stuck when dropping a subscription during synchronizing table

2017-06-25 Thread Petr Jelinek
(was away for a while, got some time now for this again)

On 22/06/17 04:43, Peter Eisentraut wrote:
> The alternative is that we use the LockSharedObject() approach that was
> already alluded to, like in the attached patch.  Both approaches would
> work equally fine AFAICT.

I agree, but I think we need a bigger overhaul of the locking/management
in general. So here is a patch which makes many more changes.

The patch does several important things (in no particular order):
- Split SetSubscriptionRelState into AddSubscriptionRelState and
UpdateSubscriptionRelState for the reasons stated upthread: it's
cleaner, there is no half-broken upsert logic, and each action gets
proper error checking.

- Do LockSharedObject in ALTER SUBSCRIPTION, DROP SUBSCRIPTION (this one
is preexisting but mentioned here for context), SetSubscriptionRelState,
AddSubscriptionRelState, and in logicalrep_worker_launch. This means we
use granular per-object locks to deal with concurrency.

- Because of the above, the AccessExclusiveLock on pg_subscription is no
longer needed; a normal RowExclusiveLock is used now.

- logicalrep_worker_stop is also simplified due to the proper locking

- There is a new interface, logicalrep_worker_stop_at_commit, which is
used by ALTER SUBSCRIPTION ... REFRESH PUBLICATION and by the
transactional variant of DROP SUBSCRIPTION to kill workers only at the
end of the transaction.

- Locking/reading of subscription info is unified between DROP and ALTER
SUBSCRIPTION commands.

- DROP SUBSCRIPTION will kill all workers associated with subscription,
not just apply.

- The sync worker checks during startup if the relation is still subscribed.

- The sync worker will exit while it is waiting for the apply worker if
the apply worker has shut down.

- The log messages around workers and removed or disabled subscription
are now more consistent between startup and normal runtime of the worker.

- Some code deduplication and stylistic changes/simplification in
related areas.

- Fixed catcache's IndexScanOK() handling of the subscription catalog.

It's a bit bigger patch, but it solves issues from multiple threads
around the handling of ALTER/DROP SUBSCRIPTION.

A lot of the locking I added is normally done transparently by
dependency handling, but subscriptions and subscription relation status
mostly don't use it, as that was deemed to bloat pg_depend needlessly
during the original patch review (which is also probably why this
slipped through).

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
From d7038474012769c9c3b50231af76dd7796fe593f Mon Sep 17 00:00:00 2001
From: Petr Jelinek 
Date: Sat, 24 Jun 2017 19:38:21 +0200
Subject: [PATCH] Rework subscription worker and relation status handling

---
 src/backend/catalog/pg_subscription.c   | 137 +++--
 src/backend/commands/subscriptioncmds.c |  98 +-
 src/backend/replication/logical/launcher.c  | 293 +++-
 src/backend/replication/logical/tablesync.c |  97 +
 src/backend/replication/logical/worker.c|  23 ++-
 src/backend/utils/cache/catcache.c  |   6 +-
 src/include/catalog/pg_subscription_rel.h   |   6 +-
 src/include/replication/worker_internal.h   |   6 +-
 8 files changed, 367 insertions(+), 299 deletions(-)

diff --git a/src/backend/catalog/pg_subscription.c b/src/backend/catalog/pg_subscription.c
index c69c461..b643e54 100644
--- a/src/backend/catalog/pg_subscription.c
+++ b/src/backend/catalog/pg_subscription.c
@@ -28,6 +28,8 @@
 
 #include "nodes/makefuncs.h"
 
+#include "storage/lmgr.h"
+
 #include "utils/array.h"
 #include "utils/builtins.h"
 #include "utils/fmgroids.h"
@@ -225,84 +227,101 @@ textarray_to_stringlist(ArrayType *textarray)
 }
 
 /*
- * Set the state of a subscription table.
- *
- * If update_only is true and the record for given table doesn't exist, do
- * nothing.  This can be used to avoid inserting a new record that was deleted
- * by someone else.  Generally, subscription DDL commands should use false,
- * workers should use true.
- *
- * The insert-or-update logic in this function is not concurrency safe so it
- * might raise an error in rare circumstances.  But if we took a stronger lock
- * such as ShareRowExclusiveLock, we would risk more deadlocks.
+ * Add new state record for a subscription table.
  */
 Oid
-SetSubscriptionRelState(Oid subid, Oid relid, char state,
-		XLogRecPtr sublsn, bool update_only)
+AddSubscriptionRelState(Oid subid, Oid relid, char state,
+		XLogRecPtr sublsn)
 {
 	Relation	rel;
 	HeapTuple	tup;
-	Oid			subrelid = InvalidOid;
+	Oid			subrelid;
 	bool		nulls[Natts_pg_subscription_rel];
 	Datum		values[Natts_pg_subscription_rel];
 
+	LockSharedObject(SubscriptionRelationId, subid, 0, AccessShareLock);
+
 	rel = heap_open(SubscriptionRelRelationId, RowExclusiveLock);
 
 	/* Try finding existing mapping. */
 	tup = SearchSysCacheCopy2(SUBSCRIPTIONRELMAP,
 			  ObjectIdGetDatum(relid),
 

Re: [HACKERS] Pluggable storage

2017-06-25 Thread Julien Rouhaud
On 23/06/2017 16:07, Tomas Vondra wrote:
> 
> I think you're probably right - GIN does compress the posting lists by
> exploiting the TID redundancy (that it's page/offset structure), and I
> don't think there are other index AMs doing that.
> 
> But I'm not sure we can simply rely on that - it's possible people will
> try to improve other index types (e.g. by adding similar compression to
> other index types).
That reminds me of
https://www.postgresql.org/message-id/55e4051b.7020...@postgrespro.ru
where Anastasia proposed something similar.

-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org




Re: [HACKERS] Race conditions with WAL sender PID lookups

2017-06-25 Thread Michael Paquier
On Sun, Jun 25, 2017 at 2:11 PM, Alvaro Herrera
 wrote:
> Noah Misch wrote:
>
>> IMMEDIATE ATTENTION REQUIRED.  This PostgreSQL 10 open item is long past due
>> for your status update.  Please reacquaint yourself with the policy on open
>> item ownership[1] and then reply immediately.  If I do not hear from you by
>> 2017-06-25 06:00 UTC, I will transfer this item to release management team
>> ownership without further notice.
>
> I volunteer to own this item.  Next update before Wednesday 28th 19:00 CLT.

Thanks Álvaro.
-- 
Michael

