Incremental sorts and EXEC_FLAG_REWIND
Hi,

When initializing an incremental sort node, we have the following as of ExecInitIncrementalSort():

    /*
     * Incremental sort can't be used with either EXEC_FLAG_REWIND,
     * EXEC_FLAG_BACKWARD or EXEC_FLAG_MARK, because we only one of many sort
     * batches in the current sort state.
     */
    Assert((eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)) == 0);

While I don't quite follow why EXEC_FLAG_REWIND should be allowed here to begin with (because incremental sorts don't support rescans without parameter changes, right?), the comment and the assertion are telling a different story. And I can see that the child nodes of an IncrementalSort node use a set of eflags where these three are removed:

    /*
     * Initialize child nodes.
     *
     * We shield the child node from the need to support REWIND, BACKWARD, or
     * MARK/RESTORE.
     */
    eflags &= ~(EXEC_FLAG_REWIND | EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK);

I can also spot one case in the regression tests where we actually pass down a REWIND flag when initializing an IncrementalSort node (see incremental_sort.sql):

    -- We force the planner to choose a plan with incremental sort on the right side
    -- of a nested loop join node. That way we trigger the rescan code path.
    set local enable_hashjoin = off;
    set local enable_mergejoin = off;
    set local enable_material = off;
    set local enable_sort = off;
    explain (costs off) select * from t left join
      (select * from (select * from t order by a) v order by a, b) s
      on s.a = t.a where t.a in (1, 2);
    select * from t left join
      (select * from (select * from t order by a) v order by a, b) s
      on s.a = t.a where t.a in (1, 2);

Alexander, Tomas, any thoughts?
--
Michael
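For what it's worth, the flag arithmetic being discussed is easy to sanity-check outside the executor. The following is a minimal standalone sketch assuming illustrative flag values (the real constants live in PostgreSQL's executor.h; the function names here are made up):

```c
#include <assert.h>

/* Illustrative stand-ins for the executor flags discussed above; the
 * actual values are defined in PostgreSQL's executor.h. */
#define EXEC_FLAG_REWIND   0x0002
#define EXEC_FLAG_BACKWARD 0x0004
#define EXEC_FLAG_MARK     0x0008

/* The quoted child-initialization code shields children from all three
 * capabilities by masking the corresponding bits out. */
static int shield_child_eflags(int eflags)
{
    return eflags & ~(EXEC_FLAG_REWIND | EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK);
}

/* The quoted Assert only rejects two of the three flags, which is the
 * comment/assertion mismatch pointed out in the message. */
static int assert_would_reject(int eflags)
{
    return (eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)) != 0;
}
```

This makes the inconsistency concrete: a REWIND flag slips past the assertion, even though the comment claims it cannot be used.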
index paths and enable_indexscan
Hi, Maybe I am missing something obvious, but is it intentional that enable_indexscan is checked by cost_index(), that is, *after* creating an index path? I was expecting that if enable_indexscan is off, then no index paths would be generated to begin with, because I thought they are optional. -- Amit Langote EnterpriseDB: http://www.enterprisedb.com
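For context, the enable_* GUCs generally work by penalizing a path's cost rather than suppressing path generation, so the planner can still fall back on the path when nothing else is available. A toy sketch of that idea (the constant mirrors disable_cost in costsize.c; the function name is invented for illustration and is not the planner API):

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the disable_cost constant used by the planner's costing code;
 * everything else here is an illustrative stand-in. */
static const double disable_cost = 1.0e10;

/* Cost an index path: when enable_indexscan is off, the path is still
 * created, just made so expensive that it loses to any real alternative. */
static double toy_index_path_cost(bool enable_indexscan, double real_cost)
{
    double cost = real_cost;

    if (!enable_indexscan)
        cost += disable_cost;   /* discourage, but do not forbid */
    return cost;
}
```

This is why checking the GUC inside cost_index() still "works": a disabled index path survives, but only wins when it is the only way to produce the required output (e.g. to satisfy a unique constraint enforcement scan).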
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Wed, Apr 8, 2020 at 6:29 AM Tomas Vondra wrote: > > On Tue, Apr 07, 2020 at 12:17:44PM +0530, Amit Kapila wrote: > >On Mon, Mar 30, 2020 at 8:58 PM Tomas Vondra > > wrote: > > > >I think having something like we discussed or what you have in the > >patch won't be sufficient to clean the KnownAssignedXid array. The > >point is that we won't write a WAL for xid-subxid association for > >unlogged relations in the "Immediately WAL-log assignments" patch, > >however, the KnownAssignedXid would have both kinds of Xids as we > >autofill it with gaps (see RecordKnownAssignedTransactionIds). I > >think if my understanding is correct to make it work we might need > >major surgery in the code or have to maintain KnownAssignedXid array > >differently. > > Hmm, that's a good point. If I understand correctly, the issue is > that if we create new subxact, write something into an unlogged table, > and then create new subxact, the XID of the first subxact will be "known > assigned" but we won't know it's a subxact or to which parent xact it > belongs (because there will be no WAL records that could encode it). > Yeah, there could be multiple such missing subxacts. > I wonder if there's a simple solution (e.g. when creating the second > subxact we might notice the xid-subxid assignment was not logged, and > write some "dummy" WAL record). > That WAL record can have multiple xids. > But I admit it seems a bit ugly. > Yeah, I guess it could be tricky as well because while assembling some WAL record, we need to generate an additional dummy record or might need to add additional information to the current record being formed. I think the handling of such WAL records during hot-standby and in logical decoding could vary. During logical decoding, currently, we don't form an association for subtransactions if it doesn't have any changes (see ReorderBufferCommitChild) and now with this new type of record, I think we need to ensure that we don't form such association. 
I think after quite some changes, tweaks and a lot of testing, we might be able to remove XLOG_XACT_ASSIGNMENT, but I am not sure if it is worth doing along with this patch. It would have been good to do this if we were adding any visible overhead with this patch and/or it were easy to do. However, none of that seems to be true, so it might be better to write good comments in the code indicating what we would need to do to remove XLOG_XACT_ASSIGNMENT, so that if we feel it is important to do in the future we can do so. I am not against spending effort on this, but I don't see the urgency of doing it along with this patch.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
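To make the "dummy record with multiple xids" idea floated above concrete, here is a rough standalone sketch. Everything in it (struct layout, names, the way the gap of unlogged subxact XIDs is collected) is invented for illustration; it is not the actual WAL record format or the patch's design:

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Hypothetical "dummy" assignment record: when we notice that some
 * subxact XIDs were never WAL-logged (because they only touched
 * unlogged relations), emit one record carrying all missing XIDs. */
#define MAX_MISSING_SUBXIDS 64

typedef struct XactAssignmentRecord
{
    TransactionId top_xid;
    int           nsubxids;
    TransactionId subxids[MAX_MISSING_SUBXIDS];
} XactAssignmentRecord;

/* Collect every subxact XID strictly between the last logged XID and
 * the next XID about to be logged, so replay can associate them with
 * the top-level transaction.  Returns the number of XIDs collected. */
static int fill_missing_assignments(XactAssignmentRecord *rec,
                                    TransactionId top_xid,
                                    TransactionId last_logged,
                                    TransactionId next)
{
    rec->top_xid = top_xid;
    rec->nsubxids = 0;
    for (TransactionId xid = last_logged + 1; xid < next; xid++)
        rec->subxids[rec->nsubxids++] = xid;
    return rec->nsubxids;
}
```

The sketch shows why a single record suffices even when several consecutive subxacts went unlogged, which is the "that WAL record can have multiple xids" point above.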
Re: pgbench - test whether a variable exists
Bonjour Michaël,

> Hmm.  It seems to me that this stuff needs to be more careful with the
> function handling?  For example, all those cases fail but they directly
> pass down a variable that may not be defined, so shouldn't those
> results be undefined as well instead of failing:
>
> \set g double(:{?no_such_variable})
> \set g exp(:{?no_such_variable})
> \set g greatest(:{?no_such_variable}, :{?no_such_variable})
> \set g int(:{?no_such_variable})

I do not understand: all the above examples are type errors, as ":{?var}" is a boolean, which cannot be cast to double, be exponentiated, etc. So failing just seems appropriate. Maybe casting boolean to int should be allowed, though, as pg does it.

> It seems to me that there could be a point in having the result of any
> function become undefined if using at least one undefined argument (the
> point could be made as well that things like greatest just ignore
> undefined variables), so I was surprised to not see the logic more
> linked with ENODE_VARIABLE.

Hmmm… :var (ENODE_VARIABLE) replaces the variable with its value, and it fails if the variable is not defined, which is the intention; hence the point of having the ability to test whether a variable exists, and its new expression node type. It could replace it with NULL when not existing, but ISTM that a script may wish to distinguish NULL and undefined, and it departs from SQL, which just fails on an undefined alias or column or whatever.

If someone wants to go back to having a self-propagating NULL, they can simply

  \if not :{?var}
  \set var NULL
  \endif

or

  \set var CASE WHEN :{?var} THEN :var ELSE NULL END

to get it. Having some special UNDEF value looks unattractive, as it would depart from SQL even further.

> If your intention is to keep this behavior, it should at least be
> tested I guess.

My intention is to keep failing on type errors, but maybe I'm missing something of your point.

> Please note that this patch will have to wait until v14 opens for
> business for more.

Sure.
I put it in the July CF, and I do not expect it to be processed in the first CF it appears in.

--
Fabien.
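The distinction being debated, type error versus propagated undefinedness, can be sketched with a tiny tagged-value evaluator. This is purely illustrative (pgbench's real evaluator is structured differently; all names here are mine); it just shows the two semantics side by side:

```c
#include <assert.h>
#include <stdbool.h>

/* A toy value: ":{?var}" always yields a defined boolean, so passing it
 * to double() is a type error; an "undefined-propagating" semantics
 * would instead need a distinct UNDEFINED tag forwarded by functions. */
typedef enum { V_BOOLEAN, V_DOUBLE, V_UNDEFINED } VType;

typedef struct { VType type; double d; bool b; } Value;

typedef enum { EVAL_OK, EVAL_TYPE_ERROR, EVAL_UNDEFINED } EvalStatus;

/* double(x): fails on booleans, matching the behavior defended above;
 * the V_UNDEFINED branch shows what the propagating alternative adds. */
static EvalStatus eval_double(Value in, Value *out)
{
    if (in.type == V_UNDEFINED)
        return EVAL_UNDEFINED;      /* the propagating alternative */
    if (in.type != V_DOUBLE)
        return EVAL_TYPE_ERROR;     /* boolean cannot be cast to double */
    *out = in;
    return EVAL_OK;
}
```

Under the current design, ":{?no_such_variable}" always takes the V_BOOLEAN branch, so there is simply no path by which V_UNDEFINED could reach a function, which is why the reported failures are type errors rather than missed propagation.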
Re: doc review for v13
On Sun, Apr 12, 2020 at 04:35:45PM -0500, Justin Pryzby wrote:
> Added a few more.
> And rebased on top of dbc60c5593f26dc777a3be032bff4fb4eab1ddd1

Thanks for the patch set, I have applied the most obvious parts (more or less 1/3) to reduce the load. Here is a review of the rest.

> @@ -2829,7 +2829,6 @@ show_incremental_sort_group_info(IncrementalSortGroupInfo *groupInfo,
>
>  	ExplainPropertyList("Sort Methods Used", methodNames, es);
>
> -	if (groupInfo->maxMemorySpaceUsed > 0)
>  	{
>  		long		avgSpace = groupInfo->totalMemorySpaceUsed / groupInfo->groupCount;
>  		const char *spaceTypeName;
> @@ -2846,7 +2845,7 @@ show_incremental_sort_group_info(IncrementalSortGroupInfo *groupInfo,
>
>  		ExplainCloseGroup("Sort Spaces", memoryName.data, true, es);
>  	}
> -	if (groupInfo->maxDiskSpaceUsed > 0)
> +
>  	{
>  		long		avgSpace = groupInfo->totalDiskSpaceUsed / groupInfo->groupCount;
>  		const char *spaceTypeName;

If this can be reworked, it seems to me that more cleanup could be done.

> @@ -987,7 +987,7 @@ ExecInitIncrementalSort(IncrementalSort *node, EState *estate, int eflags)
>
>  	/*
>  	 * Incremental sort can't be used with either EXEC_FLAG_REWIND,
> -	 * EXEC_FLAG_BACKWARD or EXEC_FLAG_MARK, because we only one of many sort
> +	 * EXEC_FLAG_BACKWARD or EXEC_FLAG_MARK, because we only one of many sort
>  	 * batches in the current sort state.
>  	 */
>  	Assert((eflags & (EXEC_FLAG_BACKWARD |

The following is inconsistent with this comment block, and I guess that the missing word should be "have":

    Assert((eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)) == 0);

This is only a doc-related change though, so I'll start a different thread about that after looking more at it.

> @@ -1153,7 +1153,7 @@ ExecReScanIncrementalSort(IncrementalSortState *node)
>  	/*
>  	 * If we've set up either of the sort states yet, we need to reset them.
>  	 * We could end them and null out the pointers, but there's no reason to
> -	 * repay the setup cost, and because guard setting up pivot comparator
> +	 * repay the setup cost, and because guard setting up pivot comparator
>  	 * state similarly, doing so might actually cause a leak.
>  	 */
>  	if (node->fullsort_state != NULL)

I don't quite understand this comment either, but it seems to me that the last part should be a fully-separate sentence, aka "This guards against..".

> @@ -631,7 +631,7 @@ logicalrep_partition_open(LogicalRepRelMapEntry *root,
>  	/*
>  	 * If the partition's attributes don't match the root relation's, we'll
>  	 * need to make a new attrmap which maps partition attribute numbers to
> -	 * remoterel's, instead the original which maps root relation's attribute
> +	 * remoterel's, instead of the original which maps root relation's attribute
>  	 * numbers to remoterel's.
>  	 *
>  	 * Note that 'map' which comes from the tuple routing data structure

Okay, this is not really clear to start with. I think that I would rewrite that completely, as follows: "If the partition's attributes do not match the root relation's attributes, we cannot use the original attribute map which maps the root relation's attributes with remoterel's attributes. Instead, build a new attribute map which maps the partition's attributes with remoterel's attributes."

> +++ b/src/backend/storage/lmgr/proc.c
> @@ -1373,7 +1373,7 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
>  	else
>  		LWLockRelease(ProcArrayLock);
>
> -	/* prevent signal from being resent more than once */
> +	/* prevent signal from being re-sent more than once */
>  	allow_autovacuum_cancel = false;
>  }

Shouldn't that just be "sent more than two times"?

> @@ -1428,11 +1428,11 @@ tuplesort_updatemax(Tuplesortstate *state)
>  	}
>
>  	/*
> -	 * Sort evicts data to the disk when it didn't manage to fit those data to
> -	 * the main memory. This is why we assume space used on the disk to be
> +	 * Sort evicts data to the disk when it didn't manage to fit the data in
> +	 * main memory. This is why we assume space used on the disk to be

Both the original and the suggestion are wrong? It seems to me that it should be "this data" instead because it refers to the data evicted in the first part of the sentence.

>  	 * more important for tracking resource usage than space used in memory.
> -	 * Note that amount of space occupied by some tuple set on the disk might
> -	 * be less than amount of space occupied by the same tuple set in the
> +	 * Note that amount of spa
Re: Properly mark NULL returns in numeric aggregates
On Tue, 14 Apr 2020 at 06:14, Tom Lane wrote:
> Yeah, they're relying exactly on the assumption that nodeAgg is not
> going to try to copy a value declared "internal", and therefore they
> can be loosey-goosey about whether the value pointer is null or not.
> However, if you want to claim that that's wrong, you have to explain
> why it's okay for some other code to be accessing a value that's
> declared "internal".  I'd say that the meaning of that is precisely
> "keepa u hands off".
>
> In the case at hand, the current situation is that we only expect the
> values returned by these combine functions to be read by the associated
> final functions, which are on board with the null-pointer representation
> of an empty result.  Your argument is essentially that it should be
> possible to feed the values to the aggregate's associated serialization
> function as well.  But the core code never does that, so I'm not
> convinced that we should add it to the requirements; we'd be unable to
> test it.

Casting my mind back to when I originally wrote that code, I attempted to write it in such a way that it could one day be used for a 3-stage aggregation, e.g. Parallel Partial Aggregate -> Gather -> Combine Serial Aggregate on one node, then on some master node a Deserial Combine Finalize Aggregate. You're very right that we can't craft such a plan with today's master (we didn't even add a supporting enum for it in AggSplit). However, it does appear that there are extensions or forks out there which attempt to use the code in this way, so it would be good not to leave those people out in the cold regarding this.

For testing, can't we just have an Assert() in advance_transition_function that verifies isnull matches the nullability of the return value for INTERNAL-returning transfns? i.e., the attached.

I don't have a test case to hand that could cause this to fail, but it sounds like Jesse might.

David

assert_internal_transfns_properly_set_isnull.patch
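The shape of the proposed check is simple enough to show in isolation. This is a hedged sketch of the invariant the attached patch is described as asserting, written with invented names (the real code deals with Datum and FunctionCallInfo from fmgr.h):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for PostgreSQL's Datum; for "internal"-returning functions
 * the Datum is really a pointer in disguise. */
typedef uintptr_t Datum;

/* The invariant: for an internal-returning transition/combine function,
 * the isnull flag and the pointer-ness of the result must agree -- a
 * NULL pointer flagged non-null (the "loosey-goosey" case above) is
 * exactly what the proposed Assert would catch. */
static bool internal_result_consistent(Datum value, bool isnull)
{
    return isnull == (value == (Datum) 0);
}
```

An Assert built on this predicate costs nothing in production builds but would trip in assert-enabled regression runs whenever a combine function returns a bare NULL pointer without setting isnull.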
Re: Poll: are people okay with function/operator table redesign?
Hello Tom,

> Before I spend more time on this, I want to make sure that people are
> happy with this line of attack.

+1

I like it this way, because the structure is quite readable, which is the point.

My 0.02€:

Maybe column header "Example Result" should be simply "Result", because it is already on the same line as "Example" on its left, and "Example | Example Result" looks redundant.

Maybe the signature and description lines could be exchanged: I'm more interested in the description first, and the signature just above the example would make sense.

I'm wondering whether the function/operator name should be vertically centered in its cell? I'd leave it left-justified.

--
Fabien.
Re: variation of row_number with parallel
On Tue, Apr 14, 2020 at 9:39 AM Pavel Stehule wrote:
>
> út 14. 4. 2020 v 5:59 odesílatel Rajkumar Raghuwanshi <
> rajkumar.raghuwan...@enterprisedb.com> napsal:
>
>> Hi,
>>
>> I have observed row_number() is giving different results when query
>> executed in parallel. is this expected w.r.t parallel execution.
>>
>> [full test case, plans, and results trimmed; see the original message]
>
> there is no ORDER BY clause, so the order is not defined - a parallel
> hash join surely doesn't ensure any particular order.
> I think this behavior is expected.

thanks.
Re: variation of row_number with parallel
út 14. 4. 2020 v 5:59 odesílatel Rajkumar Raghuwanshi <
rajkumar.raghuwan...@enterprisedb.com> napsal:

> Hi,
>
> I have observed row_number() is giving different results when query
> executed in parallel. is this expected w.r.t parallel execution.
>
> [full test case and the serial/parallel plans trimmed; see the original
> message]
>
> postgres=# select e.c2, row_number() over () from tbl1 d, tbl2 e where
> d.c1=e.c3;
>  c2  | row_number
> -----+------------
>  100 | 1
>  100 | 2
>  100 | 3
> *200 | 4*
>  100 | 5
> (5 rows)

There is no ORDER BY clause, so the order is not defined - a parallel hash join surely doesn't ensure any particular order. I think this behavior is expected.

Regards

Pavel

> Thanks & Regards,
> Rajkumar Raghuwanshi
Re: Race condition in SyncRepGetSyncStandbysPriority
On Tue, 14 Apr 2020 at 10:34, Tom Lane wrote:
>
> Kyotaro Horiguchi writes:
> > At Sat, 11 Apr 2020 18:30:30 -0400, Tom Lane wrote in
> >> What I think we should do about this is, essentially, to get rid of
> >> SyncRepGetSyncStandbys.  Instead, let's have each walsender advertise
> >> whether *it* believes that it is a sync standby, based on its last
> >> evaluation of the relevant GUCs.  This would be a bool that it'd
> >> compute and set alongside sync_standby_priority.  (Hm, maybe we'd not
>
> > Mmm.. SyncRepGetStandbyPriority returns the "priority" that a
> > walsender thinks it is at, among synchronous_standby_names.  Then to
> > decide "I am a sync standby" we need to know how many walsenders with
> > higher priority are alive now.  SyncRepGetSyncStandbyPriority does the
> > judgment now and suffers from the inconsistency of priority values.
>
> Yeah.  After looking a bit closer, I think that the current definition
> of sync_standby_priority (that is, as the result of local evaluation
> of SyncRepGetStandbyPriority()) is OK.  The problem is what we're doing
> with it.  I suggest that what we should do in SyncRepGetSyncRecPtr()
> is make one sweep across the WalSnd array, collecting PID,
> sync_standby_priority, *and* the WAL pointers from each valid entry.
> Then examine that data and decide which WAL value we need, without
> assuming that the sync_standby_priority values are necessarily totally
> consistent.  But in any case we must examine each entry just once while
> holding its mutex, not go back to it later expecting it to still be the
> same.

Can we have a similar approach of sync_standby_defined for sync_standby_priority? That is, the checkpointer is responsible for changing sync_standby_priority of all walsenders on SIGHUP. That way, all walsenders can see a consistent view of sync_standby_priority. And when a walsender starts, it sets sync_standby_priority by itself. The logic to decide who's a sync standby doesn't change.
SyncRepGetSyncRecPtr() gets all walsenders having higher priority along with their WAL positions.

>
> Another thing that I'm finding interesting is that I now see this is
> not at all new code.  It doesn't look like SyncRepGetSyncStandbysPriority
> has changed much since 2016.  So how come we didn't detect this problem
> long ago?  I searched the buildfarm logs for assertion failures in
> syncrep.c, looking back one year, and here's what I found:
>
>    sysname   |    branch     |      snapshot       |     stage     | l
>  ------------+---------------+---------------------+---------------+---
>  nightjar    | REL_10_STABLE | 2019-08-13 23:04:41 | recoveryCheck | TRAP: FailedAssertion("!(((bool) 0))", File: "/pgbuild/root/REL_10_STABLE/pgsql.build/../pgsql/src/backend/replication/syncrep.c", Line: 940)
>  hoverfly    | REL9_6_STABLE | 2019-11-07 17:19:12 | recoveryCheck | TRAP: FailedAssertion("!(((bool) 0))", File: "syncrep.c", Line: 723)
>  hoverfly    | HEAD          | 2019-11-22 12:15:08 | recoveryCheck | TRAP: FailedAssertion("false", File: "syncrep.c", Line: 951)
>  francolin   | HEAD          | 2020-01-16 23:10:06 | recoveryCheck | TRAP: FailedAssertion("false", File: "/home/andres/build/buildfarm-francolin/HEAD/pgsql.build/../pgsql/src/backend/replication/syncrep.c", Line: 951)
>  hoverfly    | REL_11_STABLE | 2020-02-29 01:34:55 | recoveryCheck | TRAP: FailedAssertion("!(0)", File: "syncrep.c", Line: 946)
>  hoverfly    | REL9_6_STABLE | 2020-03-26 13:51:15 | recoveryCheck | TRAP: FailedAssertion("!(((bool) 0))", File: "syncrep.c", Line: 723)
>  hoverfly    | REL9_6_STABLE | 2020-04-07 21:52:07 | recoveryCheck | TRAP: FailedAssertion("!(((bool) 0))", File: "syncrep.c", Line: 723)
>  curculio    | HEAD          | 2020-04-11 18:30:21 | recoveryCheck | TRAP: FailedAssertion("false", File: "syncrep.c", Line: 951)
>  sidewinder  | HEAD          | 2020-04-11 18:45:39 | recoveryCheck | TRAP: FailedAssertion("false", File: "syncrep.c", Line: 951)
>  curculio    | HEAD          | 2020-04-11 20:30:26 | recoveryCheck | TRAP: FailedAssertion("false", File: "syncrep.c", Line: 951)
>  sidewinder  | HEAD          | 2020-04-11 21:45:48 | recoveryCheck | TRAP: FailedAssertion("false", File: "syncrep.c", Line: 951)
>  sidewinder  | HEAD          | 2020-04-13 10:45:35 | recoveryCheck | TRAP: FailedAssertion("false", File: "syncrep.c", Line: 951)
>  conchuela   | HEAD          | 2020-04-13 16:00:18 | recoveryCheck | TRAP: FailedAssertion("false", File: "/home/pgbf/buildroot/HEAD/pgsql.build/../pgsql/src/backend/replication/syncrep.c", Line: 951)
>  sidewinder  | HEAD          | 2020-04-13 18:45:34 | recoveryCheck | TRAP: FailedAssertion("false", File: "syncrep.c", Line: 951)
> (14 rows)
>
> The line numbers vary in the
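Tom's "one sweep, then decide from the snapshot" suggestion can be sketched independently of the walsender code. The types and the selection rule below are simplified inventions for illustration (the real structures live in walsender_private.h), but they show the property that matters: after the single pass, no field of shared memory is consulted again:

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;

/* Illustrative snapshot of one walsender slot; in the real code each
 * field would be read exactly once under the slot's spinlock during
 * the single sweep. */
typedef struct
{
    int        pid;        /* 0 means the slot is unused */
    int        priority;   /* 1 is the highest priority */
    XLogRecPtr flush;
} WalSndSnap;

/* Decide purely from the snapshot: return the smallest flush position
 * among the n best-priority live entries, or 0 if there are fewer than
 * n candidates.  No re-reads, so inconsistent priorities can at worst
 * give a conservative answer, never a crash. */
static XLogRecPtr min_flush_of_top_n(const WalSndSnap *snap, int count, int n)
{
    XLogRecPtr result = 0;
    int        found = 0;

    for (int prio = 1; found < n && prio <= count; prio++)
        for (int i = 0; i < count; i++)
            if (snap[i].pid != 0 && snap[i].priority == prio && found < n)
            {
                if (result == 0 || snap[i].flush < result)
                    result = snap[i].flush;
                found++;
            }
    return (found >= n) ? result : 0;
}
```

The buildfarm assertions quoted above fired precisely because the existing code revisits shared state after the first pass and expects it to be unchanged; working off a one-shot snapshot removes that assumption.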
variation of row_number with parallel
Hi, I have observed row_number() is giving different results when query executed in parallel. is this expected w.r.t parallel execution. CREATE TABLE tbl1 (c1 INT) partition by list (c1); CREATE TABLE tbl1_p1 partition of tbl1 FOR VALUES IN (10); CREATE TABLE tbl1_p2 partition of tbl1 FOR VALUES IN (20); CREATE TABLE tbl1_p3 partition of tbl1 FOR VALUES IN (30); CREATE TABLE tbl2 (c1 INT, c2 INT,c3 INT) partition by list (c1); CREATE TABLE tbl2_p1 partition of tbl2 FOR VALUES IN (1); CREATE TABLE tbl2_p2 partition of tbl2 FOR VALUES IN (2); CREATE TABLE tbl2_p3 partition of tbl2 FOR VALUES IN (3); CREATE TABLE tbl2_p4 partition of tbl2 FOR VALUES IN (4); CREATE TABLE tbl2_p5 partition of tbl2 FOR VALUES IN (5); INSERT INTO tbl1 VALUES (10),(20),(30); INSERT INTO tbl2 VALUES (1,100,20),(2,200,10),(3,100,20),(4,100,30),(5,100,10); postgres=# explain select e.c2, row_number() over () from tbl1 d, tbl2 e where d.c1=e.c3; QUERY PLAN --- WindowAgg (cost=1520.35..12287.73 rows=390150 width=12) -> Merge Join (cost=1520.35..7410.85 rows=390150 width=4) Merge Cond: (d.c1 = e.c3) -> Sort (cost=638.22..657.35 rows=7650 width=4) Sort Key: d.c1 -> Append (cost=0.00..144.75 rows=7650 width=4) -> Seq Scan on tbl1_p1 d_1 (cost=0.00..35.50 rows=2550 width=4) -> Seq Scan on tbl1_p2 d_2 (cost=0.00..35.50 rows=2550 width=4) -> Seq Scan on tbl1_p3 d_3 (cost=0.00..35.50 rows=2550 width=4) -> Sort (cost=882.13..907.63 rows=10200 width=8) Sort Key: e.c3 -> Append (cost=0.00..203.00 rows=10200 width=8) -> Seq Scan on tbl2_p1 e_1 (cost=0.00..30.40 rows=2040 width=8) -> Seq Scan on tbl2_p2 e_2 (cost=0.00..30.40 rows=2040 width=8) -> Seq Scan on tbl2_p3 e_3 (cost=0.00..30.40 rows=2040 width=8) -> Seq Scan on tbl2_p4 e_4 (cost=0.00..30.40 rows=2040 width=8) -> Seq Scan on tbl2_p5 e_5 (cost=0.00..30.40 rows=2040 width=8) (17 rows) postgres=# select e.c2, row_number() over () from tbl1 d, tbl2 e where d.c1=e.c3; c2 | row_number -+ *200 | 1* 100 | 2 100 | 3 100 | 4 100 | 5 (5 rows) postgres=# 
postgres=# set parallel_setup_cost = 0; SET postgres=# set parallel_tuple_cost = 0; SET postgres=# postgres=# explain select e.c2, row_number() over () from tbl1 d, tbl2 e where d.c1=e.c3; QUERY PLAN -- WindowAgg (cost=130.75..7521.21 rows=390150 width=12) -> Gather (cost=130.75..2644.34 rows=390150 width=4) Workers Planned: 2 -> Parallel Hash Join (cost=130.75..2644.34 rows=162562 width=4) Hash Cond: (e.c3 = d.c1) -> Parallel Append (cost=0.00..131.25 rows=4250 width=8) -> Parallel Seq Scan on tbl2_p1 e_1 (cost=0.00..22.00 rows=1200 width=8) -> Parallel Seq Scan on tbl2_p2 e_2 (cost=0.00..22.00 rows=1200 width=8) -> Parallel Seq Scan on tbl2_p3 e_3 (cost=0.00..22.00 rows=1200 width=8) -> Parallel Seq Scan on tbl2_p4 e_4 (cost=0.00..22.00 rows=1200 width=8) -> Parallel Seq Scan on tbl2_p5 e_5 (cost=0.00..22.00 rows=1200 width=8) -> Parallel Hash (cost=90.93..90.93 rows=3186 width=4) -> Parallel Append (cost=0.00..90.93 rows=3186 width=4) -> Parallel Seq Scan on tbl1_p1 d_1 (cost=0.00..25.00 rows=1500 width=4) -> Parallel Seq Scan on tbl1_p2 d_2 (cost=0.00..25.00 rows=1500 width=4) -> Parallel Seq Scan on tbl1_p3 d_3 (cost=0.00..25.00 rows=1500 width=4) (16 rows) postgres=# select e.c2, row_number() over () from tbl1 d, tbl2 e where d.c1=e.c3; c2 | row_number -+ 100 | 1 100 | 2 100 | 3 *200 | 4* 100 | 5 (5 rows) Thanks & Regards, Rajkumar Raghuwanshi
Re: WAL usage calculation patch
On Wed, Apr 8, 2020 at 8:36 AM Amit Kapila wrote:
>
> On Tue, Apr 7, 2020 at 3:30 PM Peter Eisentraut wrote:
> >
> > We also have existing cases for the other way:
> >
> > actual time=0.050..0.052
> > Buffers: shared hit=3 dirtied=1
> >
>
> Buffers case is not the same because 'shared' is used for 'hit',
> 'read', 'dirtied', etc.  However, I think it is arguable.
>
> > The cases mentioned by Justin are not formatted in a key=value format,
> > so it's not quite the same, but it also raises the question why they
> > are not.
> >
> > Let's figure out a way to consolidate this without making up a third
> > format.
> >
>
> Sure, I think my intention is to keep the format of WAL stats as close
> to Buffers stats as possible because both depict I/O and users would
> probably be interested to check/read both together.  There is a point
> to keep things in a format so that it is easier for someone to parse,
> but I guess as these are fixed 'words', it shouldn't be difficult
> either way and we should give more weightage to consistency.  Any
> suggestions?

Peter E, others, any suggestions on how to move forward? I think here we should follow the rule "follow the style of nearby code", which in this case would be to have one space after each field, as we would like it to be closer to the "Buffers" format. It would be good if we had a unified format among all EXPLAIN stuff, but we might not want to change the existing things, and even if we want to do that it might be a broader/bigger change and we should do it as a PG14 change. What do you think?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: Vacuum o/p with (full 1, parallel 0) option throwing an error
On Tue, Apr 14, 2020 at 7:52 AM Michael Paquier wrote:
>
> Makes sense.  I have two comments.
>
>      ereport(ERROR,
>              (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> -             errmsg("cannot specify both FULL and PARALLEL options")));
> +             errmsg("VACUUM FULL cannot be performed in parallel")));
>
> Better to avoid a full sentence here [1]?  This should be a "cannot do
> foo" error.

I could see similar error messages in other places like below:

  CONCURRENTLY cannot be used when the materialized view is not populated
  CONCURRENTLY and WITH NO DATA options cannot be used together
  COPY delimiter cannot be newline or carriage return

Also, I am not sure if it violates the style we have used in the code. It seems the error message is short, succinct and conveys the required information.

> -VACUUM (PARALLEL 1) tmp; -- disables parallel vacuum option
> +VACUUM (PARALLEL 1) tmp; -- parallel vacuum disabled for temp tables
>  WARNING:  disabling parallel option of vacuum on "tmp" --- cannot vacuum temporary tables in parallel
> +VACUUM (PARALLEL 0, FULL TRUE) tmp; -- can specify parallel disabled (even though that's implied by FULL)
>
> To fully close the gap in the tests, I would also add a test for
> (PARALLEL 1, FULL false) where FULL is directly specified, even if that
> sounds like a nit.  That's fine to test even on a temporary table.

Okay, I will do this once we agree on the error message stuff.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: doc review for parallel vacuum
On Tue, Apr 14, 2020 at 2:54 AM Justin Pryzby wrote: > > Looks good. One more change: > > [-Even if-]{+If+} this option is specified with the ANALYZE > [-option-]{+option,+} > > Remove "even" and add comma. > Pushed after making this change. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: Corruption during WAL replay
At Mon, 13 Apr 2020 18:53:26 +0900, Masahiko Sawada wrote in > On Mon, 13 Apr 2020 at 17:40, Andres Freund wrote: > > > > Hi, > > > > On 2020-04-13 15:24:55 +0900, Masahiko Sawada wrote: > > > On Sat, 11 Apr 2020 at 09:00, Teja Mupparti wrote: > > > /* > > > * We WAL-log the truncation before actually truncating, which means > > > * trouble if the truncation fails. If we then crash, the WAL replay > > > * likely isn't going to succeed in the truncation either, and cause a > > > * PANIC. It's tempting to put a critical section here, but that cure > > > * would be worse than the disease. It would turn a usually harmless > > > * failure to truncate, that might spell trouble at WAL replay, into a > > > * certain PANIC. > > > */ > > > > Yea, but that reasoning is just plain *wrong*. It's *never* ok to WAL > > log something and then not perform the action. This leads to the primary > > / replica getting out of sync, crash recovery potentially not completing > > (because of records referencing the should-be-truncated pages), ... That comment was introduced in 2008 by commit 3396000684, for 8.4, so it can be seen as an oversight made when log-shipping was introduced. The reason other operations like INSERTs (which extend the underlying file) are "safe" after an extension failure is that subsequent operations proceed in shared buffers as if the new page exists, and then try to extend the file again. So if we continued working after a truncation failure, we would need shared buffers to behave as if the truncated pages were already gone. But there is no room for another flag in the buffer header. For example, BM_DIRTY && !BM_VALID might be usable as the state meaning "this page should have been truncated but hasn't been yet", but I'm not sure. Anyway, I think the prognosis of a truncation failure is far more hopeless than that of an extension failure in most cases, and I doubt it's worth introducing such a complex feature only to cope with such a hopeless situation. In short, I think we should PANIC in that case. 
> > > As a second idea, I wonder if we can defer truncation until commit > > > time like smgrDoPendingDeletes mechanism. The sequence would be: > > > > This is mostly an issue during [auto]vacuum partially truncating the end > > of the file. We intentionally release the AEL regularly to allow other > > accesses to continue. > > > > For transactional truncations we don't go down this path (as we create a > > new relfilenode). > > > > > > > At RelationTruncate(), > > > 1. WAL logging. > > > 2. Remember buffers to be dropped. > > > > You definitely cannot do that, as explained above. > > Ah yes, you're right. > > So it seems to me currently what we can do for this issue would be to > enclose the truncation operation in a critical section. IIUC it's not > enough just to reverse the order of dropping buffers and physical file > truncation because it cannot solve the problem of inconsistency on the > standby. And as Horiguchi-san mentioned, there is no need to reverse > that order if we envelop the truncation operation by a critical > section because we can recover page changes during crash recovery. The > strategy of writing out all dirty buffers before dropping buffers, > proposed as (a) in [1], also seems not enough. Agreed. Since neither WAL-logging without performing the action nor performing the action without WAL-logging is acceptable, there's no other way than to work as if the truncation succeeded (and try again later) even if it actually failed. But that would be too complex. Just making it a critical section seems the right thing here. > [1] > https://www.postgresql.org/message-id/20191207001232.klidxnm756wqxvwx%40alap3.anarazel.de > Doing sync before truncation regards. -- Kyotaro Horiguchi NTT Open Source Software Center
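The ordering being agreed on above can be illustrated with a toy sketch. This is a standalone stand-in, not the backend's machinery: the counter mimics START_CRIT_SECTION/END_CRIT_SECTION, the PANIC is simulated by a flag, and all names are made up.

```c
#include <stdio.h>

/* Standalone sketch (not PostgreSQL code): once the truncation has been
 * WAL-logged, a failure to actually truncate must escalate to PANIC
 * rather than a plain ERROR, because the WAL already claims the action
 * happened.  The counter mimics START/END_CRIT_SECTION. */
static int crit_section_count = 0;
static int panicked = 0;

static void start_crit_section(void) { crit_section_count++; }
static void end_crit_section(void)   { crit_section_count--; }

/* In a critical section, any error is promoted to PANIC. */
static void report_error(const char *msg)
{
    if (crit_section_count > 0)
    {
        panicked = 1;           /* stand-in for a real PANIC/abort */
        fprintf(stderr, "PANIC: %s\n", msg);
    }
    else
        fprintf(stderr, "ERROR: %s\n", msg);
}

/* WAL-log first, then truncate, both inside the critical section.
 * Returns 1 if the failure was escalated to PANIC. */
static int truncate_relation(int wal_ok, int truncate_ok)
{
    panicked = 0;
    start_crit_section();
    if (!wal_ok)
        report_error("could not WAL-log truncation");
    else if (!truncate_ok)
        report_error("could not truncate relation");
    end_crit_section();
    return panicked;
}
```

With both steps inside the section, a truncation failure after the WAL record is written can no longer be survived silently, which is exactly the trade-off discussed above.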
Should program exit, When close() failed for O_RDONLY mode
Hi, I find that most of the code does not check the return value of close() when a file is opened for reading (O_RDONLY). But the code in "src/bin/pg_rewind/copy_fetch.c" does check the return value of close() after opening a file for reading (O_RDONLY), and on failure it calls pg_fatal, causing a premature exit. I think that when closing a read-only file fails, the program should not exit early; it should ensure that execution completes. Like below:
・src/bin/pg_rewind/copy_fetch.c
before
--
rewind_copy_file_range
{
    ...
    if (close(srcfd) != 0)
        pg_fatal("could not close file \"%s\": %m", srcpath);
}
--
after
--
rewind_copy_file_range
{
    ...
    close(srcfd);
}
--
Regards, -- Lin
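A middle ground between the two snippets above would be to warn instead of either exiting or ignoring the failure. This is a minimal sketch assuming a POSIX environment; the helper name is invented and this is not the actual pg_rewind code.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: for a descriptor opened with O_RDONLY there are
 * no buffered writes to lose, so a close() failure can be reported as a
 * warning instead of a fatal exit, letting the program finish its work. */
static int
close_readonly(int fd, const char *path)
{
    if (close(fd) != 0)
    {
        fprintf(stderr, "warning: could not close file \"%s\"\n", path);
        return -1;
    }
    return 0;
}
```

Note that for files opened for *writing*, a failing close() can signal lost data (e.g. EIO on some filesystems), so the fatal treatment remains appropriate there.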
Re: Vacuum o/p with (full 1, parallel 0) option throwing an error
On Mon, Apr 13, 2020 at 05:55:43PM +0530, Amit Kapila wrote: > On Mon, Apr 13, 2020 at 4:23 PM Masahiko Sawada > wrote: > I am not very sure about this. I don't think the current text is wrong > especially when you see the value we can specify there is described > as: "Specifies a non-negative integer value passed to the selected > option.". However, we can consider changing it if others also think > the proposed text or something like that is better than current text. FWIW, the current formulation in the docs looked fine to me. > Yeah, something on these lines would be a good idea. Note that, we are > already planning to slightly change this particular sentence in > another patch [1]. > > [1] - > https://www.postgresql.org/message-id/20200322021801.GB2563%40telsasoft.com Makes sense. I have two comments. ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), - errmsg("cannot specify both FULL and PARALLEL options"))); + errmsg("VACUUM FULL cannot be performed in parallel"))); Better to avoid a full sentence here [1]? This should be a "cannot do foo" error. -VACUUM (PARALLEL 1) tmp; -- disables parallel vacuum option +VACUUM (PARALLEL 1) tmp; -- parallel vacuum disabled for temp tables WARNING: disabling parallel option of vacuum on "tmp" --- cannot vacuum temporary tables in parallel +VACUUM (PARALLEL 0, FULL TRUE) tmp; -- can specify parallel disabled (even though that's implied by FULL) To fully close the gap in the tests, I would also add a test for (PARALLEL 1, FULL false) where FULL is directly specified, even if that sounds like a nit. That's fine to test even on a temporary table. [1]: https://www.postgresql.org/docs/devel/error-style-guide.html -- Michael signature.asc Description: PGP signature
Re: relcache leak warnings vs. errors
On Mon, Apr 13, 2020 at 04:22:26PM -0400, Tom Lane wrote: > Andres Freund writes: >> I'd much rather see this throw an assertion than the current >> behaviour. But I'm wondering if there's a chance we can throw an error >> in non-assert builds without adding too much complexity to the error >> paths. Could we perhaps throw the error a bit later during the commit >> processing? > > Any error post-commit is a semantic disaster. Yes, I can immediately think of two problems in the very recent history where this has bitten. > I guess that an assertion wouldn't be so awful, if people would rather > do it like that in debug builds. WARNING is useful mainly for tests where the output is checked, like the main regression test suite. Now that TAP scenarios get more and more complex, +1 on the addition of an assertion for a hard failure. I don't think either that's worth controlling with a developer GUC. -- Michael signature.asc Description: PGP signature
Re: weird hash plan cost, starting with pg10
On Mon, Apr 13, 2020 at 9:53 PM Tom Lane wrote: > Richard Guo writes: > > At first I was wondering if we need to check whether HashState.hashtable > > is not NULL in ExecShutdownHash() before we decide to allocate save > > space for HashState.hinstrument. And then I convinced myself that that's > > not necessary since HashState.hinstrument and HashState.hashtable cannot > > be both NULL there. > > Even if the hashtable is null at that point, creating an all-zeroes > hinstrument struct is harmless. > Correct. The only benefit we may get from checking if the hashtable is null is to avoid an unnecessary palloc0 for hinstrument. But that case cannot happen though. Thanks Richard
Re: 001_rep_changes.pl stalls
Noah Misch writes: > This seems to have made the following race condition easier to hit: > https://www.postgresql.org/message-id/flat/20200206074552.GB3326097%40rfd.leadboat.com > https://www.postgresql.org/message-id/flat/21519.1585272409%40sss.pgh.pa.us Yeah, I just came to the same guess in the other thread. > While I don't think that indicates anything wrong with the fix for $SUBJECT, > creating more buildfarm noise is itself bad. I am inclined to revert the fix > after a week. Not immediately, in case it uncovers lower-probability bugs. > I'd then re-commit it after one of those threads fixes the other bug. Would > anyone like to argue for a revert earlier, later, or not at all? I don't think you should revert. Those failures are (just) often enough to be annoying but I do not think that a proper fix is very far away. regards, tom lane
Re: 001_rep_changes.pl stalls
On Sun, Apr 05, 2020 at 11:36:49PM -0700, Noah Misch wrote: > Executive summary: the "MyWalSnd->write < sentPtr" in WalSndWaitForWal() is > important for promptly updating pg_stat_replication. When caught up, we > should impose that logic before every sleep. The one-line fix is to sleep in > WalSndLoop() only when pq_is_send_pending(), not when caught up. This seems to have made the following race condition easier to hit: https://www.postgresql.org/message-id/flat/20200206074552.GB3326097%40rfd.leadboat.com https://www.postgresql.org/message-id/flat/21519.1585272409%40sss.pgh.pa.us Now it happened eight times in three days, all on BSD machines: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=curculio&dt=2020-04-11%2018%3A30%3A21 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sidewinder&dt=2020-04-11%2018%3A45%3A39 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=curculio&dt=2020-04-11%2020%3A30%3A26 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sidewinder&dt=2020-04-11%2021%3A45%3A48 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sidewinder&dt=2020-04-13%2010%3A45%3A35 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=conchuela&dt=2020-04-13%2016%3A00%3A18 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sidewinder&dt=2020-04-13%2018%3A45%3A34 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sidewinder&dt=2020-04-13%2023%3A45%3A22 While I don't think that indicates anything wrong with the fix for $SUBJECT, creating more buildfarm noise is itself bad. I am inclined to revert the fix after a week. Not immediately, in case it uncovers lower-probability bugs. I'd then re-commit it after one of those threads fixes the other bug. Would anyone like to argue for a revert earlier, later, or not at all? 
There was a novel buildfarm failure, in 010_logical_decoding_timelines.pl: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hoverfly&dt=2020-04-13%2008%3A35%3A05 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hoverfly&dt=2020-04-13%2017%3A15%3A01 Most-relevant lines of the test script: $node_master->safe_psql('postgres', "INSERT INTO decoding(blah) VALUES ('afterbb');"); $node_master->safe_psql('postgres', 'CHECKPOINT'); $node_master->stop('immediate'); The failure suggested the INSERT was not replicated before the immediate stop. I can reproduce that consistently, before or after the fix for $SUBJECT, by modifying walsender to delay 0.2s before sending WAL: --- a/src/backend/replication/walsender.c +++ b/src/backend/replication/walsender.c @@ -65,2 +65,3 @@ #include "libpq/pqformat.h" +#include "libpq/pqsignal.h" #include "miscadmin.h" @@ -2781,2 +2782,5 @@ retry: + PG_SETMASK(&BlockSig); + pg_usleep(200 * 1000); + PG_SETMASK(&UnBlockSig); pq_putmessage_noblock('d', output_message.data, output_message.len); I will shortly push a fix adding a wait_for_catchup to the test. I don't know if/how fixing $SUBJECT made this 010_logical_decoding_timelines.pl race condition easier to hit.
Re: Race condition in SyncRepGetSyncStandbysPriority
Kyotaro Horiguchi writes: > At Sat, 11 Apr 2020 18:30:30 -0400, Tom Lane wrote in >> What I think we should do about this is, essentially, to get rid of >> SyncRepGetSyncStandbys. Instead, let's have each walsender advertise >> whether *it* believes that it is a sync standby, based on its last >> evaluation of the relevant GUCs. This would be a bool that it'd >> compute and set alongside sync_standby_priority. (Hm, maybe we'd not > Mmm.. SyncRepGetStandbyPriority returns the "priority" that a > walsender thinks it is at, among synchronous_standby_names. Then to > decide "I am a sync standby" we need to know how many walsenders with > higher priority are alive now. SyncRepGetSyncStandbyPriority does the > judgment now and suffers from the inconsistency of priority values. Yeah. After looking a bit closer, I think that the current definition of sync_standby_priority (that is, as the result of local evaluation of SyncRepGetStandbyPriority()) is OK. The problem is what we're doing with it. I suggest that what we should do in SyncRepGetSyncRecPtr() is make one sweep across the WalSnd array, collecting PID, sync_standby_priority, *and* the WAL pointers from each valid entry. Then examine that data and decide which WAL value we need, without assuming that the sync_standby_priority values are necessarily totally consistent. But in any case we must examine each entry just once while holding its mutex, not go back to it later expecting it to still be the same. Another thing that I'm finding interesting is that I now see this is not at all new code. It doesn't look like SyncRepGetSyncStandbysPriority has changed much since 2016. So how come we didn't detect this problem long ago? 
I searched the buildfarm logs for assertion failures in syncrep.c, looking back one year, and here's what I found:

  sysname   |    branch     |      snapshot       |     stage     | l
------------+---------------+---------------------+---------------+---
 nightjar   | REL_10_STABLE | 2019-08-13 23:04:41 | recoveryCheck | TRAP: FailedAssertion("!(((bool) 0))", File: "/pgbuild/root/REL_10_STABLE/pgsql.build/../pgsql/src/backend/replication/syncrep.c", Line: 940)
 hoverfly   | REL9_6_STABLE | 2019-11-07 17:19:12 | recoveryCheck | TRAP: FailedAssertion("!(((bool) 0))", File: "syncrep.c", Line: 723)
 hoverfly   | HEAD          | 2019-11-22 12:15:08 | recoveryCheck | TRAP: FailedAssertion("false", File: "syncrep.c", Line: 951)
 francolin  | HEAD          | 2020-01-16 23:10:06 | recoveryCheck | TRAP: FailedAssertion("false", File: "/home/andres/build/buildfarm-francolin/HEAD/pgsql.build/../pgsql/src/backend/replication/syncrep.c", Line: 951)
 hoverfly   | REL_11_STABLE | 2020-02-29 01:34:55 | recoveryCheck | TRAP: FailedAssertion("!(0)", File: "syncrep.c", Line: 946)
 hoverfly   | REL9_6_STABLE | 2020-03-26 13:51:15 | recoveryCheck | TRAP: FailedAssertion("!(((bool) 0))", File: "syncrep.c", Line: 723)
 hoverfly   | REL9_6_STABLE | 2020-04-07 21:52:07 | recoveryCheck | TRAP: FailedAssertion("!(((bool) 0))", File: "syncrep.c", Line: 723)
 curculio   | HEAD          | 2020-04-11 18:30:21 | recoveryCheck | TRAP: FailedAssertion("false", File: "syncrep.c", Line: 951)
 sidewinder | HEAD          | 2020-04-11 18:45:39 | recoveryCheck | TRAP: FailedAssertion("false", File: "syncrep.c", Line: 951)
 curculio   | HEAD          | 2020-04-11 20:30:26 | recoveryCheck | TRAP: FailedAssertion("false", File: "syncrep.c", Line: 951)
 sidewinder | HEAD          | 2020-04-11 21:45:48 | recoveryCheck | TRAP: FailedAssertion("false", File: "syncrep.c", Line: 951)
 sidewinder | HEAD          | 2020-04-13 10:45:35 | recoveryCheck | TRAP: FailedAssertion("false", File: "syncrep.c", Line: 951)
 conchuela  | HEAD          | 2020-04-13 16:00:18 | recoveryCheck | TRAP: FailedAssertion("false", File: "/home/pgbf/buildroot/HEAD/pgsql.build/../pgsql/src/backend/replication/syncrep.c", Line: 951)
 sidewinder | HEAD          | 2020-04-13 18:45:34 | recoveryCheck | TRAP: FailedAssertion("false", File: "syncrep.c", Line: 951)
(14 rows)

The line numbers vary in the back branches, but all of these crashes are at that same Assert. So (a) yes, this does happen in the back branches, but (b) some fairly recent change has made it a whole lot more probable. Neither syncrep.c nor 007_sync_rep.pl have changed much in some time, so whatever the change was was indirect. Curious. Is it just timing? I'm giving the side-eye to Noah's recent changes 328c70997 and 421685812, but this isn't enough evidence to say definitely that that's what boosted the failure rate. regards, tom lane
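The single-sweep idea proposed earlier in this message (copy each slot's fields exactly once under its lock, then decide from the local copies) can be sketched in miniature. This is a standalone toy, not the real SyncRepGetSyncRecPtr: the types are stubs and a pthread mutex stands in for the per-walsender spinlock.

```c
#include <pthread.h>

/* Simplified stand-in for the walsender array (not PostgreSQL types). */
typedef struct
{
    pthread_mutex_t mutex;
    int             pid;        /* 0 means slot unused */
    int             priority;   /* 0 means not listed as a sync standby */
    long            flush_lsn;  /* stand-in for an XLogRecPtr */
} SlotStub;

typedef struct
{
    int  pid;
    int  priority;
    long flush_lsn;
} SlotSnap;

/* Snapshot every slot exactly once under its own lock, then pick the
 * live candidate with the best (lowest) priority from the copies, so a
 * later re-read can never observe the slot changing underneath us.
 * Returns the chosen index, or -1 if there is no candidate. */
static int
pick_sync_standby(SlotStub *slots, int n, SlotSnap *snap)
{
    int best = -1;
    int i;

    for (i = 0; i < n; i++)
    {
        pthread_mutex_lock(&slots[i].mutex);
        snap[i].pid = slots[i].pid;
        snap[i].priority = slots[i].priority;
        snap[i].flush_lsn = slots[i].flush_lsn;
        pthread_mutex_unlock(&slots[i].mutex);
    }
    for (i = 0; i < n; i++)
    {
        if (snap[i].pid == 0 || snap[i].priority <= 0)
            continue;
        if (best < 0 || snap[i].priority < snap[best].priority)
            best = i;
    }
    return best;
}
```

The key property is that the decision tolerates priorities that are mutually inconsistent across slots: whatever values were captured in the one sweep are the values reasoned about.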
Re: wrong relkind error messages
On Mon, Apr 13, 2020 at 11:13:15AM -0400, Tom Lane wrote: > Fixing this while avoiding your concern about proliferation of messages > seems a bit difficult though. The best I can do after a couple minutes' > thought is > > ERROR: cannot define statistics for relation "ti" > DETAIL: "ti" is an index, and this operation is not supported for that > kind of relation. > > which seems a little long and awkward. Another idea is > > ERROR: cannot define statistics for relation "ti" > DETAIL: This operation is not supported for indexes. > > which still leaves implicit that "ti" is an index, but probably that's > something the user can figure out. > > Maybe someone else can do better? "This operation is not supported for put_relkind_here \"%s\"."? I think that it is better to provide a relation name in the error message (even optionally a namespace). That's less to guess for the user. +int +errdetail_relkind(const char *relname, char relkind) +{ + switch (relkind) + { + case RELKIND_RELATION: + return errdetail("\"%s\" is a table.", relname); + case RELKIND_INDEX: It seems to me that we should optionally add the namespace in the error message, or just have a separate routine for that. I think that it would be useful in some cases (see for example the part about the statistics in the patch), still annoying in some others (instability in test output for temporary schemas for example) so there is a point for both in my view. - if (rel->rd_rel->relkind != RELKIND_VIEW && - rel->rd_rel->relkind != RELKIND_COMPOSITE_TYPE && - rel->rd_rel->relkind != RELKIND_FOREIGN_TABLE && - rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE) - { + if (RELKIND_HAS_STORAGE(rel->rd_rel->relkind)) RelationDropStorage(rel); These should be applied separately in my opinion. Nice catch. 
- errmsg("\"%s\" is not a table, view, materialized view, sequence, or foreign table", -rv->relname))); + errmsg("cannot change schema of relation \"%s\"", +rv->relname), + (relkind == RELKIND_INDEX || relkind == RELKIND_PARTITIONED_INDEX ? errhint("Change the schema of the table instead.") : + (relkind == RELKIND_COMPOSITE_TYPE ? errhint("Use ALTER TYPE instead.") : 0; This is not great style either and reduces readability, so I would recommend splitting the errhint generation using a switch/case. + ereport(ERROR, + (errcode(ERRCODE_WRONG_OBJECT_TYPE), +errmsg("action cannot be performed on relation \"%s\"", + RelationGetRelationName(rel)), Echoing Robert upthread, "action" is not really useful for the user, and it seems to me that it should be reworked as "cannot perform foo on relation \"hoge\"". +errmsg("relation \"%s\" does not support comments", + RelationGetRelationName(relation)), This is not project-style as full sentences cannot be used in error messages, no? The former is not that good either, still, while this is getting touched... Say, "cannot use COMMENT on relation \"%s\""? Overall +1 on the patch by the way. Thanks for sending something to improve the situation. -- Michael signature.asc Description: PGP signature
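The switch/case shape suggested in the review above could look roughly like this. This is a standalone sketch with hard-coded relkind letters mirroring PostgreSQL's catalog codes, not the patched backend function.

```c
#include <stddef.h>

/* Illustrative only: picking the hint for "cannot change schema of
 * relation" with a switch instead of nested ternaries.  'i', 'I' and 'c'
 * mirror RELKIND_INDEX, RELKIND_PARTITIONED_INDEX and
 * RELKIND_COMPOSITE_TYPE respectively. */
static const char *
schema_change_hint(char relkind)
{
    switch (relkind)
    {
        case 'i':               /* RELKIND_INDEX */
        case 'I':               /* RELKIND_PARTITIONED_INDEX */
            return "Change the schema of the table instead.";
        case 'c':               /* RELKIND_COMPOSITE_TYPE */
            return "Use ALTER TYPE instead.";
        default:
            return NULL;        /* no hint for other relkinds */
    }
}
```

The caller would then emit the errhint only when a non-NULL string comes back, keeping each relkind's wording in one obvious place.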
Re: [patch] some PQExpBuffer are not destroyed in pg_dump
On Mon, Apr 13, 2020 at 04:51:06PM +0900, Masahiko Sawada wrote: > On Tue, 7 Apr 2020 at 11:42, Zhang, Jie wrote: >> In getDefaultACLs function, some PQExpBuffer are not destroyed > > Yes, it looks like an oversight. It's related to the commit > e2090d9d20d809 which is back-patched to 9.6. > > The patch looks good to me. Indeed. Any code path of pg_dump calling buildACLQueries() cleans up things, and I think that it is a better practice to properly clean up PQExpBuffer stuff even if there is always the argument that pg_dump is a tool running in a "short"-term context. So I will backpatch that unless there are any objections from others. The part I am actually rather amazed by here is that I don't recall seeing Coverity complaining about leaks after this commit. Perhaps it just got lost. -- Michael signature.asc Description: PGP signature
Re: pgbench - test whether a variable exists
On Mon, Apr 13, 2020 at 09:54:01AM +0200, Fabien COELHO wrote: > Attached a v4. I'm resurrecting this small patch, after "\aset" has been > added to pgbench (9d8ef988). Hmm. It seems to me that this stuff needs to be more careful with the function handling? For example, all those cases fail but they directly pass down a variable that may not be defined, so shouldn't those results be undefined as well instead of failing: \set g double(:{?no_such_variable}) \set g exp(:{?no_such_variable}) \set g greatest(:{?no_such_variable}, :{?no_such_variable}) \set g int(:{?no_such_variable}) It seems to me that there could be a point in having the result of any function become undefined when at least one argument is undefined (the point could be made as well that things like greatest just ignore conditioned variables), so I was surprised not to see the logic more linked with ENODE_VARIABLE. If your intention is to keep this behavior, it should at least be tested I guess. Please note that this patch will have to wait until v14 opens for business for more comments. :p -- Michael signature.asc Description: PGP signature
Re: pg_basebackup, manifests and backends older than ~12
On Mon, Apr 13, 2020 at 07:55:07PM -0400, Robert Haas wrote: > Oh, hmm. Maybe I'm getting confused with a previous version of the > patch that behaved differently. No problem. If you prefer keeping this part of the code, that's fine by me. If you think that the patch is suited as-is, including silencing the error forcing to use --no-manifest on server versions older than v13, I am fine to help out and apply it myself, but I am also fine if you wish to take care of it by yourself. -- Michael signature.asc Description: PGP signature
Re: pg_basebackup, manifests and backends older than ~12
On Mon, Apr 13, 2020 at 6:26 PM Michael Paquier wrote: > Well, the documentation tells me that as of protocol.sgml: > "For compatibility with previous releases, the default is > MANIFEST 'no'." > > The code also tells me that, in line with the docs: > static void > parse_basebackup_options(List *options, basebackup_options *opt) > [...] > MemSet(opt, 0, sizeof(*opt)); > opt->manifest = MANIFEST_OPTION_NO; > > And there is also a TAP test for that when passing down --no-manifest, > which should not create a backup manifest: > $node->command_ok( > [ > 'pg_basebackup', '-D', "$tempdir/backup2", '--no-manifest', > '--waldir', "$tempdir/xlog2" > ], > > So, it seems to me that it is fine to remove this block, as when > --no-manifest is used, then "manifest" gets set to false, and then it > does not matter if the MANIFEST clause is added or not as we'd just > rely on the default. Keeping the block would matter if you want to > make the code more robust to a change of the default value in the > BASE_BACKUP query though, and its logic is not incorrect either. So, > if you wish to keep it, that's fine by me, but it looks cleaner to me > to remove it and more consistent with the other options like MAX_RATE, > TABLESPACE_MAP, etc. Oh, hmm. Maybe I'm getting confused with a previous version of the patch that behaved differently. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Poll: are people okay with function/operator table redesign?
"Jonathan S. Katz" writes: > On 4/13/20 7:02 PM, Jonathan S. Katz wrote: >> Perhaps a counterproposal: We eliminate the content in the leftmost >> "function column, but leave that there to allow the function name / >> signature to span the full 3 columns. Then the rest of the info goes >> below. This will also compress the table height down a bit. > An attempt at a "POC" of what I'm describing (attached image). Hmm ... what is determining the width of the left-hand column? It doesn't seem to have any content, since the function entries are being spanned across the whole table. I think the main practical problem though is that it wouldn't work nicely for operators, since the key "name" you'd be looking for would not be at the left of the signature line. I suppose we don't necessarily have to have the same layout for operators as for functions, but it feels like it'd be jarringly inconsistent. regards, tom lane
Re: Poll: are people okay with function/operator table redesign?
On 2020-Apr-13, Jonathan S. Katz wrote: > On 4/13/20 7:02 PM, Jonathan S. Katz wrote: > > Perhaps a counterproposal: We eliminate the content in the leftmost > > "Function" column, but leave that there to allow the function name / > > signature to span the full 3 columns. Then the rest of the info goes > > below. This will also compress the table height down a bit. > > An attempt at a "POC" of what I'm describing (attached image). > > I'm not sure if I 100% like it, but it does reduce the amount of > information we're displaying but conveys all the details (and matches > what we have in the previous version). Ooh, this seems a nice idea -- the indentation seems to be sufficient to tell apart entries from each other. Your point about information reduction refers to the fact that we no longer keep the unadorned name but only the signature, right? That seems an improvement to me now that I look at it. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Poll: are people okay with function/operator table redesign?
I wrote: > I don't think I like the <hr> version better than --- it adds > quite a bit of vertical space, more than I was expecting really. Actually, after staring more at HTML-hr.png, what's *really* bothering me about that rendering is that the lines made by <hr> are actually wider than the inter-table-cell lines. Surely we want the opposite relationship. Presumably that could be fixed with some css-level adjustments; and maybe the spacing could be tightened up a bit too? I do like having that visual separation, it just needs to be toned down compared to the table cell separators. Reproducing the effect in the PDF build remains an issue, too. regards, tom lane
Re: Poll: are people okay with function/operator table redesign?
On 4/13/20 7:02 PM, Jonathan S. Katz wrote: > On 4/13/20 6:51 PM, Tom Lane wrote: >> "Jonathan S. Katz" writes: >>> I think one thing that was throwing me off was having the function >>> signature before the description. I would recommend flipping them: have >>> the function description first, followed by signature, followed by >>> examples. I think that follows the natural flow more of what one is >>> doing when they look up the function. >> >> The trouble with that is it doesn't work very well when we have >> multiple similarly-named functions with different signatures. >> Consider what the two enum_range() entries in 9.33 will look like, >> for example. I think we need the signature to establish which function >> we're talking about. > > I get that, I just find I'm doing too much thinking looking at it. > > Perhaps a counterproposal: We eliminate the content in the leftmost > "Function" column, but leave that there to allow the function name / > signature to span the full 3 columns. Then the rest of the info goes > below. This will also compress the table height down a bit. An attempt at a "POC" of what I'm describing (attached image). I'm not sure if I 100% like it, but it does reduce the amount of information we're displaying but conveys all the details (and matches what we have in the previous version). The alignment could be adjusted if need be, too. Jonathan signature.asc Description: OpenPGP digital signature
Re: pg_basebackup, manifests and backends older than ~12
On 2020-Apr-13, Michael Paquier wrote: > On Mon, Apr 13, 2020 at 11:52:51AM +0900, Kyotaro Horiguchi wrote: > > Since I'm not sure about the work flow that contains taking a > > basebackup from a server of a different version, I'm not sure which is > > better between silently disabling and erroring out. However, it seems > > to me, the option for replication slot is a choice of the way the tool > > works which doesn't affect the result itself, but that for backup > > manifest is about what the resulting backup contains. Therefore I > > think it is better that pg_basebackup in PG13 should error out if the > > source server doesn't support backup manifest but --no-manifest is not > > specfied, and show how to accomplish their wants (, though I don't see > > the wants clearly). > > Not sure what Robert and other authors of the feature think about > that. What I am rather afraid of is somebody deciding to patch a > script aimed at working across multiple backend versions to add > unconditionally --no-manifest all the time, even for v13. That would > kill the purpose of encouraging the use of manifests. I agree, I think forcing users to specify --no-manifest when run on old servers will cause users to write bad scripts; I vote for silently disabling checksums. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Poll: are people okay with function/operator table redesign?
On 4/13/20 6:51 PM, Tom Lane wrote: > "Jonathan S. Katz" writes: >> I think one thing that was throwing me off was having the function >> signature before the description. I would recommend flipping them: have >> the function description first, followed by signature, followed by >> examples. I think that follows the natural flow more of what one is >> doing when they look up the function. > > The trouble with that is it doesn't work very well when we have > multiple similarly-named functions with different signatures. > Consider what the two enum_range() entries in 9.33 will look like, > for example. I think we need the signature to establish which function > we're talking about. I get that, I just find I'm doing too much thinking looking at it. Perhaps a counterproposal: We eliminate the content in the leftmost "Function" column, but leave that there to allow the function name / signature to span the full 3 columns. Then the rest of the info goes below. This will also compress the table height down a bit. >> There are probably some things we can do with shading on the pgweb side >> to make items more distinguishable, I don't think that would be too >> terrible to add. > > Per David's earlier comment, it seems like alternating backgrounds might > be feasible if we can get it down to one per function, as the > version I just posted has. Or a classname on the "" when a new function starts, or the like. Easy enough to get the CSS to work off of that :) Jonathan signature.asc Description: OpenPGP digital signature
Re: Poll: are people okay with function/operator table redesign?
"Jonathan S. Katz" writes: > I think one thing that was throwing me off was having the function > signature before the description. I would recommend flipping them: have > the function description first, followed by signature, followed by > examples. I think that follows the natural flow more of what one is > doing when they look up the function. The trouble with that is it doesn't work very well when we have multiple similarly-named functions with different signatures. Consider what the two enum_range() entries in 9.33 will look like, for example. I think we need the signature to establish which function we're talking about. > There are probably some things we can do with shading on the pgweb side > to make items more distinguishable, I don't think that would be too > terrible to add. Per David's earlier comment, it seems like alternating backgrounds might be feasible if we can get it down to one per function, as the version I just posted has. regards, tom lane
Re: Poll: are people okay with function/operator table redesign?
> > Yeah, back at the beginning of this exercise, Alvaro wondered aloud > if we should go to something other than tables altogether. I dunno > what that'd look like though. > It would probably look like our acronyms and glossary pages. Maybe the return example and return values get replaced with a programlisting?
Re: Poll: are people okay with function/operator table redesign?
> > Thinking out loud, it'd also be great if we could add in some anchors as > well, so perhaps in the future on the pgweb side we could add in some > discoverable links that other documentation has -- which in turn people > could click / link to others directly to the function name. > +1
Re: Poll: are people okay with function/operator table redesign?
On 4/13/20 1:13 PM, Tom Lane wrote: > As discussed in the thread at [1], I've been working on redesigning > the tables we use to present SQL functions and operators. The > first installment of that is now up; see tables 9.30 and 9.31 at > > https://www.postgresql.org/docs/devel/functions-datetime.html > > and table 9.33 at > > https://www.postgresql.org/docs/devel/functions-enum.html > > Before I spend more time on this, I want to make sure that people > are happy with this line of attack. Comparing these tables to > the way they look in v12, they clearly take more vertical space; > but at least to my eye they're less cluttered and more readable. > They definitely scale a lot better for cases where a long function > description is needed, or where we'd like to have more than one > example. Does anyone prefer the old way, or have a better idea? > > I know that the table headings are a bit weirdly laid out; hopefully > that can be resolved [2]. > [2] https://www.postgresql.org/message-id/6169.1586794603%40sss.pgh.pa.us When evaluating [2], I will admit at first I was very confused about the layout and wasn't exactly sure what you were saying was incorrect in that note. After fixing [2] on my local copy, I started to look at it again. For positives, I do think it's an improvement for readability on mobile. Flow/content aside, it was easier to read and follow what was going on and there was less side scrolling. I think one thing that was throwing me off was having the function signature before the description. I would recommend flipping them: have the function description first, followed by signature, followed by examples. I think that follows the natural flow more of what one is doing when they look up the function. I think that would also benefit larger tables too: instead of having to scroll up to understand how things are laid out, it'd follow said flow.
There are probably some things we can do with shading on the pgweb side to make items more distinguishable; I don't think that would be too terrible to add. Thinking out loud, it'd also be great if we could add in some anchors as well, so perhaps in the future on the pgweb side we could add in some discoverable links that other documentation has -- which in turn people could click / link to others directly to the function name. Anyway, change is hard. I'm warming up to it. Jonathan
Re: pg_basebackup, manifests and backends older than ~12
On Mon, Apr 13, 2020 at 11:13:06AM -0400, Robert Haas wrote: > I think that this patch is incorrect. I have no objection to > introducing MINIMUM_VERSION_FOR_MANIFESTS, but this is not OK: > > - else > - { > - if (serverMajor < 1300) > - manifest_clause = ""; > - else > - manifest_clause = "MANIFEST 'no'"; > - } > > It seems to me that this will break --no-manifest option on v13. Well, the documentation tells me that as of protocol.sgml: "For compatibility with previous releases, the default is MANIFEST 'no'." The code also tells me that, in line with the docs: static void parse_basebackup_options(List *options, basebackup_options *opt) [...] MemSet(opt, 0, sizeof(*opt)); opt->manifest = MANIFEST_OPTION_NO; And there is also a TAP test for that when passing down --no-manifest, which should not create a backup manifest: $node->command_ok( [ 'pg_basebackup', '-D', "$tempdir/backup2", '--no-manifest', '--waldir', "$tempdir/xlog2" ], So, it seems to me that it is fine to remove this block, as when --no-manifest is used, then "manifest" gets set to false, and then it does not matter if the MANIFEST clause is added or not as we'd just rely on the default. Keeping the block would matter if you want to make the code more robust to a change of the default value in the BASE_BACKUP query though, and its logic is not incorrect either. So, if you wish to keep it, that's fine by me, but it looks cleaner to me to remove it and more consistent with the other options like MAX_RATE, TABLESPACE_MAP, etc. -- Michael
Re: documenting the backup manifest file format
On 2020-Apr-13, Robert Haas wrote: > On Mon, Apr 13, 2020 at 3:34 PM Alvaro Herrera > wrote: > > Are these hex figures upper or lower case? No leading zeroes? This > > would normally not matter, but the toplevel checksum will care. > > Not really. You just feed the whole file except for the last line > through shasum and you get the answer. > > It so happens that the server generates lower-case, but > pg_verifybackup will accept either. > > Leading zeroes are not omitted. If the checksum's not the right > length, it ain't gonna work. If SHA is used, it's the same output you > would get from running shasum -a on the file, which is > certainly a fixed length. I assumed that this followed from the > statement that there are two characters per byte in the checksum, and > from the fact that no checksum algorithm I know about drops leading > zeroes in the output. Eh, apologies, I was completely unclear -- I was looking at the LSN fields when writing the above. So the leading zeroes and letter case comment refers to those in the LSN values. I agree that it doesn't matter as long as the same tool generates the json file and writes the checksum. > > Also, I see no mention of prettification-chars such as newlines or > > indentation. I suppose if I pass a manifest file through > > prettification (or Windows newline conversion), the checksum may > > break. > > It would indeed break. I'm not sure what you want me to say here, > though. If you're trying to parse a manifest, you shouldn't care about > how the whitespace is arranged. If you're trying to generate one, you > can arrange it any way you like, as long as you also include it in the > checksum. Yeah, I guess I'm just saying that it feels brittle to have a file format that's supposed to be good for data exchange and then make it itself depend on representation details such as the order that fields appear in, the letter case, or the format of newlines. Maybe this isn't really of concern, but it seemed strange. 
> > As for Last-Modification, I think the spec should indicate the exact > format that's used, because it'll also be critical for checksumming. > > Again, I don't think it really matters for checksumming, but it's > "YYYY-MM-DD HH:MM:SS TZ" format, where TZ is always GMT. I agree that whatever format you use will work as long as it isn't modified. I think strict ISO 8601 might be preferable (with the T in the middle and ending in Z instead of " GMT"). > > Why is the top-level checksum only allowed to be SHA-256, if the > > files can use up to SHA-512? Thanks for the discussion. I think you mostly want to make sure that the manifest is sensible (not corrupt) rather than defend against somebody maliciously giving you an attacking manifest (??). I incline to agree that any SHA-2 hash is going to serve that purpose and have no further comment to make. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
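To make the two renderings being compared concrete, here is a small Python sketch (purely illustrative; the timestamp value is made up) of the manifest-style format versus the strict ISO 8601 form Álvaro suggests. Either works for checksumming, as long as generator and verifier see the same bytes:

```python
from datetime import datetime, timezone

ts = datetime(2020, 4, 13, 19, 34, 46, tzinfo=timezone.utc)

# Format the manifest currently emits: space separator, "GMT" suffix.
manifest_style = ts.strftime("%Y-%m-%d %H:%M:%S GMT")

# Strict ISO 8601 as suggested: "T" separator, trailing "Z".
iso_style = ts.strftime("%Y-%m-%dT%H:%M:%SZ")

print(manifest_style)  # 2020-04-13 19:34:46 GMT
print(iso_style)       # 2020-04-13T19:34:46Z
```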
Re: Poll: are people okay with function/operator table redesign?
On Mon, Apr 13, 2020 at 1:57 PM Tom Lane wrote: > Actually ... if we did it like that, then it would be possible to treat > the signature + description + example(s) as one big table cell with line > breaks rather than row-separator bars. > That would help address the > inadequate-visual-separation-between-groups issue, but on the other hand > maybe we'd end up with too little visual separation between the elements > of a function description. > Speaking in terms of HTML if we use instead of we would get the best of both worlds. David J.
Re: doc review for parallel vacuum
On Mon, Apr 13, 2020 at 03:22:06PM +0530, Amit Kapila wrote: > On Mon, Apr 13, 2020 at 2:00 PM Justin Pryzby wrote: > > > > |Copy the index > > |bulk-deletion result returned from ambulkdelete and amvacuumcleanup to > > |the DSM segment if it's the first time [???] because they allocate locally > > |and it's possible that an index will be vacuumed by a different > > |vacuum process the next time." > > > > Is it correct to say: "..if it's the first iteration" and "different > > process on > > the next iteration" ? Or "cycle" ? > > > > "cycle" sounds better. I have changed the patch as per your latest > comments. Let me know what you think? Looks good. One more change: [-Even if-]{+If+} this option is specified with the ANALYZE [-option-]{+option,+} Remove "even" and add comma. Thanks, -- Justin
Re: Poll: are people okay with function/operator table redesign?
On Mon, Apr 13, 2020 at 1:41 PM Tom Lane wrote: > "David G. Johnston" writes: > > Can we lightly background color every other rowgroup (i.e., "greenbar")? > > If you know how to do that at all, let alone in a maintainable way (ie > one where inserting a new function doesn't require touching the entries > for the ones after), let's see it. > The nth-child({odd|even}) CSS Selector should provide the desired functionality, at least for HTML, but the structure will need to be modified so that there is some single element that represents a single rowgroup. I tried (not too hard) to key off of the presence of the "rowspan" attribute but that does not seem possible. https://www.w3schools.com/cssref/sel_nth-child.asp David J.
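As a sketch of what David describes — assuming the markup were restructured so that each logical function entry becomes its own single element (a hypothetical per-entry `<tbody>` here; the class name is made up) — the striping could look something like:

```css
/* Hypothetical structure: one <tbody> per function entry.
   nth-of-type avoids miscounting when a <thead> precedes the groups. */
table.func-table tbody:nth-of-type(even) {
    background-color: #eef5ee;  /* light "greenbar" shade */
}
```

This is maintainable in the sense Tom asks for: inserting a new function entry re-stripes automatically, with no per-entry markup changes.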
Re: Poll: are people okay with function/operator table redesign?
I wrote: > "David G. Johnston" writes: >> I don't think having a separate Result column helps. The additional >> horizontal whitespace distances all relevant context information (at least >> on a wide monitor). Having the example rows mirror the Signature row seems >> like an easier to consume choice. > Interesting idea. I'm afraid that it would not look so great in cases > where the example-plus-result overflows one line, which would inevitably > happen in PDF format. Still, maybe that would be rare enough to not be > a huge problem. In most places it'd be a win to not have to separately > allocate example and result space. Actually ... if we did it like that, then it would be possible to treat the signature + description + example(s) as one big table cell with line breaks rather than row-separator bars. That would help address the inadequate-visual-separation-between-groups issue, but on the other hand maybe we'd end up with too little visual separation between the elements of a function description. A quick google search turned up this suggestion about how to force line breaks in docbook table cells: http://www.sagehill.net/docbookxsl/LineBreaks.html which seems pretty hacky but it should work. Anyone know a better way? regards, tom lane
Re: documenting the backup manifest file format
On 4/13/20 4:14 PM, Robert Haas wrote: On Mon, Apr 13, 2020 at 3:34 PM Alvaro Herrera wrote: Also, I see no mention of prettification-chars such as newlines or indentation. I suppose if I pass a manifest file through prettification (or Windows newline conversion), the checksum may break. It would indeed break. I'm not sure what you want me to say here, though. If you're trying to parse a manifest, you shouldn't care about how the whitespace is arranged. If you're trying to generate one, you can arrange it any way you like, as long as you also include it in the checksum. pgBackRest ignores whitespace but this is a legacy of the way Perl calculated checksums, not an intentional feature. This worked well when the manifest was loaded as a whole, converted to JSON, and checksummed, but it is a major pain for the streaming code we now have in C. I guarantee that that our next manifest version will do a simple checksum of bytes as Robert has done in this feature. So, I'm +1 as implemented. Why is the top-level checksum only allowed to be SHA-256, if the files can use up to SHA-512? I agree that it's a little bit weird that you can have a stronger checksum for the files instead of the manifest itself, but I also wonder what the use case would be for using a stronger checksum on the manifest. David Steele argued that strong checksums on the files could be useful to software that wants to rifle through all the backups you've ever taken and find another copy of that file by looking for something with a matching checksum. CRC-32C wouldn't be strong enough for that, because eventually you could have enough files that you start to have collisions. The SHA algorithms output enough bits to make that quite unlikely. But this argument only makes sense for the files, not the manifest. Agreed. I think SHA-256 is *more* than enough to protect the manifest against corruption. That said, since the cost of SHA-256 vs. 
SHA-512 in the context of the manifest is negligible, we could just use the stronger algorithm to deflect a similar question going forward. That choice might not age well, but we could always say, well, we picked it because it was the strongest available at the time. Allowing a choice of which algorithm to use for the manifest checksum seems like it will just make verifying the file harder with no tangible benefit. Maybe just a comment in the docs about why SHA-256 was used would be fine. (Also, did we intentionally omit the dash in hash names, so "SHA-256" to make it SHA256? This will also be critical for checksumming the manifest itself.) I debated this with myself, settled on this spelling, and nobody complained until now. It could be changed, though. I didn't have any particular reason for choosing it except the feeling that people would probably prefer to type --manifest-checksum=sha256 rather than --manifest-checksum=sha-256. +1 for sha256 rather than sha-256. Regards, -- -David da...@pgmasters.net
Re: Poll: are people okay with function/operator table redesign?
"David G. Johnston" writes: > Can we lightly background color every other rowgroup (i.e., "greenbar")? If you know how to do that at all, let alone in a maintainable way (ie one where inserting a new function doesn't require touching the entries for the ones after), let's see it. I agree it'd be a nice solution, if we could make it work, but I don't see how. I'd been imagining instead that we could give a different background color to the first line of each group; which I don't know how to do but it at least seems plausible that a style could be attached to a . > I don't think having a separate Result column helps. The additional > horizontal whitespace distances all relevant context information (at least > on a wide monitor). Having the example rows mirror the Signature row seems > like an easier to consume choice. Interesting idea. I'm afraid that it would not look so great in cases where the example-plus-result overflows one line, which would inevitably happen in PDF format. Still, maybe that would be rare enough to not be a huge problem. In most places it'd be a win to not have to separately allocate example and result space. regards, tom lane
Re: Poll: are people okay with function/operator table redesign?
Andrew Dunstan writes: > One thing that did occur to me is that the function/operator name is > essentially redundant, as it's in the signature anyway. Not sure if that > helps us any though. Hm, you have a point there. However, if we drop the lefthand column then there really isn't any visual distinction between the row(s) associated with one function and those of the next. Unless we can find another fix for that aspect (as already discussed in this thread) I doubt it'd be an improvement. > Maybe we're just trying to shoehorn too much information into a single > table. Yeah, back at the beginning of this exercise, Alvaro wondered aloud if we should go to something other than tables altogether. I dunno what that'd look like though. regards, tom lane
Re: Poll: are people okay with function/operator table redesign?
On Mon, Apr 13, 2020 at 11:27 AM Tom Lane wrote: > Isaac Morland writes: > > > - I think there should be much more distinctive lines between the > different > > functions. As it is the fact that the table is groups of 3 lines doesn’t > > jump out at the eye. > > I don't know any easy way to do that. We do already have the grouping > visible in the first column... > Can we lightly background color every other rowgroup (i.e., "greenbar")? I don't think having a separate Result column helps. The additional horizontal whitespace distances all relevant context information (at least on a wide monitor). Having the example rows mirror the Signature row seems like an easier to consume choice. e.g., enum_first(null::rainbow) → red date '2001-09-28' + 7 → 2001-10-05 It also removes the left alignment in a fixed width column which draws unwanted visual attention. David J.
Re: Poll: are people okay with function/operator table redesign?
Robert Haas writes: > I just wonder if there's too much clutter here. Like, line 1: > date - interval → timestamp > OK, gotcha. Line 2: > Subtract an interval from a date > Well, is that really adding anything non-obvious? Yeah, back in the other thread I said >>> I decided to try converting the date/time operators table too, to >>> see how well this works for that. It's bulkier than before, but >>> also (I think) more precise. I realized that this table actually >>> had three examples already for float8 * interval, but it wasn't >>> at all obvious that they were the same operator. So that aspect >>> is a lot nicer here. On the other hand, it seems like the text >>> descriptions are only marginally useful here. I can imagine that >>> they would be useful in some other operator tables, such as >>> geometric operators, but I'm a bit tempted to leave them out >>> in this particular table. The format would adapt to that easily. I wouldn't be averse to dropping the text descriptions for operators in places where they seem obvious ... but who decides what is obvious? Indeed, we've gotten more than one complaint in the past that some of the geometric and JSON operators require a longer explanation than they've got. So one of the points here was to have a format that could adapt to that. But in this particular table I agree they're marginal. regards, tom lane
Re: relcache leak warnings vs. errors
Andres Freund writes: > On 2020-04-11 10:54:49 -0400, Tom Lane wrote: >> I guess you could make them PANICs, but it would be an option that nobody >> could possibly want to have enabled in anything resembling production. >> So I'm kind of -0.5 on making --enable-cassert do it automatically. >> Although I suppose that it's not really worse than other assertion >> failures. > I'd much rather see this throw an assertion than the current > behaviour. But I'm wondering if there's a chance we can throw an error > in non-assert builds without adding too much complexity to the error > paths. Could we perhaps throw the error a bit later during the commit > processing? Any error post-commit is a semantic disaster. I guess that an assertion wouldn't be so awful, if people would rather do it like that in debug builds. regards, tom lane
Re: Parallel copy
On Mon, Apr 13, 2020 at 4:16 PM Andres Freund wrote: > I don't think so. If only one process does the splitting, the > exclusively locked section is just popping off a bunch of offsets of the > ring. And that could fairly easily be done with atomic ops (since what > we need is basically a single producer multiple consumer queue, which > can be done lock free fairly easily ). Whereas in the case of each > process doing the splitting, the exclusively locked part is splitting > along lines - which takes considerably longer than just popping off a > few offsets. Hmm, that does seem believable. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Poll: are people okay with function/operator table redesign?
On 4/13/20 1:13 PM, Tom Lane wrote: > As discussed in the thread at [1], I've been working on redesigning > the tables we use to present SQL functions and operators. The > first installment of that is now up; see tables 9.30 and 9.31 at > > https://www.postgresql.org/docs/devel/functions-datetime.html > > and table 9.33 at > > https://www.postgresql.org/docs/devel/functions-enum.html > > Before I spend more time on this, I want to make sure that people > are happy with this line of attack. Comparing these tables to > the way they look in v12, they clearly take more vertical space; > but at least to my eye they're less cluttered and more readable. > They definitely scale a lot better for cases where a long function > description is needed, or where we'd like to have more than one > example. Does anyone prefer the old way, or have a better idea? > > I know that the table headings are a bit weirdly laid out; hopefully > that can be resolved [2]. > > regards, tom lane > > [1] https://www.postgresql.org/message-id/flat/9326.1581457869%40sss.pgh.pa.us > [2] https://www.postgresql.org/message-id/6169.1586794603%40sss.pgh.pa.us > Gotta say I'm not a huge fan. I appreciate the effort, and I get the problem, but I'm not sure we have a net improvement here. One thing that did occur to me is that the function/operator name is essentially redundant, as it's in the signature anyway. Not sure if that helps us any though. Maybe we're just trying to shoehorn too much information into a single table. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Poll: are people okay with function/operator table redesign?
On Mon, Apr 13, 2020 at 2:47 PM Tom Lane wrote: > Another possibility, which'd only help in HTML, would be to render > some of the cells with a slightly different background color. > That's beyond my docbook/css skills, but it might be possible. I think some visual distinction would be really helpful, if we can get it. I just wonder if there's too much clutter here. Like, line 1: date - interval → timestamp OK, gotcha. Line 2: Subtract an interval from a date Well, is that really adding anything non-obvious? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Parallel copy
Hi, On 2020-04-13 14:13:46 -0400, Robert Haas wrote: > On Fri, Apr 10, 2020 at 2:26 PM Andres Freund wrote: > > > Still, it might be the case that having the process that is reading > > > the data also find the line endings is so fast that it makes no sense > > > to split those two tasks. After all, whoever just read the data must > > > have it in cache, and that helps a lot. > > > > Yea. And if it's not fast enough to split lines, then we have a problem > > regardless of which process does the splitting. > > Still, if the reader does the splitting, then you don't need as much > IPC, right? The shared memory data structure is just a ring of bytes, > and whoever reads from it is responsible for the rest. I don't think so. If only one process does the splitting, the exclusively locked section is just popping off a bunch of offsets of the ring. And that could fairly easily be done with atomic ops (since what we need is basically a single producer multiple consumer queue, which can be done lock free fairly easily ). Whereas in the case of each process doing the splitting, the exclusively locked part is splitting along lines - which takes considerably longer than just popping off a few offsets. Greetings, Andres Freund
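A rough single-file sketch of the shape Andres describes: one "splitter" finds line boundaries once, and consumers only pop (start, end) offsets, so the shared-state step stays tiny. This is illustrative Python relying on CPython's thread-safe `deque.popleft()`, where the real thing would use shared memory and atomic ops in C:

```python
import threading
from collections import deque

data = b"one\ntwo\nthree\nfour\nfive\n"

# "Splitter": scan once for newlines; only the offsets are shared.
offsets, start = deque(), 0
for i, byte in enumerate(data):
    if byte == 0x0A:  # '\n'
        offsets.append((start, i + 1))
        start = i + 1

results, results_lock = [], threading.Lock()

def consumer():
    while True:
        try:
            begin, end = offsets.popleft()  # the only shared-queue step
        except IndexError:
            return  # queue drained
        line = data[begin:end]  # actual line processing is private
        with results_lock:
            results.append(line)

workers = [threading.Thread(target=consumer) for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()

print(sorted(results))  # every line consumed exactly once, in some order
```

The contrast with the alternative design is that here the critical section is a constant-time pop, whereas having each worker split lines itself would put the O(chunk-length) newline scan inside the exclusive section.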
Re: documenting the backup manifest file format
On Mon, Apr 13, 2020 at 4:10 PM Andrew Dunstan wrote: > Seems ok. A tiny example, or an excerpt, might be nice. An empty database produces a manifest about 1200 lines long, so a full example seems like too much to include in the documentation. An excerpt could be included, I suppose. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: documenting the backup manifest file format
On Mon, Apr 13, 2020 at 3:34 PM Alvaro Herrera wrote: > Are these hex figures upper or lower case? No leading zeroes? This > would normally not matter, but the toplevel checksum will care. Not really. You just feed the whole file except for the last line through shasum and you get the answer. It so happens that the server generates lower-case, but pg_verifybackup will accept either. Leading zeroes are not omitted. If the checksum's not the right length, it ain't gonna work. If SHA is used, it's the same output you would get from running shasum -a on the file, which is certainly a fixed length. I assumed that this followed from the statement that there are two characters per byte in the checksum, and from the fact that no checksum algorithm I know about drops leading zeroes in the output. > Also, I > see no mention of prettification-chars such as newlines or indentation. > I suppose if I pass a manifest file through prettification (or Windows > newline conversion), the checksum may break. It would indeed break. I'm not sure what you want me to say here, though. If you're trying to parse a manifest, you shouldn't care about how the whitespace is arranged. If you're trying to generate one, you can arrange it any way you like, as long as you also include it in the checksum. > As for Last-Modification, I think the spec should indicate the exact > format that's used, because it'll also be critical for checksumming. Again, I don't think it really matters for checksumming, but it's "YYYY-MM-DD HH:MM:SS TZ" format, where TZ is always GMT. > Why is the top-level checksum only allowed to be SHA-256, if the files > can use up to SHA-512? If we allowed the top-level checksum to be changed to something else, then we'd probably want to indicate which kind of checksum is being used at the beginning of the file, so as to enable incremental parsing with checksum verification at the end.
pg_verifybackup doesn't currently do incremental parsing, but I'd like to add that sometime, if I get time to hash out the details. I think the use case for varying the checksum type of the manifest itself is much less than for varying it for the files. The big problem with checksumming the files is that it can be slow, because the files can be big. However, unless you have a truckload of empty files in the database, the manifest is going to be very small compared to the sizes of all the files, so it seemed harmless to use a stronger checksum algorithm for the manifest itself. Maybe someone with a ton of empty or nearly-empty relations will complain, but they can always use --no-manifest if they want. I agree that it's a little bit weird that you can have a stronger checksum for the files instead of the manifest itself, but I also wonder what the use case would be for using a stronger checksum on the manifest. David Steele argued that strong checksums on the files could be useful to software that wants to rifle through all the backups you've ever taken and find another copy of that file by looking for something with a matching checksum. CRC-32C wouldn't be strong enough for that, because eventually you could have enough files that you start to have collisions. The SHA algorithms output enough bits to make that quite unlikely. But this argument only makes sense for the files, not the manifest. Naturally, all this is arguable, though, and a good deal of arguing about it has been done, as you have probably noticed. I am still of the opinion that if somebody's goal is to use this facility for its intended purpose, which is to find out whether your backup got corrupted, any of these algorithms are fine, and are highly likely to tell you that you have a problem if, in fact, you do. In fact, I bet that even a checksum algorithm considerably stupider than anything I'd actually consider using would accomplish that goal in a high percentage of cases. 
But not everybody agrees with me, to the point where I am starting to wonder if I really understand how computers work. > (Also, did we intentionally omit the dash in > hash names, so "SHA-256" to make it SHA256? This will also be critical > for checksumming the manifest itself.) I debated this with myself, settled on this spelling, and nobody complained until now. It could be changed, though. I didn't have any particular reason for choosing it except the feeling that people would probably prefer to type --manifest-checksum=sha256 rather than --manifest-checksum=sha-256. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
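The scheme Robert describes ("feed the whole file except for the last line through shasum") can be sketched in a few lines of Python. The manifest content below is a made-up stand-in, not the server's actual output:

```python
import hashlib

# Toy manifest: the final line carries a checksum of everything before it.
covered = (
    b'{"PostgreSQL-Backup-Manifest-Version": 1,\n'
    b'"Files": [],\n'
)
digest = hashlib.sha256(covered).hexdigest()  # lower-case hex, 64 chars
manifest = covered + b'"Manifest-Checksum": "' + digest.encode() + b'"}\n'

# Verifier side: strip the last line, hash the rest, compare.
head, _, _last = manifest.rstrip(b"\n").rpartition(b"\n")
recomputed = hashlib.sha256(head + b"\n").hexdigest()
print(recomputed == digest)  # True
```

This also illustrates why any reformatting (pretty-printing, newline conversion) breaks verification: the checksum covers the literal bytes, not the parsed JSON value.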
Re: documenting the backup manifest file format
On 4/13/20 1:40 PM, Robert Haas wrote: > On Fri, Mar 27, 2020 at 4:32 PM Andres Freund wrote: >> I don't like having a file format that's intended to be used by external >> tools too that's undocumented except for code that assembles it in a >> piecemeal fashion. Do you mean in a follow-on patch this release, or >> later? I don't have a problem with the former. > Here is a patch for that. > Seems ok. A tiny example, or an excerpt, might be nice. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: relcache leak warnings vs. errors
Hi, On 2020-04-11 10:54:49 -0400, Tom Lane wrote: > Peter Eisentraut writes: > > How about a compile-time option to turn all the warnings in resowner.c > > into errors? This could be enabled automatically by --enable-cassert, > > similar to other defines that that option enables. > > [ itch... ] Those calls occur post-commit; throwing an error there > is really a mess, which is why it's only WARNING now. > I guess you could make them PANICs, but it would be an option that nobody > could possibly want to have enabled in anything resembling production. > So I'm kind of -0.5 on making --enable-cassert do it automatically. > Although I suppose that it's not really worse than other assertion > failures. I'd much rather see this throw an assertion than the current behaviour. But I'm wondering if there's a chance we can throw an error in non-assert builds without adding too much complexity to the error paths. Could we perhaps throw the error a bit later during the commit processing? Greetings, Andres Freund
Re: documenting the backup manifest file format
On Mon, Apr 13, 2020 at 2:28 PM Erik Rijkers wrote: > Can you double check this sentence? Seems strange to me but I don't > know why; it may well be that my english is not good enough. Maybe a > comma after 'required' makes reading easier? > > The timeline from which this range of WAL records will be required in > order to make use of this backup. The value is an integer. It sounds a little awkward to me, but not outright wrong. I'm not exactly sure how to rephrase it, though. Maybe just shorten it to "the timeline for this range of WAL records"? > One typo: > > 'when making using' should be > 'when making use' Right, thanks, fixed in my local copy. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Using Oracle SQL Client commands with PSQL 12.2 DB
PGSQLCommunities, We migrated Oracle 11.x Database to PostgreSQL 12.x Database on a RH Linux 7.x server. On a different RH Linux 7.x Server, I have Oracle Client installed. Since we have many scripts developed in Oracle SQL, is it possible for the PostgreSQL 12.x DB to process the Oracle Scripts? Are there utilities or drivers that could be installed on the PostgreSQL 12.x Database or server for processing the Oracle SQL client commands? We are trying to avoid updating our Oracle Client scripts on remote servers. Thanks Fred
Re: documenting the backup manifest file format
+ The LSN at which replay must begin on the indicated timeline in order to
+ make use of this backup. The LSN is stored in the format normally used
+ by PostgreSQL; that is, it is a string
+ consisting of two strings of hexademical characters, each with a length
+ of between 1 and 8, separated by a slash.

typo "hexademical"

Are these hex figures upper or lower case? No leading zeroes? This would normally not matter, but the toplevel checksum will care.

Also, I see no mention of prettification-chars such as newlines or indentation. I suppose if I pass a manifest file through prettification (or Windows newline conversion), the checksum may break.

As for Last-Modification, I think the spec should indicate the exact format that's used, because it'll also be critical for checksumming.

Why is the top-level checksum only allowed to be SHA-256, if the files can use up to SHA-512? (Also, did we intentionally omit the dash in hash names, so "SHA-256" to make it SHA256? This will also be critical for checksumming the manifest itself.)

-- 
Álvaro Herrera            https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Poll: are people okay with function/operator table redesign?
Alvaro Herrera writes: > One improvement (that I don't know is possible in docbook) would be to > have the inter-logical-row line be slightly thicker than the > intra-logical-row one. That'd make each entry visually more obvious. Yeah, I don't see any way to do that :-(. We could suppress the row lines entirely between the members of the logical group, but that'd almost surely look worse. (I tried to implement this to see, and couldn't get rowsep="0" in a to render the way I expected, so there may be toolchain bugs in the way of it anyway.) We could leave an entirely empty row between logical groups, but that would be really wasteful of vertical space. Another possibility, which'd only help in HTML, would be to render some of the cells with a slightly different background color. That's beyond my docbook/css skills, but it might be possible. regards, tom lane
Re: documenting the backup manifest file format
On 2020-04-13 20:08, Robert Haas wrote: [v2-0001-Document-the-backup-manifest-file-format.patch] Can you double check this sentence? Seems strange to me but I don't know why; it may well be that my english is not good enough. Maybe a comma after 'required' makes reading easier? The timeline from which this range of WAL records will be required in order to make use of this backup. The value is an integer. One typo: 'when making using' should be 'when making use' Erik Rijkers
Re: Poll: are people okay with function/operator table redesign?
Isaac Morland writes: > - showing the signature like this is interesting. For a moment I was > wondering why it doesn’t say, for example, "interval → interval → interval” > then I remembered this is Postgres, not Haskell. On the one hand, I like > putting the signature like this; on the other, I don’t like that the return > type is in a different place in each one. Could it be split into the same > two columns as the example(s); first column inputs, second column results? We tried that in an earlier iteration (see the referenced thread). It doesn't work very well because you end up having to allocate the max amount of space for any result type or example result on every line. Giving up the separate cell for return type is a lot of what makes this workable. > - another possibility for the parameters: list each one on a separate line, > together with default (if applicable). Maybe that would be excessively > tall, but it would sure make completely clear just exactly how many > parameters there are and never wrap (well, maybe on a phone, but we can > only do so much). Since so few built-in functions have default parameters, that's going to waste an awful lot of space in most cases. I actually ended up removing the explicit "default" clauses from make_interval (which is the only function with defaults that I dealt with so far) and instead explained that they all default to zero in the text description, because that took way less space. > - for the various current-time-related functions (age, current_time, etc.), > rather than saying “variable”, could it be the actual result with “now” > being taken to be a specific fixed time within the year in which the > documentation was generated? This would be really helpful for example with > being clear that current_time is only the time of day with no date. Yeah, I've been waffling about that. On the one hand, we regularly get docs complaints from people who say "I tried this example and I didn't get the claimed result". 
On the other hand you could figure that everybody should understand that current_timestamp won't work like that ... but the first such example in the table is age() for which that automatic understanding might not apply. The examples down in 9.9.4 use a specific time, which is looking pretty long in the tooth right now, and no one has complained --- but that's in a context where it's absolutely plain that every mentioned function is going to have a time-varying result. On the whole I'm kind of leaning to going back to using a specific time. But that's a detail that's not very relevant to the bigger picture here. (No, I'm not going to try to make it update every year; too much work for too little reward.) > - I think there should be much more distinctive lines between the different > functions. As it is the fact that the table is groups of 3 lines doesn’t > jump out at the eye. I don't know any easy way to do that. We do already have the grouping visible in the first column... regards, tom lane
Re: Parallel copy
On Fri, Apr 10, 2020 at 2:26 PM Andres Freund wrote: > > Still, it might be the case that having the process that is reading > > the data also find the line endings is so fast that it makes no sense > > to split those two tasks. After all, whoever just read the data must > > have it in cache, and that helps a lot. > > Yea. And if it's not fast enough to split lines, then we have a problem > regardless of which process does the splitting. Still, if the reader does the splitting, then you don't need as much IPC, right? The shared memory data structure is just a ring of bytes, and whoever reads from it is responsible for the rest. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Mon, Apr 13, 2020 at 6:34 PM Dilip Kumar wrote:
> Skipping 0003 for now.

Review comments from 0004-Gracefully-handle-*.patch

@@ -5490,6 +5523,14 @@ heap_finish_speculative(Relation relation, ItemPointer tid)
 	ItemId		lp = NULL;
 	HeapTupleHeader htup;

+	/*
+	 * We don't expect direct calls to heap_hot_search with
+	 * valid CheckXidAlive for regular tables. Track that below.
+	 */
+	if (unlikely(TransactionIdIsValid(CheckXidAlive) &&
+		!(IsCatalogRelation(relation) || RelationIsUsedAsCatalogTable(relation
+		elog(ERROR, "unexpected heap_hot_search call during logical decoding");

The call is to heap_finish_speculative.

@@ -481,6 +482,19 @@ systable_getnext(SysScanDesc sysscan)
 	}
 }

+	if (TransactionIdIsValid(CheckXidAlive) &&
+		!TransactionIdIsInProgress(CheckXidAlive) &&
+		!TransactionIdDidCommit(CheckXidAlive))
+		ereport(ERROR,
+				(errcode(ERRCODE_TRANSACTION_ROLLBACK),
+				 errmsg("transaction aborted during system catalog scan")));

s/transaction aborted/transaction aborted concurrently perhaps?

Also, can we move this check to the beginning of the function? If the condition fails, we can skip the sys scan.

Some of the checks look repetitive in the same file. Should we declare them as inline functions?

Review comments from 0005-Implement-streaming*.patch

+static void
+AssertChangeLsnOrder(ReorderBufferTXN *txn)
+{
+#ifdef USE_ASSERT_CHECKING
+	dlist_iter	iter;
...
+#endif
+}

We can implement the same as follows:

#ifdef USE_ASSERT_CHECKING
static void
AssertChangeLsnOrder(ReorderBufferTXN *txn)
{
	dlist_iter	iter;
...
}
#else
#define AssertChangeLsnOrder(txn) ((void)true)
#endif

+ * if it is aborted we will report an specific error which we can ignore. We

s/an specific/a specific

+ * Set the last last of the stream as the final lsn before calling
+ * stream stop.

s/last last/last

PG_CATCH();
{
+	MemoryContext ecxt = MemoryContextSwitchTo(ccxt);
+	ErrorData  *errdata = CopyErrorData();

When we don't re-throw, the errdata should be freed by calling FreeErrorData(errdata), right?

+			/*
+			 * Set the last last of the stream as the final lsn before
+			 * calling stream stop.
+			 */
+			txn->final_lsn = prev_lsn;
+			rb->stream_stop(rb, txn);
+
+			FlushErrorState();
+		}

stream_stop() can still throw some error, right? In that case, we should flush the error state before calling stream_stop().

+	/*
+	 * Remember the command ID and snapshot if transaction is streaming
+	 * otherwise free the snapshot if we have copied it.
+	 */
+	if (streaming)
+	{
+		txn->command_id = command_id;
+
+		/* Avoid copying if it's already copied. */
+		if (snapshot_now->copied)
+			txn->snapshot_now = snapshot_now;
+		else
+			txn->snapshot_now = ReorderBufferCopySnap(rb, snapshot_now,
+													  txn, command_id);
+	}
+	else if (snapshot_now->copied)
+		ReorderBufferFreeSnap(rb, snapshot_now);

Hmm, it seems this part needs an assumption that after copying the snapshot, no subsequent step can throw any error. If they do, then we can again create a copy of the snapshot in the catch block, which will leak some memory. Is my understanding correct?

+	}
+	else
+	{
+		ReorderBufferCleanupTXN(rb, txn);
+		PG_RE_THROW();
+	}

Shouldn't we switch back to the previously created error memory context before re-throwing?

+ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
+					XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+					TimestampTz commit_time,
+					RepOriginId origin_id, XLogRecPtr origin_lsn)
+{
+	ReorderBufferTXN *txn;
+	volatile Snapshot snapshot_now;
+	volatile CommandId command_id = FirstCommandId;

In the modified ReorderBufferCommit(), why is it necessary to declare the above two variables as volatile? There is no try-catch block here.

@@ -1946,6 +2284,13 @@ ReorderBufferAbort(ReorderBuffer *rb, TransactionId xid, XLogRecPtr lsn)
 	if (txn == NULL)
 		return;

+	/*
+	 * When the (sub)transaction was streamed, notify the remote node
+	 * about the abort only if we have sent any data for this transaction.
+	 */
+	if (rbtxn_is_streamed(txn) && txn->any_data_sent)
+		rb->stream_abort(rb, txn, lsn);
+

s/When/If

+	/*
+	 * When the (sub)transaction was streamed, notify the remote node
+	 * about the abort.
+	 */
+	if (rbtxn_is_streamed(txn))
+		rb->stream_abort(rb, txn, lsn);

s/When/If. And, in this case, if we've not sent any data, why should we send the abort message (similar to the previous one)?

+ * Note: We never do both stream and serialize a transaction (we only spill
+ * to disk when streaming is not supported by the plugin), so only one of
+ * those two flags may be set at any given time.
+ */
+#define rbtxn_is_streamed(txn) \
+( \
+	((txn)->txn_flags & RBTXN_IS_STREAMED) != 0 \
+)

Should we put any assert (not necessarily here) to validate the above comment?

+	txn = ReorderBufferLargestTopTXN(rb);
+
+	/* we know there has to be one, because the size is not zero */
+	Assert(txn && !txn->toptxn);
+	Assert(txn->size > 0);
+	Assert(rb->size >= txn->size);

The same three assertions are already there in ReorderBufferLargestTopTXN().

+static bool
+ReorderBufferCanStr
Re: Properly mark NULL returns in numeric aggregates
Jesse Zhang writes: > On Fri, Apr 10, 2020 at 3:59 PM Tom Lane wrote: >> They can't be strict because the initial iteration needs to produce >> something from a null state and non-null input. nodeAgg's default >> behavior won't work for those because nodeAgg doesn't know how to >> copy a value of type "internal". > Ah, I think I get it. A copy must happen because the input is likely in > a shorter-lived memory context than the state, but nodeAgg's default > behavior of copying a by-value datum won't really copy the object > pointed to by the pointer wrapped in the datum of "internal" type, so we > defer to the combine function. Am I right? Then it follows kinda > naturally that those combine functions have been sloppy on arrival since > commit 11c8669c0cc . Yeah, they're relying exactly on the assumption that nodeAgg is not going to try to copy a value declared "internal", and therefore they can be loosey-goosey about whether the value pointer is null or not. However, if you want to claim that that's wrong, you have to explain why it's okay for some other code to be accessing a value that's declared "internal". I'd say that the meaning of that is precisely "keepa u hands off". In the case at hand, the current situation is that we only expect the values returned by these combine functions to be read by the associated final functions, which are on board with the null-pointer representation of an empty result. Your argument is essentially that it should be possible to feed the values to the aggregate's associated serialization function as well. But the core code never does that, so I'm not convinced that we should add it to the requirements; we'd be unable to test it. regards, tom lane
Re: documenting the backup manifest file format
On Mon, Apr 13, 2020 at 1:55 PM Justin Pryzby wrote: > typos: > manifes > hexademical (twice) Thanks. v2 attached. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company v2-0001-Document-the-backup-manifest-file-format.patch Description: Binary data
Re: Poll: are people okay with function/operator table redesign?
On 2020-Apr-13, Tom Lane wrote: > As discussed in the thread at [1], I've been working on redesigning > the tables we use to present SQL functions and operators. The > first installment of that is now up; see tables 9.30 and 9.31 at > > https://www.postgresql.org/docs/devel/functions-datetime.html > > and table 9.33 at > > https://www.postgresql.org/docs/devel/functions-enum.html > > Before I spend more time on this, I want to make sure that people > are happy with this line of attack. Comparing these tables to > the way they look in v12, they clearly take more vertical space; > but at least to my eye they're less cluttered and more readable. > They definitely scale a lot better for cases where a long function > description is needed, or where we'd like to have more than one > example. I am torn. On the one side, I think this new format is so much better than the old one that we should definitely use it for all tables. On the other side, I also think this format is slightly more complicated to read, so perhaps it would be sensible to keep using the old format for the simplest tables. One argument for the first of those positions is that if this new table layout is everywhere, it'll take less total time to get used to it. One improvement (that I don't know is possible in docbook) would be to have the inter-logical-row line be slightly thicker than the intra-logical-row one. That'd make each entry visually more obvious. I think you already mentioned the PDF issue that these multi-row entries are sometimes split across pages. I cannot believe docbook is so stupid not to have a solution to that problem, but I don't know what that solution would be. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Poll: are people okay with function/operator table redesign?
On Mon, 13 Apr 2020 at 13:13, Tom Lane wrote: > As discussed in the thread at [1], I've been working on redesigning > the tables we use to present SQL functions and operators. The > first installment of that is now up; see tables 9.30 and 9.31 at > > https://www.postgresql.org/docs/devel/functions-datetime.html > > and table 9.33 at > > https://www.postgresql.org/docs/devel/functions-enum.html > > Before I spend more time on this, I want to make sure that people > are happy with this line of attack. Comparing these tables to > the way they look in v12, they clearly take more vertical space; > but at least to my eye they're less cluttered and more readable. > They definitely scale a lot better for cases where a long function > description is needed, or where we'd like to have more than one > example. Does anyone prefer the old way, or have a better idea? > I honestly don’t know. My initial reaction is a combination of “that’s weird” and “that’s cool”. So a few comments, which shouldn’t be taken as indicating a definite preference: - showing the signature like this is interesting. For a moment I was wondering why it doesn’t say, for example, "interval → interval → interval” then I remembered this is Postgres, not Haskell. On the one hand, I like putting the signature like this; on the other, I don’t like that the return type is in a different place in each one. Could it be split into the same two columns as the example(s); first column inputs, second column results? - another possibility for the parameters: list each one on a separate line, together with default (if applicable). Maybe that would be excessively tall, but it would sure make completely clear just exactly how many parameters there are and never wrap (well, maybe on a phone, but we can only do so much). 
- for the various current-time-related functions (age, current_time, etc.), rather than saying “variable”, could it be the actual result with “now” being taken to be a specific fixed time within the year in which the documentation was generated? This would be really helpful for example with being clear that current_time is only the time of day with no date. - the specific fixed time should be something like (current year)-06-30 18:45:54. I’ve deliberately chosen all values to be outside of the range of values with smaller ranges. For example, the hour is >12, the limit of the month field. - I think there should be much more distinctive lines between the different functions. As it is the fact that the table is groups of 3 lines doesn’t jump out at the eye.
Re: documenting the backup manifest file format
On Mon, Apr 13, 2020 at 01:40:56PM -0400, Robert Haas wrote: > On Fri, Mar 27, 2020 at 4:32 PM Andres Freund wrote: > > I don't like having a file format that's intended to be used by external > > tools too that's undocumented except for code that assembles it in a > > piecemeal fashion. Do you mean in a follow-on patch this release, or > > later? I don't have a problem with the former. > > Here is a patch for that. typos: manifes hexademical (twice) -- Justin
Re: Poll: are people okay with function/operator table redesign?
On 2020-04-13 19:13, Tom Lane wrote: As discussed in the thread at [1], I've been working on redesigning the tables we use to present SQL functions and operators. The first installment of that is now up; see tables 9.30 and 9.31 at https://www.postgresql.org/docs/devel/functions-datetime.html and table 9.33 at https://www.postgresql.org/docs/devel/functions-enum.html Before I spend more time on this, I want to make sure that people are happy with this line of attack. Comparing these tables to the way they look in v12, they clearly take more vertical space; but at least to my eye they're less cluttered and more readable. They definitely scale a lot better for cases where a long function description is needed, or where we'd like to have more than one example. Does anyone prefer the old way, or have a better idea? +1 In the pdf it is a big improvement; and the html is better too.
documenting the backup manifest file format
On Fri, Mar 27, 2020 at 4:32 PM Andres Freund wrote: > I don't like having a file format that's intended to be used by external > tools too that's undocumented except for code that assembles it in a > piecemeal fashion. Do you mean in a follow-on patch this release, or > later? I don't have a problem with the former. Here is a patch for that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company v1-0001-Document-the-backup-manifest-file-format.patch Description: Binary data
Re: Poll: are people okay with function/operator table redesign?
On Mon, Apr 13, 2020 at 1:13 PM Tom Lane wrote: > As discussed in the thread at [1], I've been working on redesigning > the tables we use to present SQL functions and operators. The > first installment of that is now up; see tables 9.30 and 9.31 at > > https://www.postgresql.org/docs/devel/functions-datetime.html > > and table 9.33 at > > https://www.postgresql.org/docs/devel/functions-enum.html > > Before I spend more time on this, I want to make sure that people > are happy with this line of attack. Comparing these tables to > the way they look in v12, they clearly take more vertical space; > but at least to my eye they're less cluttered and more readable. > They definitely scale a lot better for cases where a long function > description is needed, or where we'd like to have more than one > example. Does anyone prefer the old way, or have a better idea? I find the new way quite hard to read. I prefer the old way. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Properly mark NULL returns in numeric aggregates
On Fri, Apr 10, 2020 at 3:59 PM Tom Lane wrote: > > They can't be strict because the initial iteration needs to produce > something from a null state and non-null input. nodeAgg's default > behavior won't work for those because nodeAgg doesn't know how to > copy a value of type "internal". > > regards, tom lane Ah, I think I get it. A copy must happen because the input is likely in a shorter-lived memory context than the state, but nodeAgg's default behavior of copying a by-value datum won't really copy the object pointed to by the pointer wrapped in the datum of "internal" type, so we defer to the combine function. Am I right? Then it follows kinda naturally that those combine functions have been sloppy on arrival since commit 11c8669c0cc . Cheers, Jesse
Poll: are people okay with function/operator table redesign?
As discussed in the thread at [1], I've been working on redesigning the tables we use to present SQL functions and operators. The first installment of that is now up; see tables 9.30 and 9.31 at https://www.postgresql.org/docs/devel/functions-datetime.html and table 9.33 at https://www.postgresql.org/docs/devel/functions-enum.html Before I spend more time on this, I want to make sure that people are happy with this line of attack. Comparing these tables to the way they look in v12, they clearly take more vertical space; but at least to my eye they're less cluttered and more readable. They definitely scale a lot better for cases where a long function description is needed, or where we'd like to have more than one example. Does anyone prefer the old way, or have a better idea? I know that the table headings are a bit weirdly laid out; hopefully that can be resolved [2]. regards, tom lane [1] https://www.postgresql.org/message-id/flat/9326.1581457869%40sss.pgh.pa.us [2] https://www.postgresql.org/message-id/6169.1586794603%40sss.pgh.pa.us
Re: cleaning perl code
On 4/12/20 3:42 AM, Noah Misch wrote:
> On Sat, Apr 11, 2020 at 12:13:08PM -0400, Andrew Dunstan wrote:
>> --- a/src/tools/msvc/Project.pm
>> +++ b/src/tools/msvc/Project.pm
>> @@ -420,13 +420,10 @@ sub read_file
>>  {
>>  	my $filename = shift;
>>  	my $F;
>> -	my $t = $/;
>> -
>> -	undef $/;
>> +	local $/ = undef;
>>  	open($F, '<', $filename) || croak "Could not open file $filename\n";
>>  	my $txt = <$F>;
>>  	close($F);
>> -	$/ = $t;
>
> +1 for this and for the other three hunks like it. The resulting code is
> shorter and more robust, so this is a good one-time cleanup. It's not
> important to mandate this style going forward, so I wouldn't change
> perlcriticrc for this one.
>
>> --- a/src/tools/version_stamp.pl
>> +++ b/src/tools/version_stamp.pl
>> @@ -1,4 +1,4 @@
>> -#! /usr/bin/perl -w
>> +#! /usr/bin/perl
>>
>>  #
>>  # version_stamp.pl -- update version stamps throughout the source tree
>> @@ -21,6 +21,7 @@
>>  #
>>
>>  use strict;
>> +use warnings;
>
> This and the other "use warnings" additions look good. I'm assuming you'd
> change perlcriticrc like this:
>
> +[TestingAndDebugging::RequireUseWarnings]
> +severity = 5

OK, I've committed all that stuff. I think that takes care of the non-controversial part of what I proposed :-)

cheers

andrew

-- 
Andrew Dunstan            https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: wrong relkind error messages
On 4/13/20 11:13 AM, Tom Lane wrote:
> ERROR: cannot define statistics for relation "ti"
> DETAIL: This operation is not supported for indexes.
>
> which still leaves implicit that "ti" is an index, but probably that's
> something the user can figure out.

+1 for this. It's clear and succinct.

cheers

andrew

-- 
Andrew Dunstan            https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: backup manifests
On Sun, Apr 12, 2020 at 10:09 PM Fujii Masao wrote: > I found other minor issues. I think these are all correct fixes. Thanks for the post-commit review, and sorry for this mistakes. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: wrong relkind error messages
Peter Eisentraut writes:
> I'm not a fan of error messages like
>     relation "%s" is not a table, foreign table, or materialized view

Agreed, they're not great.

> For example:
> -ERROR: relation "ti" is not a table, foreign table, or materialized view
> +ERROR: cannot define statistics for relation "ti"
> +DETAIL: "ti" is an index.

I see where you're going, and it seems like a generally-better idea, but I feel like this phrasing is omitting some critical background information that users don't necessarily have. At the very least it's not stating clearly that the failure is *because* ti is an index. More generally, the whole concept that statistics can only be defined for certain kinds of relations has disappeared from view. I fear that users who're less deeply into Postgres hacking than we are might not have that concept at all, or at least it might not come to mind immediately when they get this message.

Fixing this while avoiding your concern about proliferation of messages seems a bit difficult though. The best I can do after a couple minutes' thought is

    ERROR: cannot define statistics for relation "ti"
    DETAIL: "ti" is an index, and this operation is not supported for that kind of relation.

which seems a little long and awkward. Another idea is

    ERROR: cannot define statistics for relation "ti"
    DETAIL: This operation is not supported for indexes.

which still leaves implicit that "ti" is an index, but probably that's something the user can figure out.

Maybe someone else can do better?

			regards, tom lane
Re: pg_basebackup, manifests and backends older than ~12
On Sun, Apr 12, 2020 at 8:56 PM Michael Paquier wrote:
> On Sun, Apr 12, 2020 at 08:08:17AM +0900, Michael Paquier wrote:
> > Exactly. My point is exactly that. The current code would force
> > users maintaining scripts with pg_basebackup to use --no-manifest if
> > such a script runs with older versions of Postgres, but we should
> > encourage users not do to that because we want them to use manifests
> > with backend versions where they are supported.
>
> Please note that I have added an open item for this thread, and
> attached is a proposal of patch. While reading the code, I have
> noticed that the minimum version handling is not consistent with the
> other MINIMUM_VERSION_*, so I have added one for manifests.

I think that this patch is incorrect. I have no objection to introducing MINIMUM_VERSION_FOR_MANIFESTS, but this is not OK:

-	else
-	{
-		if (serverMajor < 1300)
-			manifest_clause = "";
-		else
-			manifest_clause = "MANIFEST 'no'";
-	}

It seems to me that this will break --no-manifest option on v13.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: wrong relkind error messages
On Mon, Apr 13, 2020 at 9:55 AM Peter Eisentraut wrote:
> Attached is another attempt to improve this.

Nice effort. Most of these seem like clear improvements, but some I don't like:

+				 errmsg("relation \"%s\" is of unsupported kind",
+						RelationGetRelationName(rel)),
+				 errdetail_relkind(RelationGetRelationName(rel), rel->rd_rel->relkind)));

It would help to work "pgstattuple" into the message somehow. "cannot use pgstattuple on relation \"%s\"", perhaps?

+		ereport(ERROR,
+				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
+				 errmsg("action cannot be performed on relation \"%s\"",
+						RelationGetRelationName(rel)),

Super-vague.

+				 errmsg("cannot set relation options of relation \"%s\"",
+						RelationGetRelationName(rel)),

I suggest "cannot set options for relation \"%s\""; that is, use "for" instead of "of", and don't say "relation" twice.

+				 errmsg("cannot create trigger on relation \"%s\"",
+						RelationGetRelationName(rel)),

+				 errmsg("relation \"%s\" cannot have triggers",
+						RelationGetRelationName(rel)),

+				 errmsg("relation \"%s\" cannot have triggers",
+						rv->relname),

Maybe use the second wording for all three? And similarly for rules?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: pg_validatebackup -> pg_verifybackup?
On Sun, Apr 12, 2020 at 10:20 PM Shinoda, Noriyoshi (PN Japan A&PS Delivery) wrote: > Sorry this email is not a discussion about word selection. > Since part of the manual had left pg_validatebackup in commit > dbc60c5593f26dc777a3be032bff4fb4eab1ddd1. > I've attached a patch to fix this. Committed, thanks. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: spin_delay() for ARM
On Sat, 11 Apr 2020 at 04:18, Tom Lane wrote: > > I wrote: > > A more useful test would be to directly experiment with contended > > spinlocks. As I recall, we had some test cases laying about when > > we were fooling with the spin delay stuff on Intel --- maybe > > resurrecting one of those would be useful? > > The last really significant performance testing we did in this area > seems to have been in this thread: > > https://www.postgresql.org/message-id/flat/CA%2BTgmoZvATZV%2BeLh3U35jaNnwwzLL5ewUU_-t0X%3DT0Qwas%2BZdA%40mail.gmail.com > > A relevant point from that is Haas' comment > > I think optimizing spinlocks for machines with only a few CPUs is > probably pointless. Based on what I've seen so far, spinlock > contention even at 16 CPUs is negligible pretty much no matter what > you do. Whether your implementation is fast or slow isn't going to > matter, because even an inefficient implementation will account for > only a negligible percentage of the total CPU time - much less than 1% > - as opposed to a 64-core machine, where it's not that hard to find > cases where spin-waits consume the *majority* of available CPU time > (recall previous discussion of lseek). Yeah, will check if I find some machines with large cores. > So I wonder whether this patch is getting ahead of the game. It does > seem that ARM systems with a couple dozen cores exist, but are they > common enough to optimize for yet? Can we even find *one* to test on > and verify that this is a win and not a loss? (Also, seeing that > there are so many different ARM vendors, results from just one > chipset might not be too trustworthy ...) Ok. Yes, it would be worth waiting to see if there are others in the community with ARM systems that have implemented YIELD. May be after that we might gain some confidence. I myself also hope that I will get one soon to test, but right now I have one that does not support it, so it will be just a no-op. 
-- Thanks, -Amit Khandekar Huawei Technologies
Re: spin_delay() for ARM
On Sat, 11 Apr 2020 at 00:47, Andres Freund wrote: > > Hi, > > On 2020-04-10 13:09:13 +0530, Amit Khandekar wrote: > > On my Intel Xeon machine with 8 cores, I tried to test PAUSE also > > using a sample C program (attached spin.c). Here, many child processes > > (much more than CPUs) wait in a tight loop for a shared variable to > > become 0, while the parent process continuously increments a sequence > > number for a fixed amount of time, after which, it sets the shared > > variable to 0. The child's tight loop calls PAUSE in each iteration. > > What I hoped was that because of PAUSE in children, the parent process > > would get more share of the CPU, due to which, in a given time, the > > sequence number will reach a higher value. Also, I expected the CPU > > cycles spent by child processes to drop down, thanks to PAUSE. None of > > these happened. There was no change. > > > Possibly, this testcase is not right. Probably the process preemption > > occurs only within the set of hyperthreads attached to a single core. > > And in my testcase, the parent process is the only one who is ready to > > run. Still, I have anyway attached the program (spin.c) for archival; > > in case somebody with a YIELD-supporting ARM machine wants to use it > > to test YIELD. > > PAUSE doesn't operate on the level of the CPU scheduler. So the OS won't > just schedule another process - you won't see different CPU usage if you > measure it purely as the time running. Yeah, I thought that the OS scheduling would be an *indirect* consequence of the pause because of its slowing down the CPU, but looks like that does not happen. > You should be able to see a > difference if you measure with a profiler that shows you data from the > CPUs performance monitoring unit. Hmm, I had tried with perf and could see the pause itself consuming 5% cpu. But I haven't yet played with per-process figures. -- Thanks, -Amit Khandekar Huawei Technologies
Re: execExprInterp() questions / How to improve scalar array op expr eval?
I've read through all of the previous discussions related to stable subexpression caching, and I'm planning to send a summary email with all of those links in one place. But I also happened to stumble upon mention in the TODO of some email discussion way back in 2007 where Tom suggested [1] we should really try planning scalar array ops (particularly those with large IN lists) as `IN (VALUES ...)`. That actually would solve the specific case I'd had this problem with (seq scan on a large constant array IN expression). Ideally any query with forms like: select * from t where a in (1, 2,...) select * from t where a in ((select i from x)) would always be isomorphic in planning. But thinking about this overnight and scanning through things quickly this morning, I have a feeling that'd be 1.) a pretty significant undertaking, and 2.) likely to explode the number of plans considered. Also I don't know if there's a good place to slot that into planning. Do either of you happen to have any pointers into places that do similar kinds of rewrites I could look at? And in those cases do we normally always rewrite or do we consider both styles independently? I suppose _only_ handling the case where a `IN (VALUES ...)` replaces a seq scan with a scalar array op might be somewhat easier...but feels like it leaves a lot of holes. I'm still at the point where I'm trying to determine if any of the above (subexpression caching, saop optimization only on constants, re-planning as `IN (VALUES ...)`) is something reasonable enough relative to the amount of effort to be worth working on. James [1]: https://www.postgresql.org/message-id/19001.1178823208%40sss.pgh.pa.us
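As a hedged illustration of the "saop optimization only on constants" idea mentioned above: since the array is constant, it could be preprocessed once at executor startup so that each per-row probe costs O(log n) instead of a linear scan. The sketch below uses a sorted array plus bsearch; a hash table would serve equally well. None of these names come from an actual patch.

```c
#include <stdlib.h>

/* Comparator for int keys, usable by both qsort and bsearch. */
static int
cmp_int(const void *a, const void *b)
{
    int x = *(const int *) a;
    int y = *(const int *) b;

    return (x > y) - (x < y);
}

/* One-time setup: sort the constant IN-list (executor-init time). */
void
saop_setup(int *consts, size_t n)
{
    qsort(consts, n, sizeof(int), cmp_int);
}

/* Per-row probe: binary search instead of scanning the whole array. */
int
saop_probe(const int *consts, size_t n, int key)
{
    return bsearch(&key, consts, n, sizeof(int), cmp_int) != NULL;
}
```

This only helps the evaluation cost, of course; it does nothing for the planning-isomorphism question about `IN (VALUES ...)`.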
Re: where should I stick that backup?
On Sun, Apr 12, 2020 at 9:18 PM Stephen Frost wrote: > There's two different questions we're talking about here and I feel like > they're being conflated. To try and clarify: > > - Could you implement FDWs with shell scripts, and custom programs? I'm > pretty confident that the answer is yes, but the thrust of that > argument is primarily to show that you *can* implement just about > anything using a shell script "API", so just saying it's possible to > do doesn't make it necessarily a good solution. The FDW system is > complicated, and also good, because we made it so and because it's > possible to do more sophisticated things with a C API, but it could > have started out with shell scripts that just returned data in much > the same way that COPY PROGRAM works today. What matters is that > forward thinking to consider what you're going to want to do tomorrow, > not just thinking about how you can solve for the simple cases today > with a shell out to an existing command. > > - Does providing a C-library interface deter people from implementing > solutions that use that interface? Perhaps it does, but it doesn't > have nearly the dampening effect that is being portrayed here, and we > can see that pretty clearly from the FDW situation. Sure, not all of > those are good solutions, but lots and lots of archive command shell > scripts are also pretty terrible, and there *are* a few good solutions > out there, including the ones that we ourselves ship. At least when > it comes to FDWs, there's an option there for us to ship a *good* > answer ourselves for certain (and, in particular, the very very > common) use-cases. > > > - We're only talking about writing a handful of tar files, and that's > > in the context of a full-database backup, which is a much > > heavier-weight operation than a query. 
> > This is true for -Ft, but not -Fp, and I don't think there's enough > thought being put into this when it comes to parallelism and that you > don't want to be limited to one process per tablespace. > > > - There is not really any state that needs to be maintained across calls. > > As mentioned elsewhere, this isn't really true. These are fair points, and my thinking has been somewhat refined by this discussion, so let me try to clarify my (current) position a bit. I believe that there are two subtly different questions here. Question #1 is "Would it be useful to people to be able to pipe the tar files that they get from pg_basebackup into some other command rather than writing them to the filesystem, and should we give them the option to do so?" Question #2 is "Is piping the tar files that pg_basebackup would produce into some other program the best possible way of providing more flexibility about where backups get written?" I'm prepared to concede that the answer to question #2 is no. I had earlier assumed that establishing connections was pretty fast and that, even if not, there were solutions to that problem, like setting up an SSH tunnel in advance. Several people have said, well, no, establishing connections is a problem. As I acknowledged from the beginning, plain format backups are a problem. So I think a convincing argument has been made that a shell command won't meet everyone's needs, and a more complex API is required for some cases. But I still think the answer to question #1 is yes. I disagree entirely with any argument to the effect that because some users might do unsafe things with the option, we ought not to provide it. Practically speaking, it would work fine for many people even with no other changes, and if we add something like pgfile, which I'm willing to do, it would work for more people in more situations. It is a useful thing to have, full stop. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
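For what it's worth, the core of a pgfile-style helper could be as small as "write to a temporary name, fsync, rename into place". The sketch below only illustrates that pattern; the function name and the trimmed error handling are assumptions, not a proposed API.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Durably write a blob to `path`: write it under a temporary name, fsync,
 * then rename into place so a crash never leaves a half-written file at
 * the final name.  Returns 0 on success, -1 on failure.
 */
int
write_file_durably(const char *path, const void *data, size_t len)
{
    char        tmp[1024];
    int         fd;

    snprintf(tmp, sizeof(tmp), "%s.tmp", path);
    fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    if (write(fd, data, len) != (ssize_t) len || fsync(fd) != 0)
    {
        close(fd);
        return -1;
    }
    if (close(fd) != 0)
        return -1;
    if (rename(tmp, path) != 0)     /* atomic replacement */
        return -1;
    /* A real implementation would also fsync the containing directory. */
    return 0;
}
```

The point being that even this minimal primitive is something shell pipelines routinely get wrong (no fsync, no atomic rename), which is part of the argument for shipping a vetted tool.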
Lexer issues
Hi, I am experimenting with postgres and am wondering if there is any tutorial on how to properly add a new command to postgres. I want to add a new constraint on "CREATE ROLE" that takes an integer; its identifier is not a known keyword (reserved or unreserved) in postgres. Say we call it TestPatrick. In other words, I want to be able to write "CREATE ROLE X TestPatrick=10". I am having trouble getting postgres to recognize my new syntax. I have seen this video: https://www.youtube.com/watch?v=uSEXTcEiXGQ and was able to get my postgres to compile with the added word (modified gram.y, kwlist.h, gram.cpp etc based on the video). However, when I use my syntax in a client session, it still isn't recognized... Are there any specific lexer changes I need to make? I followed the example of CONNECTION LIMIT and tried to mimic it for CREATE ROLE. Best, Patrick
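One common gotcha worth checking (offered as a guess, since the modified files aren't shown): the kwlist.h entries must stay in strict alphabetical order, because the keyword lookup is a binary search over that list; an out-of-order entry is silently never found, so the scanner keeps returning the word as a plain IDENT and the grammar rejects it. Besides the kwlist.h line, the token also needs a %token declaration, a use in some grammar production, and (usually) membership in the unreserved_keyword production in gram.y. Roughly, for this era of the tree:

```c
/* src/include/parser/kwlist.h -- must remain in alphabetical order */
PG_KEYWORD("testpatrick", TESTPATRICK, UNRESERVED_KEYWORD)

/* src/backend/parser/gram.y */
%token <keyword> TESTPATRICK

unreserved_keyword:
      /* ...existing keywords, alphabetical... */
    | TESTPATRICK
      /* ... */
```

Note also that keywords are stored lowercase, so "TestPatrick" in the SQL text is folded to "testpatrick" before lookup.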
Re: where should I stick that backup?
On Sun, Apr 12, 2020 at 09:18:28PM -0400, Stephen Frost wrote: > * Bruce Momjian (br...@momjian.us) wrote: > > On Fri, Apr 10, 2020 at 10:54:10AM -0400, Stephen Frost wrote: > > > * Robert Haas (robertmh...@gmail.com) wrote: > > > > On Thu, Apr 9, 2020 at 6:44 PM Bruce Momjian wrote: > > > > > Good point, but if there are multiple APIs, it makes shell script > > > > > flexibility even more useful. > > > > > > > > This is really the key point for me. There are so many existing tools > > > > that store a file someplace that we really can't ever hope to support > > > > them all in core, or even to have well-written extensions that support > > > > them all available on PGXN or wherever. We need to integrate with the > > > > tools that other people have created, not try to reinvent them all in > > > > PostgreSQL. > > > > > > So, this goes to what I was just mentioning to Bruce independently- you > > > could have made the same argument about FDWs, but it just doesn't > > > actually hold any water. Sure, some of the FDWs aren't great, but > > > there's certainly no shortage of them, and the ones that are > > > particularly important (like postgres_fdw) are well written and in core. > > > > No, no one made that argument. It isn't clear how a shell script API > > would map to relational database queries. The point is how well the > > APIs match, and then if they are close, does it give us the flexibility > > we need. You can't just look at flexibility without an API match. > > If what we're talking about is the file_fdw, which certainly isn't very > complicated, it's not hard to see how you could use shell scripts for > it. What happens is that it starts to get harder and require custom > code when you want to do something more complex- which is very nearly > what we're talking about here too. Sure, for a simple 'bzip2' filter, a > shell script might be alright, but it's not going to cut it for the more > complex use-cases that users, today, expect solutions to. 
Well, file_fdw is the simplest FDW, and we might have been able to do that in shell script, but almost all the other FDWs couldn't, so we might as well choose a C API for FDWs and use the same one for file_fdw. It seems like basic engineering that you choose the closest API that meets most of your deployment requirements, and meets all of the required ones. > To that end, if we contemplate adding support for some cloud vendor's > storage, as an example, and discover that the command line tools for it > suck or don't meet our expectations, I'd expect us to either refuse to > support it, or to forgo using the command-line tools and instead > implement support for talking to the cloud storage interface directly, > if it works well. Do we choose a more inflexible API on a hypothetical risk? -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
wrong relkind error messages
I'm not a fan of error messages like relation "%s" is not a table, foreign table, or materialized view It doesn't tell me what's wrong, it only tells me what else could have worked. It's also tedious to maintain and the number of combinations grows over time. This was discussed many years ago in [0], with the same arguments, and there appeared to have been general agreement to change this, but then the thread stalled somehow on some technical details. Attached is another attempt to improve this. I have rewritten the primary error messages using the principle of "cannot do this with that" and then added a detail message to show what relkind the object has. For example: -ERROR: relation "ti" is not a table, foreign table, or materialized view +ERROR: cannot define statistics for relation "ti" +DETAIL: "ti" is an index. and -ERROR: "test_foreign_table" is not a table, materialized view, or TOAST table +ERROR: relation "test_foreign_table" does not have a visibility map +DETAIL: "test_foreign_table" is a foreign table. You can see more instances of this in the test diffs in the attached patch. In passing, I also changed a few places to use the RELKIND_HAS_STORAGE() macro. This is related because it allows writing more helpful error messages, such as in pgstatindex.c. One question on a detail arose: check_relation_relkind() in pg_visibility.c accepts RELKIND_RELATION, RELKIND_MATVIEW, and RELKIND_TOASTVALUE, but pgstatapprox.c only accepts RELKIND_RELATION and RELKIND_MATVIEW, even though they both look for a visibility map. Is that an intentional omission? If so, it should be commented better. 
[0]: https://www.postgresql.org/message-id/flat/AANLkTimR_sZ_wKd1cgqVG1PEvTvdr9j7zD%2B3_NPvfaa_%40mail.gmail.com -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services From 2e32dce36cd34e6d58c99067969f98de1bb1264b Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Mon, 13 Apr 2020 15:29:46 +0200 Subject: [PATCH] Improve error messages about mismatching relkind Most error messages about a relkind that was not supported or appropriate for the command were of the pattern "relation \"%s\" is not a table, foreign table, or materialized view" This style can become verbose and tedious to maintain. Moreover, it's not very helpful: If I'm trying to create a comment on a TOAST table, which is not supported, then the information that I could have created a comment on a materialized view is pointless. Instead, make the primary error message shorter, saying more directly that what was attempted is not supported or possible. Then, in the detail message, show what relkind the object is. To simplify that, add a new function errdetail_relkind() that does this. In passing, make use of RELKIND_HAS_STORAGE() where appropriate, instead of listing out the relkinds individually.
--- .../pg_visibility/expected/pg_visibility.out | 75 ++-- contrib/pg_visibility/pg_visibility.c | 5 +- contrib/pgstattuple/expected/pgstattuple.out | 18 ++-- contrib/pgstattuple/pgstatapprox.c| 5 +- contrib/pgstattuple/pgstatindex.c | 11 +-- src/backend/catalog/Makefile | 1 + src/backend/catalog/heap.c| 7 +- src/backend/catalog/pg_class.c| 51 +++ src/backend/catalog/toasting.c| 6 +- src/backend/commands/comment.c| 5 +- src/backend/commands/indexcmds.c | 16 +--- src/backend/commands/lockcmds.c | 5 +- src/backend/commands/seclabel.c | 5 +- src/backend/commands/sequence.c | 5 +- src/backend/commands/statscmds.c | 5 +- src/backend/commands/tablecmds.c | 88 --- src/backend/commands/trigger.c| 15 ++-- src/backend/parser/parse_utilcmd.c| 5 +- src/backend/postmaster/pgstat.c | 6 +- src/backend/rewrite/rewriteDefine.c | 8 +- src/backend/utils/adt/dbsize.c| 73 ++- src/include/catalog/pg_class.h| 1 + src/test/regress/expected/alter_table.out | 9 +- .../regress/expected/create_table_like.out| 3 +- src/test/regress/expected/foreign_data.out| 6 +- src/test/regress/expected/indexing.out| 3 +- src/test/regress/expected/sequence.out| 3 +- src/test/regress/expected/stats_ext.out | 12 ++- 28 files changed, 235 insertions(+), 217 deletions(-) create mode 100644 src/backend/catalog/pg_class.c diff --git a/contrib/pg_visibility/expected/pg_visibility.out b/contrib/pg_visibility/expected/pg_visibility.out index 2abc1b5107..01ff1361d4 100644 --- a/contrib/pg_visibility/expected/pg_visibility.out +++ b/contrib/pg_visibility/expected/pg_visibility.out @@ -41,66 +41,91 @@ ROLLBACK; create table test_partitioned (a int) partition by list (a)
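To make the shape concrete, here is a standalone sketch of the relkind-to-description mapping that a helper like the patch's errdetail_relkind() centralizes. The backend version would call errdetail() and use the RELKIND_* macros rather than character literals; this is illustration only, not the patch's code.

```c
/*
 * Map a pg_class.relkind code to a human-readable description, as used in
 * DETAIL lines like: "ti" is an index.
 * The relkind letters are the ones pg_class actually stores.
 */
const char *
relkind_description(char relkind)
{
    switch (relkind)
    {
        case 'r': return "a table";
        case 'i': return "an index";
        case 'S': return "a sequence";
        case 't': return "a TOAST table";
        case 'v': return "a view";
        case 'm': return "a materialized view";
        case 'c': return "a composite type";
        case 'f': return "a foreign table";
        case 'p': return "a partitioned table";
        case 'I': return "a partitioned index";
        default:  return "of unrecognized kind";
    }
}
```

Centralizing the mapping is what makes the "cannot do this with that" + DETAIL style cheap to maintain: new relkinds get added in one place instead of in every enumerating message.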
Re: weird hash plan cost, starting with pg10
Richard Guo writes: > At first I was wondering if we need to check whether HashState.hashtable > is not NULL in ExecShutdownHash() before we decide to allocate save > space for HashState.hinstrument. And then I convinced myself that that's > not necessary since HashState.hinstrument and HashState.hashtable cannot > be both NULL there. Even if the hashtable is null at that point, creating an all-zeroes hinstrument struct is harmless. regards, tom lane
Re: WAL usage calculation patch
On Mon, Apr 13, 2020 at 1:47 PM, Amit Kapila wrote: > On Mon, Apr 13, 2020 at 1:10 PM Julien Rouhaud wrote: > > > > On Mon, Apr 13, 2020 at 8:11 AM Amit Kapila > wrote: > > > > > > On Sat, Apr 11, 2020 at 6:55 PM Julien Rouhaud > wrote: > > > > > > > > On Fri, Apr 10, 2020 at 9:37 PM Julien Rouhaud > wrote: > > > > > > > > > I tried to take into account all that have been discussed, but I have > > > > to admit that I'm absolutely not sure of what was actually decided > > > > here. I went with those changes: > > > > > > > > - rename wal_num_fpw to wal_fpw for consistency, both in pgss view > > > > field name but also everywhere in the code > > > > - change comments to consistently mention "full page writes > generated" > > > > - changed pgss and explain documentation to mention "full page images > > > > generated", from Justin's patch on another thread > > > > > > > > > > I think it is better to use "full page writes" to be consistent with > > > other places. > > > > > > > - kept "amount" of WAL bytes > > > > > > > > > > Okay, but I would like to make another change suggested by Justin > > > which is to replace "count" with "number" at a few places. > > > > Ah sorry I missed this one. +1 it also sounds better. > > > > > I have made the above two changes in the attached. Let me know what > > > you think about attached? > > > > It all looks good to me! > > > > Pushed. > Thanks a lot Amit! >
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Mon, Apr 13, 2020 at 6:12 PM Kuntal Ghosh wrote: > > On Mon, Apr 13, 2020 at 5:20 PM Dilip Kumar wrote: > > On Mon, Apr 13, 2020 at 4:14 PM Kuntal Ghosh > > wrote: > > > > > > +#define SizeOfTransactionId (sizeof(TransactionId) + sizeof(char)) > > > This looks wrong. We should change the name of this Macro or we can > > > add the 1 byte directly in HEADER_SCRATCH_SIZE and some comments. > > > > I think this is in sync with below code (SizeOfXlogOrigin), SO doen't > > make much sense to add different terminology no? > > #define SizeOfXlogOrigin (sizeof(RepOriginId) + sizeof(char)) > > +#define SizeOfTransactionId (sizeof(TransactionId) + sizeof(char)) > > > In that case, we can rename this, for example, SizeOfXLogTransactionId. Make sense. > > Some review comments from 0002-Issue-individual-*.path, > > +void > +ReorderBufferAddInvalidation(ReorderBuffer *rb, TransactionId xid, > + XLogRecPtr lsn, int nmsgs, > + SharedInvalidationMessage *msgs) > +{ > + MemoryContext oldcontext; > + ReorderBufferChange *change; > + > + /* XXX Should we even write invalidations without valid XID? */ > + if (xid == InvalidTransactionId) > + return; > + > + Assert(xid != InvalidTransactionId); > > It seems we don't call the function if xid is not valid. In fact, > > @@ -281,6 +281,24 @@ DecodeXactOp(LogicalDecodingContext *ctx, > XLogRecordBuffer *buf) > } > case XLOG_XACT_ASSIGNMENT: > break; > + case XLOG_XACT_INVALIDATIONS: > + { > + TransactionId xid; > + xl_xact_invalidations *invals; > + > + xid = XLogRecGetXid(r); > + invals = (xl_xact_invalidations *) XLogRecGetData(r); > + > + if (!TransactionIdIsValid(xid)) > + break; > + > + ReorderBufferAddInvalidation(reorder, xid, buf->origptr, > + invals->nmsgs, invals->msgs); > > Why should we insert an WAL record for such cases? I think we can avoid this. I will analyze and send update in my next patch. 
> > + * When wal_level=logical, write invalidations into WAL at each command end > to > + * support the decoding of the in-progress transaction. As of now it was > + * enough to log invalidation only at commit because we are only decoding > the > + * transaction at the commit time. We only need to log the catalog cache > and > + * relcache invalidation. There can not be any active MVCC scan in logical > + * decoding so we don't need to log the snapshot invalidation. > The alignment is not right. Will fix. > /* > * CommandEndInvalidationMessages > - * Process queued-up invalidation messages at end of one command > - * in a transaction. > + * Process queued-up invalidation messages at end of one command > + * in a transaction. > Looks unnecessary changes. Will fix. > > * Note: > - * This should be called during CommandCounterIncrement(), > - * after we have advanced the command ID. > + * This should be called during CommandCounterIncrement(), > + * after we have advanced the command ID. > */ > Looks unnecessary changes. Will fix. > if (transInvalInfo == NULL) > - return; > + return; > Looks unnecessary changes. > > + /* prepare record */ > + memset(&xlrec, 0, sizeof(xlrec)); > We should use MinSizeOfXactInvalidations, no? Right. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
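To make the byte counts in the SizeOfXLogTransactionId discussion concrete, here is a standalone mock-up of the two macros being compared. The typedef widths are copied from PostgreSQL's headers for illustration; the extra char corresponds to the one-byte block-ID marker that precedes each optional payload in the record header scratch space.

```c
#include <stdint.h>

/* Widths as in PostgreSQL: TransactionId is 32-bit, RepOriginId 16-bit. */
typedef uint32_t TransactionId;
typedef uint16_t RepOriginId;

/* One byte for the XLR_BLOCK_ID_* marker, then the payload itself. */
#define SizeOfXlogOrigin        (sizeof(RepOriginId) + sizeof(char))
#define SizeOfXLogTransactionId (sizeof(TransactionId) + sizeof(char))
```

So the rename keeps the naming parallel with SizeOfXlogOrigin while making clear the macro covers marker plus xid, not just sizeof(TransactionId).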
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Mon, Apr 13, 2020 at 5:20 PM Dilip Kumar wrote: > On Mon, Apr 13, 2020 at 4:14 PM Kuntal Ghosh > wrote: > > > > +#define SizeOfTransactionId (sizeof(TransactionId) + sizeof(char)) > > This looks wrong. We should change the name of this Macro or we can > > add the 1 byte directly in HEADER_SCRATCH_SIZE and some comments. > > I think this is in sync with below code (SizeOfXlogOrigin), SO doen't > make much sense to add different terminology no? > #define SizeOfXlogOrigin (sizeof(RepOriginId) + sizeof(char)) > +#define SizeOfTransactionId (sizeof(TransactionId) + sizeof(char)) > In that case, we can rename this, for example, SizeOfXLogTransactionId. Some review comments from 0002-Issue-individual-*.path, +void +ReorderBufferAddInvalidation(ReorderBuffer *rb, TransactionId xid, + XLogRecPtr lsn, int nmsgs, + SharedInvalidationMessage *msgs) +{ + MemoryContext oldcontext; + ReorderBufferChange *change; + + /* XXX Should we even write invalidations without valid XID? */ + if (xid == InvalidTransactionId) + return; + + Assert(xid != InvalidTransactionId); It seems we don't call the function if xid is not valid. In fact, @@ -281,6 +281,24 @@ DecodeXactOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf) } case XLOG_XACT_ASSIGNMENT: break; + case XLOG_XACT_INVALIDATIONS: + { + TransactionId xid; + xl_xact_invalidations *invals; + + xid = XLogRecGetXid(r); + invals = (xl_xact_invalidations *) XLogRecGetData(r); + + if (!TransactionIdIsValid(xid)) + break; + + ReorderBufferAddInvalidation(reorder, xid, buf->origptr, + invals->nmsgs, invals->msgs); Why should we insert an WAL record for such cases? + * When wal_level=logical, write invalidations into WAL at each command end to + * support the decoding of the in-progress transaction. As of now it was + * enough to log invalidation only at commit because we are only decoding the + * transaction at the commit time. We only need to log the catalog cache and + * relcache invalidation. 
There can not be any active MVCC scan in logical + * decoding so we don't need to log the snapshot invalidation. The alignment is not right. /* * CommandEndInvalidationMessages - * Process queued-up invalidation messages at end of one command - * in a transaction. + * Process queued-up invalidation messages at end of one command + * in a transaction. Looks unnecessary changes. * Note: - * This should be called during CommandCounterIncrement(), - * after we have advanced the command ID. + * This should be called during CommandCounterIncrement(), + * after we have advanced the command ID. */ Looks unnecessary changes. if (transInvalInfo == NULL) - return; + return; Looks unnecessary changes. + /* prepare record */ + memset(&xlrec, 0, sizeof(xlrec)); We should use MinSizeOfXactInvalidations, no? -- Thanks & Regards, Kuntal Ghosh EnterpriseDB: http://www.enterprisedb.com
Re: Vacuum o/p with (full 1, parallel 0) option throwing an error
On Mon, Apr 13, 2020 at 4:23 PM Masahiko Sawada wrote: > > On Mon, 13 Apr 2020 at 18:25, Amit Kapila wrote: > > > > On Fri, Apr 10, 2020 at 7:05 PM Justin Pryzby wrote: > > > > > > > > > No problem. I think I was trying to make my text similar to that from > > > 14a4f6f37. > > > > > > I realized that I didn't sq!uash my last patch, so it didn't include the > > > functional change (which is maybe what Robert was referring to). > > > > > > > I think it is better to add a new test for temporary table which has > > less data. We don't want to increase test timings to test the > > combination of options. I changed that in the attached patch. I will > > commit this tomorrow unless you or anyone else has any more comments. > > > > Thank you for updating the patch! > > I think we can update the documentation as well. Currently, the > documentation says "This option can't be used with the FULL option." > but we can say instead, for example, "VACUUM FULL can't use parallel > vacuum.". > I am not very sure about this. I don't think the current text is wrong especially when you see the value we can specify there is described as: "Specifies a non-negative integer value passed to the selected option.". However, we can consider changing it if others also think the proposed text or something like that is better than current text. > Also, I'm concerned that the documentation says that VACUUM FULL > cannot use parallel vacuum and we compute the parallel degree when > PARALLEL option is omitted, but the following command is accepted: > > postgres(1:55514)=# vacuum (full on) test; > VACUUM > > Instead, we can say: > > In plain VACUUM (without FULL), if the PARALLEL option is omitted, > then VACUUM decides the number of workers based on the number of > indexes that support parallel vacuum operation on the relation which > is further limited by max_parallel_maintenance_workers. > > (it just adds "In plain VACUUM (without FULL)" to the beginning of the > original sentence.) 
> Yeah, something on these lines would be a good idea. Note that, we are already planning to slightly change this particular sentence in another patch [1]. [1] - https://www.postgresql.org/message-id/20200322021801.GB2563%40telsasoft.com -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Mon, Apr 13, 2020 at 4:14 PM Kuntal Ghosh wrote: > > On Thu, Apr 9, 2020 at 2:40 PM Dilip Kumar wrote: > > > > I have rebased the patch on the latest head. I haven't yet changed > > anything for xid assignment thing because it is not yet concluded. > > > Some review comments from 0001-Immediately-WAL-log-*.patch, > > +bool > +IsSubTransactionAssignmentPending(void) > +{ > + if (!XLogLogicalInfoActive()) > + return false; > + > + /* we need to be in a transaction state */ > + if (!IsTransactionState()) > + return false; > + > + /* it has to be a subtransaction */ > + if (!IsSubTransaction()) > + return false; > + > + /* the subtransaction has to have a XID assigned */ > + if (!TransactionIdIsValid(GetCurrentTransactionIdIfAny())) > + return false; > + > + /* and it needs to have 'assigned' */ > + return !CurrentTransactionState->assigned; > + > +} > IMHO, it's important to reduce the complexity of this function since > it's been called for every WAL insertion. During the lifespan of a > transaction, any of these if conditions will only be evaluated if > previous conditions are true. So, we can maintain some state machine > to avoid multiple evaluation of a condition inside a transaction. But, > if the overhead is not much, it's not worth I guess. Yeah maybe, in some cases we can avoid checking multiple conditions by maintaining that state. But, that state will have to be at the transaction level. But, I am not sure how much worth it will be to add one extra condition to skip a few if checks and it will also add the code complexity. And, in some cases where logical decoding is not enabled, it may add one extra check? I mean first check the state and that will take you to the first if check. > > +#define SizeOfTransactionId (sizeof(TransactionId) + sizeof(char)) > This looks wrong. We should change the name of this Macro or we can > add the 1 byte directly in HEADER_SCRATCH_SIZE and some comments. 
I think this is in sync with below code (SizeOfXlogOrigin), SO doen't make much sense to add different terminology no? #define SizeOfXlogOrigin (sizeof(RepOriginId) + sizeof(char)) +#define SizeOfTransactionId (sizeof(TransactionId) + sizeof(char)) > > @@ -195,6 +197,10 @@ XLogResetInsertion(void) > { > int i; > > + /* reset the subxact assignment flag (if needed) */ > + if (curinsert_flags & XLOG_INCLUDE_XID) > + MarkSubTransactionAssigned(); > The comment looks contradictory. > > XLogSetRecordFlags(uint8 flags) > { > Assert(begininsert_called); > - curinsert_flags = flags; > + curinsert_flags |= flags; > } > I didn't understand why we need this change in this patch. I think it's changed so that below code can use it, but we have directly set the flag. I think I will change in the next version. @@ -748,6 +754,18 @@ XLogRecordAssemble(RmgrId rmid, uint8 info, scratch += sizeof(replorigin_session_origin); } + /* followed by toplevel XID, if not already included in previous record */ + if (IsSubTransactionAssignmentPending()) + { + TransactionId xid = GetTopTransactionIdIfAny(); + + /* update the flag (later used by XLogInsertRecord) */ + curinsert_flags |= XLOG_INCLUDE_XID; > > + txid = XLogRecGetTopXid(record); > + > + /* > + * If the toplevel_xid is valid, we need to assign the subxact to the > + * toplevel transaction. We need to do this for all records, hence we > + * do it before the switch. > + */ > s/toplevel_xid/toplevel xid or s/toplevel_xid/txid Okay, we can change > if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT && > - info != XLOG_XACT_ASSIGNMENT) > + !TransactionIdIsValid(r->toplevel_xid)) > Perhaps, XLogRecGetTopXid() can be used. ok -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: WAL usage calculation patch
On Mon, Apr 13, 2020 at 1:10 PM Julien Rouhaud wrote: > > On Mon, Apr 13, 2020 at 8:11 AM Amit Kapila wrote: > > > > On Sat, Apr 11, 2020 at 6:55 PM Julien Rouhaud wrote: > > > > > > On Fri, Apr 10, 2020 at 9:37 PM Julien Rouhaud wrote: > > > > > > > I tried to take into account all that have been discussed, but I have > > > to admit that I'm absolutely not sure of what was actually decided > > > here. I went with those changes: > > > > > > - rename wal_num_fpw to wal_fpw for consistency, both in pgss view > > > fiel name but also everywhere in the code > > > - change comments to consistently mention "full page writes generated" > > > - changed pgss and explain documentation to mention "full page images > > > generated", from Justin's patch on another thread > > > > > > > I think it is better to use "full page writes" to be consistent with > > other places. > > > > > - kept "amount" of WAL bytes > > > > > > > Okay, but I would like to make another change suggested by Justin > > which is to replace "count" with "number" at a few places. > > Ah sorry I missed this one. +1 it also sounds better. > > > I have made the above two changes in the attached. Let me know what > > you think about attached? > > It all looks good to me! > Pushed. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: Vacuum o/p with (full 1, parallel 0) option throwing an error
On Mon, 13 Apr 2020 at 18:25, Amit Kapila wrote: > > On Fri, Apr 10, 2020 at 7:05 PM Justin Pryzby wrote: > > > > > > No problem. I think I was trying to make my text similar to that from > > 14a4f6f37. > > > > I realized that I didn't sq!uash my last patch, so it didn't include the > > functional change (which is maybe what Robert was referring to). > > > > I think it is better to add a new test for temporary table which has > less data. We don't want to increase test timings to test the > combination of options. I changed that in the attached patch. I will > commit this tomorrow unless you or anyone else has any more comments. > Thank you for updating the patch! I think we can update the documentation as well. Currently, the documentation says "This option can't be used with the FULL option." but we can say instead, for example, "VACUUM FULL can't use parallel vacuum.". Also, I'm concerned that the documentation says that VACUUM FULL cannot use parallel vacuum and we compute the parallel degree when PARALLEL option is omitted, but the following command is accepted: postgres(1:55514)=# vacuum (full on) test; VACUUM Instead, we can say: In plain VACUUM (without FULL), if the PARALLEL option is omitted, then VACUUM decides the number of workers based on the number of indexes that support parallel vacuum operation on the relation which is further limited by max_parallel_maintenance_workers. (it just adds "In plain VACUUM (without FULL)" to the beginning of the original sentence.) What do you think? Regards, -- Masahiko Sawadahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Thu, Apr 9, 2020 at 2:40 PM Dilip Kumar wrote:
>
> I have rebased the patch on the latest head. I haven't yet changed
> anything for the xid assignment thing because it is not yet concluded.
>

Some review comments from 0001-Immediately-WAL-log-*.patch,

+bool
+IsSubTransactionAssignmentPending(void)
+{
+	if (!XLogLogicalInfoActive())
+		return false;
+
+	/* we need to be in a transaction state */
+	if (!IsTransactionState())
+		return false;
+
+	/* it has to be a subtransaction */
+	if (!IsSubTransaction())
+		return false;
+
+	/* the subtransaction has to have a XID assigned */
+	if (!TransactionIdIsValid(GetCurrentTransactionIdIfAny()))
+		return false;
+
+	/* and it needs to have 'assigned' */
+	return !CurrentTransactionState->assigned;
+
+}

IMHO, it's important to reduce the complexity of this function since
it's called for every WAL insertion. During the lifespan of a
transaction, any of these if conditions will only be evaluated if the
previous conditions are true. So, we could maintain some state machine
to avoid multiple evaluations of a condition inside a transaction.
But, if the overhead is not much, it's probably not worth it.

+#define SizeOfTransactionId (sizeof(TransactionId) + sizeof(char))

This looks wrong. We should change the name of this macro, or we can
add the 1 byte directly in HEADER_SCRATCH_SIZE with some comments.

@@ -195,6 +197,10 @@ XLogResetInsertion(void)
 {
 	int			i;

+	/* reset the subxact assignment flag (if needed) */
+	if (curinsert_flags & XLOG_INCLUDE_XID)
+		MarkSubTransactionAssigned();

The comment looks contradictory.

 XLogSetRecordFlags(uint8 flags)
 {
 	Assert(begininsert_called);
-	curinsert_flags = flags;
+	curinsert_flags |= flags;
 }

I didn't understand why we need this change in this patch.

+	txid = XLogRecGetTopXid(record);
+
+	/*
+	 * If the toplevel_xid is valid, we need to assign the subxact to the
+	 * toplevel transaction. We need to do this for all records, hence we
+	 * do it before the switch.
+	 */

s/toplevel_xid/toplevel xid or s/toplevel_xid/txid

 	if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT &&
-		info != XLOG_XACT_ASSIGNMENT)
+		!TransactionIdIsValid(r->toplevel_xid))

Perhaps, XLogRecGetTopXid() can be used.

--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com
Re: [Proposal] Global temporary tables
On 4/9/20 6:26 PM, 曾文旌 wrote:
> On 4/7/20 2:27 PM, 曾文旌 wrote:
>> Vacuum full GTT, cluster GTT is already supported in
>> global_temporary_table_v24-pg13.patch.
>
> Here, it is skipping GTT
>
> postgres=# \c foo
> You are now connected to database "foo" as user "tushar".
> foo=# create global temporary table g123( c1 int) ;
> CREATE TABLE
> foo=# \q
> [tushar@localhost bin]$ ./vacuumdb --full foo
> vacuumdb: vacuuming database "foo"
> WARNING:  skipping vacuum global temp table "g123" because storage is
> not initialized for current session
>
> The message was inappropriate at some point, so I removed it.

Thanks Wenjing. Please see if the below behavior is correct

X terminal -

postgres=# create global temp table foo1(n int);
CREATE TABLE
postgres=# insert into foo1 values (generate_series(1,10));
INSERT 0 10
postgres=# vacuum full ;
VACUUM

Y terminal -

[tushar@localhost bin]$ ./vacuumdb -f postgres
vacuumdb: vacuuming database "postgres"
WARNING:  global temp table oldest relfrozenxid 3276 is the oldest in the entire db
DETAIL:  The oldest relfrozenxid in pg_class is 3277
HINT:  If they differ greatly, please consider cleaning up the data in global temp table.
WARNING:  global temp table oldest relfrozenxid 3276 is the oldest in the entire db
DETAIL:  The oldest relfrozenxid in pg_class is 3277
HINT:  If they differ greatly, please consider cleaning up the data in global temp table.

--
regards, tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company
Re: Corruption during WAL replay
On Mon, 13 Apr 2020 at 17:40, Andres Freund wrote:
>
> Hi,
>
> On 2020-04-13 15:24:55 +0900, Masahiko Sawada wrote:
> > On Sat, 11 Apr 2020 at 09:00, Teja Mupparti wrote:
> > >
> > > Thanks Andres and Kyotaro for the quick review. I have fixed the typos
> > > and also included the critical section (emulated it with try-catch block
> > > since palloc()s are causing issues in the truncate code). This time I
> > > used git format-patch.
> > >
> >
> > I briefly looked at the latest patch but I'm not sure it's the right
> > thing here to use PG_TRY/PG_CATCH to report the PANIC error. For
> > example, with the following code you changed, we will always end up
> > with emitting a PANIC "failed to truncate the relation" regardless of
> > the actual cause of the error.
> >
> > +	PG_CATCH();
> > +	{
> > +		ereport(PANIC, (errcode(ERRCODE_INTERNAL_ERROR),
> > +				errmsg("failed to truncate the relation")));
> > +	}
> > +	PG_END_TRY();
> >
> > And the comments of RelationTruncate() mentions:
>
> I think that's just a workaround for mdtruncate not being usable in
> critical sections.
>
> > /*
> >  * We WAL-log the truncation before actually truncating, which means
> >  * trouble if the truncation fails. If we then crash, the WAL replay
> >  * likely isn't going to succeed in the truncation either, and cause a
> >  * PANIC. It's tempting to put a critical section here, but that cure
> >  * would be worse than the disease. It would turn a usually harmless
> >  * failure to truncate, that might spell trouble at WAL replay, into a
> >  * certain PANIC.
> >  */
>
> Yea, but that reasoning is just plain *wrong*. It's *never* ok to WAL
> log something and then not perform the action. This leads to the primary
> / replica getting out of sync, crash recovery potentially not completing
> (because of records referencing the should-be-truncated pages), ...
>
> > As a second idea, I wonder if we can defer truncation until commit
> > time like the smgrDoPendingDeletes mechanism. The sequence would be:
>
> This is mostly an issue during [auto]vacuum partially truncating the end
> of the file. We intentionally release the AEL regularly to allow other
> accesses to continue.
>
> For transactional truncations we don't go down this path (as we create a
> new relfilenode).
>
> > At RelationTruncate(),
> > 1. WAL logging.
> > 2. Remember buffers to be dropped.
>
> You definitely cannot do that, as explained above.

Ah yes, you're right. So it seems to me that what we can currently do
for this issue is to enclose the truncation operation in a critical
section. IIUC it's not enough just to reverse the order of dropping
buffers and physical file truncation, because that cannot solve the
problem of inconsistency on the standby. And as Horiguchi-san
mentioned, there is no need to reverse that order if we wrap the
truncation operation in a critical section, because we can recover
page changes during crash recovery. The strategy of writing out all
dirty buffers before dropping them, proposed as (a) in [1] (doing
sync before truncation), also seems not enough.

Regards,

[1] https://www.postgresql.org/message-id/20191207001232.klidxnm756wqxvwx%40alap3.anarazel.de

--
Masahiko Sawada
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services