Re: file_fdw vs relative paths

2020-08-25 Thread Li Japin

On Aug 25, 2020, at 8:26 AM, Bruce Momjian <br...@momjian.us> wrote:

Yes, I tested back to 9.5 too:

CREATE EXTENSION file_fdw;
CREATE SERVER pgconf FOREIGN DATA WRAPPER file_fdw;
CREATE FOREIGN TABLE pgconf (line TEXT) SERVER pgconf OPTIONS ( filename
'postgresql.conf', format 'text', delimiter E'\x7f' );
SELECT * FROM pgconf;
 # -
 # PostgreSQL configuration file
 # -
 #
 # This file consists of lines of the form:
…

The file_fdw extension was introduced by commit 
7c5d0ae7078456bfeedb2103c45b9a32285c2631,
and I tested that it supports relative paths.  This is a doc bug.

--
Japin Li



Re: Row estimates for empty tables

2020-08-25 Thread Pavel Stehule
po 24. 8. 2020 v 21:43 odesílatel Pavel Stehule 
napsal:

>
>
> ne 23. 8. 2020 v 23:08 odesílatel Tom Lane  napsal:
>
>> Pavel Stehule  writes:
>> > I am sending a patch that is years used in GoodData.
>>
>> I'm quite unexcited about that.  I'd be the first to agree that the
>> ten-pages estimate is a hack, but it's not an improvement to ask users
>> to think of a better value ... especially not as a one-size-fits-
>> all-relations GUC setting.
>>
>
> This patch is just a workaround that works well 10 years (but for one
> special use case) - nothing more. Without this patch that application
> cannot work ever.
>
>
>> I did have an idea that I think is better than my previous one:
>> rather than lying about the value of relpages, let's represent the
>> case where we don't know the tuple density by setting reltuples = -1
>> initially.  This leads to a patch that's a good bit more invasive than
>> the quick-hack solution, but I think it's a lot cleaner on the whole.
>>
>
>> A possible objection is that this changes the FDW API slightly, as
>> GetForeignRelSize callbacks now need to deal with rel->tuples possibly
>> being -1.  We could avoid an API break if we made plancat.c clamp
>> that value to zero; but then FDWs still couldn't tell the difference
>> between the "empty" and "never analyzed" cases, and I think this is
>> just as much of an issue for them as for the core code.
>>
>
>> I'll add this to the upcoming CF.
>>
>
> I'll check it
>

I think it can work. It is a good enough solution for people who need a
different behaviour, with minimal impact on people who don't need a change.
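
For anyone following along, the convention Tom describes would look roughly
like this (illustrative only, not actual output from the patch):

    CREATE TABLE t_empty (a int);
    SELECT relpages, reltuples FROM pg_class WHERE relname = 't_empty';
    -- relpages = 0, reltuples = -1   (never vacuumed/analyzed: density unknown)
    ANALYZE t_empty;
    SELECT relpages, reltuples FROM pg_class WHERE relname = 't_empty';
    -- relpages = 0, reltuples = 0    (analyzed and known to be empty)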

Regards

Pavel


>
> Regards
>
> Pavel
>
>>
>> regards, tom lane
>>
>>


Re: Row estimates for empty tables

2020-08-25 Thread Pavel Stehule
út 25. 8. 2020 v 9:32 odesílatel Pavel Stehule 
napsal:

>
>
> po 24. 8. 2020 v 21:43 odesílatel Pavel Stehule 
> napsal:
>
>>
>>
>> ne 23. 8. 2020 v 23:08 odesílatel Tom Lane  napsal:
>>
>>> Pavel Stehule  writes:
>>> > I am sending a patch that is years used in GoodData.
>>>
>>> I'm quite unexcited about that.  I'd be the first to agree that the
>>> ten-pages estimate is a hack, but it's not an improvement to ask users
>>> to think of a better value ... especially not as a one-size-fits-
>>> all-relations GUC setting.
>>>
>>
>> This patch is just a workaround that works well 10 years (but for one
>> special use case) - nothing more. Without this patch that application
>> cannot work ever.
>>
>>
>>> I did have an idea that I think is better than my previous one:
>>> rather than lying about the value of relpages, let's represent the
>>> case where we don't know the tuple density by setting reltuples = -1
>>> initially.  This leads to a patch that's a good bit more invasive than
>>> the quick-hack solution, but I think it's a lot cleaner on the whole.
>>>
>>
>>> A possible objection is that this changes the FDW API slightly, as
>>> GetForeignRelSize callbacks now need to deal with rel->tuples possibly
>>> being -1.  We could avoid an API break if we made plancat.c clamp
>>> that value to zero; but then FDWs still couldn't tell the difference
>>> between the "empty" and "never analyzed" cases, and I think this is
>>> just as much of an issue for them as for the core code.
>>>
>>
>>> I'll add this to the upcoming CF.
>>>
>>
>> I'll check it
>>
>
> I think it can work. It is a good enough solution for people who need a
> different behaviour, with minimal impact on people who don't need a change.
>

All tests passed.

I'll mark this patch as ready for committer.

Regards

Pavel


> Regards
>
> Pavel
>
>
>>
>> Regards
>>
>> Pavel
>>
>>>
>>> regards, tom lane
>>>
>>>


Re: extension patch of CREATE OR REPLACE TRIGGER

2020-08-25 Thread Peter Smith
On Mon, Aug 24, 2020 at 9:33 PM osumi.takami...@fujitsu.com
 wrote:
> I've fixed my v06 and created v07.

Hi Osumi-san.

I have reviewed the source code of the v07 patch.

(I also reviewed the test cases but I will share those comments as a
separate post).

Below are my comments - sorry, many of them are very minor.



COMMENT pg_constraint.c (wrong comment?)

- * The new constraint's OID is returned.
+ * The new constraint's OID is returned. (This will be the same as
+ * "conOid" if that is specified as nonzero.)

Shouldn't that comment say:
(This will be the same as "existing_constraint_oid" if that is other
than InvalidOid)



COMMENT pg_constraint.c (declarations)

@@ -91,6 +93,11 @@ CreateConstraintEntry(const char *constraintName,
  NameData cname;
  int i;
  ObjectAddress conobject;
+ SysScanDesc   conscan;
+ ScanKeyData   skey[2];
+ HeapTuple tuple;
+ bool  replaces[Natts_pg_constraint];
+ Form_pg_constraint constrForm;

Maybe it is more convenient/readable to declare these in the scope
where they are actually used.



COMMENT pg_constraint.c (oid checking)

- conOid = GetNewOidWithIndex(conDesc, ConstraintOidIndexId,
- Anum_pg_constraint_oid);
- values[Anum_pg_constraint_oid - 1] = ObjectIdGetDatum(conOid);
+ if (!existing_constraint_oid)
+ {
+ conOid = GetNewOidWithIndex(conDesc, ConstraintOidIndexId,
+ Anum_pg_constraint_oid);

Maybe better to use if (!OidIsValid(existing_constraint_oid)) here.



COMMENT tablecmds.c (unrelated change)

-   false); /* is_internal */
+   false); /* is_internal */

Some whitespace which has nothing to do with the patch was changed.



COMMENT trigger.c (declarations / whitespace)

+ bool is_update = false;
+ HeapTuple newtup;
+ TupleDesc tupDesc;
+ bool replaces[Natts_pg_trigger];
+ Oid existing_constraint_oid = InvalidOid;
+ bool trigger_exists = false;
+ bool trigger_deferrable = false;

1. One of those variables is misaligned with tabbing.

2. Maybe it is more convenient/readable to declare some of these in
the scope where they are actually used.
e.g. newtup, tupDesc, replaces.



COMMENT trigger.c (error messages)

+ /*
+ * If this trigger has pending event, throw an error.
+ */
+ if (stmt->replace && !isInternal && trigger_deferrable &&
+ AfterTriggerPendingOnRel(RelationGetRelid(rel)))
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_IN_USE),
+ errmsg("cannot replace \"%s\" on \"%s\" because it has pending
trigger events",
+ stmt->trigname, RelationGetRelationName(rel))));
+ /*
+ * without OR REPLACE clause, can't override the trigger with the same name.
+ */
+ if (!stmt->replace && !isInternal)
+ ereport(ERROR,
+ (errcode(ERRCODE_DUPLICATE_OBJECT),
+ errmsg("trigger \"%s\" for relation \"%s\" already exists",
+ stmt->trigname, RelationGetRelationName(rel))));
+ /*
+ * CREATE OR REPLACE CONSTRAINT TRIGGER command can't replace
non-constraint trigger.
+ */
+ if (stmt->replace && stmt->isconstraint &&
!OidIsValid(existing_constraint_oid))
+ ereport(ERROR,
+ (errcode(ERRCODE_DUPLICATE_OBJECT),
+ errmsg("Trigger \"%s\" for relation \"%s\" cannot be replaced with
constraint trigger",
+ stmt->trigname, RelationGetRelationName(rel))));
+ /*
+ * CREATE OR REPLACE TRIGGER command can't replace constraint trigger.
+ */
+ if (stmt->replace && !stmt->isconstraint &&
OidIsValid(existing_constraint_oid))
+ ereport(ERROR,
+ (errcode(ERRCODE_DUPLICATE_OBJECT),
+ errmsg("Constraint trigger \"%s\" for relation \"%s\" cannot be
replaced with non-constraint trigger",
+ stmt->trigname, RelationGetRelationName(rel))));


1. The order of these new errors is confusing. Maybe do the "already
exists" check first so that all the REPLACE errors can be grouped
together.

2. The message texts are inconsistent in capitalising the first word.
They should all be lower-case [1]
[1] - https://www.postgresql.org/docs/current/error-style-guide.html

3. That "already exists" error might benefit from a message hint. e.g.
---
ereport(ERROR, (errcode(ERRCODE_DUPLICATE_OBJECT),
errmsg("trigger \"%s\" for relation \"%s\" already exists", ...),
errhint("use CREATE OR REPLACE to replace it")));
---

4. Those last two errors seem like a complicated way just to say the
trigger types are not compatible. Maybe these messages can be
simplified, and some message hints added. e.g.
---
ereport(ERROR, (errcode(ERRCODE_DUPLICATE_OBJECT),
errmsg("trigger \"%s\" for relation \"%s\" is a regular trigger", ...),
errhint("use CREATE OR REPLACE TRIGGER to replace a regular trigger")));


ereport(ERROR, (errcode(ERRCODE_DUPLICATE_OBJECT),
errmsg("trigger \"%s\" for relation \"%s\" is a constraint trigger", ...),
errhint("use CREATE OR REPLACE CONSTRAINT to replace a constraint
trigger")));
---



COMMENT trigger.c (comment wording)

+ * In case of replace trigger, trigger should no-more dependent on old
+ * referenced objects. Always remove the old dependencies and then

Needs re-wording.



Kind Regards,
Peter Smith.
Fujitsu Australia




Re: recovering from "found xmin ... from before relfrozenxid ..."

2020-08-25 Thread Masahiko Sawada
On Fri, 21 Aug 2020 at 22:25, Ashutosh Sharma  wrote:
>
> Hi Masahiko-san,
>
> Please find the updated patch with the following new changes:
>

Thank you for updating the patch!

> 1) It adds the code changes in heap_force_kill function to clear an
> all-visible bit on the visibility map corresponding to the page that
> is marked all-visible. Along the way it also clears PD_ALL_VISIBLE
> flag on the page header.

I think we need to clear all visibility map bits by using
VISIBILITYMAP_VALID_BITS. Otherwise, the page could end up with the
all-frozen bit set but not the all-visible bit, which is not a valid state.
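
Something along these lines, I think (just a quick sketch to illustrate the
idea, not taken from the patch; it assumes the surrounding heap_surgery.c
variables such as rel, blkno, page, buf and vmbuffer):

    /* clear both the all-visible and all-frozen bits, plus the page flag */
    if (PageIsAllVisible(page))
    {
        visibilitymap_pin(rel, blkno, &vmbuffer);
        visibilitymap_clear(rel, blkno, vmbuffer, VISIBILITYMAP_VALID_BITS);
        PageClearAllVisible(page);
        MarkBufferDirty(buf);
    }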

>
> 2) It adds the code changes in heap_force_freeze function to reset the
> ctid value in a tuple header if it is corrupted.
>
> 3) It adds several notes and examples in the documentation stating
> when and how we need to use the functions provided by this module.
>
> Please have a look and let me know for any other concern.
>

And here are small comments on the heap_surgery.c:

+   /*
+* Get the offset numbers from the tids belonging to one particular page
+* and process them one by one.
+*/
+   blkno = tids_same_page_fetch_offnums(tids, ntids, &next_start_ptr,
+offnos);
+
+   /* Calculate the number of offsets stored in offnos array. */
+   noffs = next_start_ptr - curr_start_ptr;
+
+   /*
+* Update the current start pointer so that next time when
+* tids_same_page_fetch_offnums() is called, we can calculate the number
+* of offsets present in the offnos array.
+*/
+   curr_start_ptr = next_start_ptr;
+
+   /* Check whether the block number is valid. */
+   if (blkno >= nblocks)
+   {
+   ereport(NOTICE,
+   (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+errmsg("skipping block %u for relation \"%s\"
because the block number is out of range",
+   blkno, RelationGetRelationName(rel))));
+   continue;
+   }
+
+   CHECK_FOR_INTERRUPTS();

I guess it would be better to call CHECK_FOR_INTERRUPTS() at the top
of the do loop for safety. I think it's unlikely to happen but the
user might mistakenly specify a lot of wrong block numbers.


+   offnos = (OffsetNumber *) palloc(ntids * sizeof(OffsetNumber));
+   noffs = curr_start_ptr = next_start_ptr = 0;
+   nblocks = RelationGetNumberOfBlocks(rel);
+
+   do
+   {

(snip)

+
+   /*
+* Get the offset numbers from the tids belonging to one particular page
+* and process them one by one.
+*/
+   blkno = tids_same_page_fetch_offnums(tids, ntids, &next_start_ptr,
+offnos);
+
+   /* Calculate the number of offsets stored in offnos array. */
+   noffs = next_start_ptr - curr_start_ptr;
+

(snip)

+   /* No ereport(ERROR) from here until all the changes are logged. */
+   START_CRIT_SECTION();
+
+   for (i = 0; i < noffs; i++)

You copy all offset numbers belonging to the same page into the palloc'd
array, offnos, and iterate over it while processing the tuples. I might be
missing something, but I think we can do that without allocating the
space for offset numbers. Is there any reason for this? I guess we can
do it by just iterating over the sorted tids array.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: extension patch of CREATE OR REPLACE TRIGGER

2020-08-25 Thread Peter Smith
On Mon, Aug 24, 2020 at 9:33 PM osumi.takami...@fujitsu.com
 wrote:
> I've fixed my v06 and created v07.

Hi Osumi-san.

I have reviewed the test code of the v07 patch.

Below are my comments.



COMMENT (confusing functions)

+create function before_replacement() returns trigger as $$
+begin
+raise notice 'function replaced by another function';
+return null;
+end; $$ language plpgsql;
+create function after_replacement() returns trigger as $$
+begin
+raise notice 'function to replace the initial function';
+return null;
+end; $$ language plpgsql;

Why have function names with a hard-wired dependency on how you expect
they will be called?
I think just calling them "funcA" and "funcB" is much easier and works
just as well. e.g.
---
create function funcA() returns trigger as $$
begin
raise notice 'hello from funcA';
return null;
end; $$ language plpgsql;

create function funcB() returns trigger as $$
begin
raise notice 'hello from funcB';
return null;
end; $$ language plpgsql;
---

And this same comment applies for all the other test functions created
for this v07 patch.



COMMENT (drops)

+-- setup for another test of CREATE OR REPLACE TRIGGER
+drop table if exists parted_trig;
+NOTICE:  table "parted_trig" does not exist, skipping
+drop trigger if exists my_trig on parted_trig;
+NOTICE:  relation "parted_trig" does not exist, skipping
+drop function if exists before_replacement;
+NOTICE:  function before_replacement() does not exist, skipping
+drop function if exists after_replacement;
+NOTICE:  function after_replacement() does not exist, skipping

Was it deliberate to attempt to drop the trigger after dropping the table?
Also this seems to be dropping functions which were already dropped
just several lines earlier.



COMMENT (typos)

There are a couple of typos in the test comments. e.g.
"vefify" -> "verify"
"child parition" -> "child partition"



COMMENT (partition table inserts)

1. Was it deliberate to insert explicitly into each partition table?
Why not insert everything into the top table and let the partitions
take care of themselves?

2. The choice of values to insert also seemed strange. Inserting 1 and
1 and 10 means they will all end up in "parted_trig_1_1".

To summarise, I thought all subsequent partition tests maybe should be
inserting more like this:
---
insert into parted_trig (a) values (50); -- into parted_trig_1_1
insert into parted_trig (a) values (1500); -- into parted_trig_2
insert into parted_trig (a) values (2500); -- into default_parted_trig
---



COMMENT (missing error test cases)

There should be some more test cases to cover the new error messages
that were added to trigger.c:

e.g. test for "can't create regular trigger because already exists"
e.g. test for "can't create constraint trigger because already exists"
e.g. test for "can't replace regular trigger with constraint trigger""
e.g. test for "can't replace constraint trigger with regular trigger"
etc.
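
For example, the third of those could look something like this (sketch
only; the table and function names are just placeholders):
---
create table trig_table (a int);
create trigger my_trig before insert on trig_table
  for each row execute function funcA();
-- should fail: a regular trigger cannot be replaced by a constraint trigger
create or replace constraint trigger my_trig after insert on trig_table
  for each row execute function funcB();
---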



COMMENT (trigger with pending events)

This is another test where the complexity of the functions
("not_replaced", and "fail_to_replace") seemed excessive.
I think just calling these "funcA" and "funcB" as mentioned above
would be easier, and would serve just as well.



Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-08-25 Thread David Rowley
On Tue, 25 Aug 2020 at 08:26, Andres Freund  wrote:
> While I'm against introducing a separate node for the caching, I'm *not*
> against displaying a different node type when caching is
> present. E.g. it'd be perfectly reasonable from my POV to have a 'Cached
> Nested Loop' join and a plain 'Nested Loop' node in the above node. I'd
> probably still want to display the 'Cache Key' similar to your example,
> but I don't see how it'd be better to display it with one more
> intermediary node.

...Well, this is difficult... For the record, in case anyone missed
it, I'm pretty set on being against doing any node overloading for
this.  I think it's a pretty horrid modularity violation regardless of
what text appears in EXPLAIN. I think if we merge these nodes then we
may as well go further and merge in other simple nodes like LIMIT.
Then after a few iterations of that, we end up with a single node
in EXPLAIN whose behaviour nobody can figure out. "Here Be Dragons",
as Tom said.  That might seem like a bit of an exaggeration, but it is
important to understand that this would start us down that path, and
the more steps you take down that path, the harder it is to return
from it.

Let's look at nodeProjectSet.c, for example, which I recall you spent
quite a bit of time painfully extracting the scattered logic to get it
into a single reusable node (69f4b9c85). I understand your motivation
was for JIT compilation and not to modularise the code, however, I
think the byproduct of that change of having all that code in one
executor node was a good change, and I'm certainly a fan of what it
allowed you to achieve with JIT.  I really wouldn't like to put anyone
else in a position of having to extract out some complex logic that we
add to existing nodes in some future version of PostgreSQL. It might
seem quite innocent today, but add a few more years of development and
I'm sure things will get buried a little deeper.

I'm sure you know better than most that the day will come where we go
and start rewriting all of our executor node code to implement
something like batch execution.  I'd imagine you'd agree that this job
would be easier if nodes were single-purpose, rather than overloaded
with a bunch of needless complexity that only Heath Robinson himself
could be proud of.

I find it bizarre that on one hand, for non-parameterized nested
loops, we can have the inner scan become materialized with a
Materialize node (I don't recall complaints about that). However, on
the other hand, for parameterized nested loops, we build the caching
into the Nested Loop node itself.
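
Just to make the two shapes concrete, here's a hand-written sketch (not
real EXPLAIN output from either approach):

Separate node, as in this patch:

    Nested Loop
      ->  Seq Scan on a
      ->  Result Cache
            Cache Key: a.x
            ->  Index Scan using b_x_idx on b
                  Index Cond: (x = a.x)

Overloaded node, as suggested:

    Cached Nested Loop
      Cache Key: a.x
      ->  Seq Scan on a
      ->  Index Scan using b_x_idx on b
            Index Cond: (x = a.x)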

For the other arguments: I'm also struggling a bit to understand the
arguments that it makes EXPLAIN easier to read due to reduced nesting
depth. If that's the case, why don't we get rid of Hash below a Hash
Join? It seems nobody has felt strongly enough about that to go to the
trouble of writing the patch.  We could certainly do work to reduce
nesting depth in EXPLAIN provided you're not big on what the plan
actually does. One line should be ok if you're not worried about
what's happening to your tuples. Unfortunately, that does not seem
very useful as it tends to be that people who do look at EXPLAIN do
actually want to know what the planner has decided to do and are
interested in what's happening to their tuples. Hiding away details
that can significantly impact the performance of the plan does not
seem like a great direction to be moving in.

Also, just in case anyone is misunderstanding Andres' argument: it's
entirely based on the performance impact of having an additional
node.  However, given the correct planner choice, there will never be
a gross slowdown due to having the extra node. The costing, the way it
currently is designed will only choose to use a Result Cache if it
thinks it'll be cheaper to do so and cheaper means having enough cache
hits for the caching overhead to be worthwhile.  If we get a good
cache hit ratio then the additional node overhead does not exist
during execution since we don't look any further than the cache during
a cache hit.  It would only be a cache miss that requires pulling the
tuples through an additional node.  Given perfect statistics (which of
course is impossible) and costs, we'll never slow down the execution
of a plan by having a separate Result Cache node. In reality, poor
statistics, e.g., massive n_distinct underestimations, could cause
slowdowns, but loading this into one node is not going to save us from
that.  All that your design will save us from is that 12 nanosecond
per-tuple hop (measured on a 5-year-old laptop) to an additional node
during cache misses. It seems like a strange thing to optimise for,
given that the planner only chooses to use a Result Cache when there's
a good number of expected cache hits.

I understand that you've voiced your feelings about this, but what I
want to know is, how strongly do you feel about overloading the node?
Will you stand in my way if I want to push ahead with the separate
node?  Will anyone else?

Re: Asymmetric partition-wise JOIN

2020-08-25 Thread Daniel Gustafsson
> On 21 Aug 2020, at 08:02, Andrey V. Lepikhov  
> wrote:
> 
> On 7/1/20 2:10 PM, Daniel Gustafsson wrote:
>>> On 27 Dec 2019, at 08:34, Kohei KaiGai  wrote:
>>> The attached v2 fixed the problem, and regression test finished correctly.
>> This patch no longer applies to HEAD, please submit an rebased version.
>> Marking the entry Waiting on Author in the meantime.
> Rebased version of the patch on current master (d259afa736).
> 
> I rebased it because it is the base of my experimental feature, in which we don't 
> break a partitionwise join between a relation with a foreign partition and a local 
> relation if we have info that the remote server has a foreign table linked to the 
> local relation (by analogy with shippable extensions).
> 
> Maybe mark as 'Needs review'?

Thanks for the rebase, I've updated the commitfest entry to reflect that it
needs a round of review.

cheers ./daniel



Re: Handing off SLRU fsyncs to the checkpointer

2020-08-25 Thread Jakub Wartak
On Wed, Aug 12, 2020 at 6:06 PM Thomas Munro  wrote:
> [patch]

Hi Thomas / hackers,

I just wanted to help test this patch (defer SLRU fsyncs during recovery) 
and also the faster compactify_tuples() patch [2], as both are related to WAL 
recovery performance, which I'm interested in. This is my first message to 
this mailing list, so please let me know if I should adjust my testing style 
or formatting.

With both of those patches applied:
make check -> Passes
make check-world -> Passes
make standbycheck (section "Testing Hot Standby" from docs) -> Passes
There wasn't a single segfault or postmaster crash during the tests.
Review of the patches themselves: I'm not qualified to review the PostgreSQL 
internals.

I've used the redo-bench scripts [1] by Thomas to measure the performance effect 
(this approach simplifies testing and excludes network jitter effects): the 1st 
column is the redo start -> end timing, the 2nd is the redo end -> end of 
checkpointing timing before the DB is opened for reads. I've conducted 2-3 
separate tests that show the benefits of those patches depending on the workload:
- Handing SLRU sync work over to the checkpointer: in my testing it accelerates 
WAL recovery performance on slower / higher-latency storage by ~20%
- Faster sort in compactify_tuples(): in my testing it accelerates WAL recovery 
performance for HOT updates also by ~20%
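
For reference, the WAL stream for the TPC-B style test below can be
reproduced with a plain pgbench run along these lines (scale and duration
here are only illustrative, not the exact values I used):

    pgbench -i -s 100 testdb
    pgbench -c 8 -j 8 -T 600 testdb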

TEST1: workload profile test as per standard TPC-B pgbench -c8 -j8, with 
majority of records in WAL stream being Heap/HOT_UPDATE:

xvda, master @ a3c66de6c5e1ee9dd41ce1454496568622fb7712 (24/08/2020) - baseline:
72.991, 0.919
73.688, 1.027
72.228, 1.032

xvda, same master with, v2-0001-Defer-flushing-of-SLRU-files.patch
72.271, 0.857
72.717, 0.748
72.705, 0.81

xvda, same master with, v2-0001-Defer-flushing-of-SLRU-files.patch, fsync=off
72.49, 0.103
74.069, 0.102
73.368, 0.102

NVMe, same master with, v2-0001-Defer-flushing-of-SLRU-files.patch
70.312, 0.22
70.615, 0.201
69.739, 0.194

NVMe, same master with, v2-0001-Defer-flushing-of-SLRU-files.patch, fsync=off
69.686, 0.101
70.601, 0.102
70.042, 0.101

As Thomas stated, in the standard pgbench workload profile there is visible 
compactify_tuples()->pg_qsort() overhead on the recovery side. So this is where 
the 2nd patch helps:

NVMe, same master with, v2-0001-Defer-flushing-of-SLRU-files.patch and 
compactify_tuples()->pg_qsort() performance improvement
58.85, 0.296
58.605, 0.269
58.603, 0.277

NVMe, same master with, v2-0001-Defer-flushing-of-SLRU-files.patch and 
compactify_tuples()->pg_qsort() performance improvement, fsync=off
58.618, 0.103
57.693, 0.101
58.779, 0.111

In the final case the top profile is as follows; it is still related to the 
sorting, but as I understand it, in a much more optimal way:

26.68%  postgres  postgres[.] qsort_itemoff
---qsort_itemoff
   |--14.17%--qsort_itemoff
   |  |--10.96%--compactify_tuples
   |  |  PageRepairFragmentation
   |  |  heap2_redo
   |  |  StartupXLOG
   |   --3.21%--qsort_itemoff
   |  --3.10%--compactify_tuples
   |PageRepairFragmentation
   |heap2_redo
   |StartupXLOG
--12.51%--compactify_tuples
  PageRepairFragmentation
  heap2_redo
  StartupXLOG

 8.38%  postgres  libc-2.17.so[.] __memmove_ssse3_back
---__memmove_ssse3_back
   compactify_tuples
   PageRepairFragmentation
   heap2_redo
   StartupXLOG

 6.51%  postgres  postgres[.] hash_search_with_hash_value
---hash_search_with_hash_value
   |--3.62%--smgropen
   |  |--2.17%--XLogReadBufferExtended
   |  |   --1.76%--XLogReadBufferForRedoExtended
   |  |  --0.93%--heap_xlog_update
   |   --1.45%--ReadBufferWithoutRelcache
   | XLogReadBufferExtended
   |  --1.34%--XLogReadBufferForRedoExtended
   | --0.72%--heap_xlog_update
--2.69%--BufTableLookup
  ReadBuffer_common
  ReadBufferWithoutRelcache
  XLogReadBufferExtended
   --2.48%--XLogReadBufferForRedoExtended
 |--1.34%--heap2_redo
 |  StartupXLOG
  --0.83%--heap_xlog_update


So to sum up, a HOT update-like workload profile tends to be CPU bound on the 
single-process recovery side. Even slow storage (like xvda) was not the bottl

Re: proposal - function string_to_table

2020-08-25 Thread Peter Smith
On Tue, Aug 25, 2020 at 4:58 PM Pavel Stehule  wrote:
> When you run ./unused_oids script, then you get this message
>
> [pavel@nemesis catalog]$ ./unused_oids

> Patches should use a more-or-less consecutive range of OIDs.
> Best practice is to start with a random choice in the range 8000-9999.
> Suggested random unused OID: 8973 (1027 consecutive OID(s) available starting 
> here)
>
> For me, this is simple protection against OID collisions during development, 
> and I expect that committers will do OID-space defragmentation.

I have not used that tool before. Thanks for teaching me!

===

I have re-checked the string_to_table_20200825.patch.

Everything looks good to me now, so I am marking this as "ready for committer".
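
For anyone who hasn't followed the thread, the basic shape of the new
function is as simple as this (illustrative; behaviour per the patch):

    SELECT string_to_table('one,two,three', ',');
     string_to_table
    -----------------
     one
     two
     three
    (3 rows)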


Kind Regards,
Peter Smith.
Fujitsu Australia




Re: proposal - function string_to_table

2020-08-25 Thread Pavel Stehule
út 25. 8. 2020 v 11:19 odesílatel Peter Smith 
napsal:

> On Tue, Aug 25, 2020 at 4:58 PM Pavel Stehule 
> wrote:
> > When you run ./unused_oids script, then you get this message
> >
> > [pavel@nemesis catalog]$ ./unused_oids
> 
> > Patches should use a more-or-less consecutive range of OIDs.
> > Best practice is to start with a random choice in the range 8000-9999.
> > Suggested random unused OID: 8973 (1027 consecutive OID(s) available
> starting here)
> >
> > For me, this is simple protection against OID collisions during
> development, and I expect that committers will do OID-space defragmentation.
>
> I have not used that tool before. Thanks for teaching me!
>

:)


> ===
>
> I have re-checked the string_to_table_20200825.patch.
>
> Everything looks good to me now, so I am marking this as "ready for
> committer".
>

Thank you very much :)

Regard

Pavel

>
>
> Kind Regards,
> Peter Smith.
> Fujitsu Australia
>


[PATCH] Automatic HASH and LIST partition creation

2020-08-25 Thread Anastasia Lubennikova

On 14.07.2020 00:11, Anastasia Lubennikova wrote:

On 06.07.2020 13:45, Anastasia Lubennikova wrote:
The previous discussion of automatic partition creation [1] has 
addressed static and dynamic creation of partitions and ended up with 
several syntax proposals.

In this thread, I want to continue this work.

...
[1] 
https://www.postgresql.org/message-id/flat/alpine.DEB.2.21.1907150711080.22273%40lancre


Syntax proposal v2, that takes into account received feedback.

CREATE TABLE numbers(number int)
PARTITION BY partition_method (list_of_columns)
USING (partition_desc)

where partition_desc is:

MODULUS n
| VALUES IN (value_list), [DEFAULT PARTITION part_name]
| START ([datatype] 'start_value')
   END ([datatype] 'end_value')
   EVERY (partition_step), [DEFAULT PARTITION part_name]

where partition_step is:
[datatype] [number | INTERVAL] 'interval_value'

It is less wordy than the previous version. It uses a free keyword option
style. It covers static partitioning for all methods, a default partition for
list and range methods, and can be extended to implement dynamic partitioning
for range partitions.

[1] 
https://wiki.postgresql.org/wiki/Declarative_partitioning_improvements#Other_DBMS
[2] 
https://wiki.postgresql.org/wiki/Declarative_partitioning_improvements#Proposal_.28is_subject_to_change.29


Here is the patch for automated HASH and LIST partitioning that implements 
the proposed syntax.

Range partitioning is more complicated. It will require a new support 
function to calculate bounds, a new catalog attribute to store them, and so 
on. So I want to start small and implement automated range partitioning 
in a separate patch later.


1) Syntax

The new syntax is heavily based on the Greenplum syntax for automated 
partitioning, with one change. The keyword "USING", which was suggested above, 
causes a shift/reduce conflict with the "USING method" syntax for table 
access methods. It seems that the Greenplum folks will face this problem later.

I stuck with CONFIGURATION, an existing keyword that makes sense in this 
context.

Any better ideas are welcome.

Thus, current version is:

CREATE TABLE table_name (attrs)
PARTITION BY partition_method (list_of_columns)
CONFIGURATION (partition_desc)

where partition_desc is:

MODULUS n
| VALUES IN (value_list) [DEFAULT PARTITION part_name]

This syntax can be easily extended for range partitioning as well.

2) Implementation

PartitionBoundAutoSpec is a new part of PartitionSpec that contains the 
information needed to generate partition bounds.

For HASH and LIST automatic partition creation, the transformation happens 
during parse analysis of the CREATE TABLE statement. 
transformPartitionAutoCreate() calculates the bounds and generates the 
statements to create the partition tables.

Partitions are named in the format $tablename_$partnum. One can use a 
post-create hook to rename the relations.


For LIST partition one can also define a default partition.

3) TODO

The patch lacks documentation, because I expect some details may change 
during discussion. Other than that, the feature is ready for review.



Regards

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

commit ede0c9d8f13509915ee1db724f7bcabc0365ecd5
Author: anastasia 
Date:   Tue Aug 25 03:34:15 2020 +0300

Auto generated HASH and LIST partitions.

New syntax:

CREATE TABLE tbl_hash (i int) PARTITION BY HASH (i)
CONFIGURATION (modulus 3);

CREATE TABLE tbl_list (i int) PARTITION BY LIST (i)
CONFIGURATION (values in (1, 2), (3, 4) DEFAULT PARTITION tbl_default);
---
 src/backend/nodes/copyfuncs.c  |  17 
 src/backend/nodes/equalfuncs.c |  17 
 src/backend/nodes/outfuncs.c   |  16 
 src/backend/nodes/readfuncs.c  |  15 
 src/backend/parser/gram.y  |  82 -
 src/backend/parser/parse_utilcmd.c | 138 +
 src/include/nodes/nodes.h  |   1 +
 src/include/nodes/parsenodes.h |  23 +
 src/include/partitioning/partdefs.h|   2 +
 src/test/regress/expected/create_table.out |  37 
 src/test/regress/sql/create_table.sql  |  23 +
 11 files changed, 369 insertions(+), 2 deletions(-)

diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 89c409de66..cb537bce3a 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -4629,6 +4629,7 @@ _copyPartitionSpec(const PartitionSpec *from)
 
 	COPY_STRING_FIELD(strategy);
 	COPY_NODE_FIELD(partParams);
+	COPY_NODE_FIELD(autopart);
 	COPY_LOCATION_FIELD(location);
 
 	return newnode;
@@ -4651,6 +4652,19 @@ _copyPartitionBoundSpec(const PartitionBoundSpec *from)
 	return newnode;
 }
 
+static PartitionBoundAutoSpec *
+_copyPartitionBoundAutoSpec(const PartitionBoundAutoSpec *from)
+{
+	PartitionBoundAutoSpec *newnode = makeNode(PartitionBoundAutoSpec);
+
+	COPY_SCALAR_FIELD(strategy);
+	COPY_SCALAR_FI

passwordcheck: Log cracklib diagnostics

2020-08-25 Thread Peter Eisentraut
A user tried to use the cracklib build-time option of the passwordcheck 
module.  This failed, as it turned out, because there was no dictionary 
installed in the right place, but the error was not properly reported, 
because the existing code just throws away the error message from 
cracklib.  Attached is a patch that changes this by logging any error 
message returned from the cracklib call.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 00d5c1e1a9b65339bee6449e49eab053fef2a34f Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Tue, 25 Aug 2020 12:12:42 +0200
Subject: [PATCH] passwordcheck: Log cracklib diagnostics

When calling cracklib to check the password, the diagnostic from
cracklib was thrown away.  This would hide essential information such
as no dictionary being installed.  Change this to show the cracklib
error message using errdetail_log().
---
 contrib/passwordcheck/passwordcheck.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/contrib/passwordcheck/passwordcheck.c 
b/contrib/passwordcheck/passwordcheck.c
index d5f9d14b01..70f056232f 100644
--- a/contrib/passwordcheck/passwordcheck.c
+++ b/contrib/passwordcheck/passwordcheck.c
@@ -91,6 +91,9 @@ check_password(const char *username,
int i;
boolpwd_has_letter,
pwd_has_nonletter;
+#ifdef USE_CRACKLIB
+   const char *reason;
+#endif
 
/* enforce minimum length */
if (pwdlen < MIN_PWD_LENGTH)
@@ -125,10 +128,11 @@ check_password(const char *username,
 
 #ifdef USE_CRACKLIB
/* call cracklib to check password */
-   if (FascistCheck(password, CRACKLIB_DICTPATH))
+   if ((reason = FascistCheck(password, CRACKLIB_DICTPATH)))
ereport(ERROR,

(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-errmsg("password is easily cracked")));
+errmsg("password is easily cracked"),
+errdetail_log("cracklib diagnostic: %s", reason)));
 #endif
}
 
-- 
2.28.0



Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-08-25 Thread Gavin Flower

On 25/08/2020 20:48, David Rowley wrote:

On Tue, 25 Aug 2020 at 08:26, Andres Freund  wrote:

While I'm against introducing a separate node for the caching, I'm *not*
against displaying a different node type when caching is
present. E.g. it'd be perfectly reasonable from my POV to have a 'Cached
Nested Loop' join and a plain 'Nested Loop' node in the above node. I'd
probably still want to display the 'Cache Key' similar to your example,
but I don't see how it'd be better to display it with one more
intermediary node.

...Well, this is difficult... For the record, in case anyone missed
it, I'm pretty set on being against doing any node overloading for
this.  I think it's a pretty horrid modularity violation regardless of
what text appears in EXPLAIN. I think if we merge these nodes then we
may as well go further and merge in other simple nodes like LIMIT.
Then after a few iterations of that, we end up with with a single node
in EXPLAIN that nobody can figure out what it does. "Here Be Dragons",
as Tom said.  That might seem like a bit of an exaggeration, but it is
important to understand that this would start us down that path, and
the more steps you take down that path, the harder it is to return
from it.

[...]


I understand that you've voiced your feelings about this, but what I
want to know is, how strongly do you feel about overloading the node?
Will you stand in my way if I want to push ahead with the separate
node?  Will anyone else?

David


From my own experience, and from thinking about issues like this, my 
feeling is that keeping them separate adds robustness with respect to change. 
Presumably common code can be extracted out, to avoid excessive code duplication?


-- Gavin





Re: passwordcheck: Log cracklib diagnostics

2020-08-25 Thread Daniel Gustafsson
> On 25 Aug 2020, at 12:20, Peter Eisentraut  
> wrote:
> 
> A user tried to use the cracklib build-time option of the passwordcheck 
> module.  This failed, as it turned out because there was no dictionary 
> installed in the right place, but the error was not properly reported, 
> because the existing code just throws away the error message from cracklib.  
> Attached is a patch that changes this by logging any error message returned 
> from the cracklib call.

+1 on this, it's also in line with the example documentation from cracklib.
The returned error is potentially a bit misleading now, as it might claim
that a strong password is easily cracked if the dictionary fails to load.  Given
that there is no way to distinguish between the classes of returned errors, it's
hard to see how we can do better though.

While poking at this, we might as well update the docs to point to the right
URL for CrackLib as it moved from Sourceforge five years ago.  The attached
diff fixes that.

cheers ./daniel



cracklib_url.diff
Description: Binary data


Re: recovering from "found xmin ... from before relfrozenxid ..."

2020-08-25 Thread Masahiko Sawada
On Tue, 25 Aug 2020 at 17:08, Masahiko Sawada
 wrote:
>
> On Fri, 21 Aug 2020 at 22:25, Ashutosh Sharma  wrote:
> >
> > Hi Masahiko-san,
> >
> > Please find the updated patch with the following new changes:
> >
>
> Thank you for updating the patch!
>
> > 1) It adds the code changes in heap_force_kill function to clear an
> > all-visible bit on the visibility map corresponding to the page that
> > is marked all-visible. Along the way it also clears PD_ALL_VISIBLE
> > flag on the page header.
>
> I think we need to clear all visibility map bits by using
> VISIBILITYMAP_VALID_BITS. Otherwise, the page has all-frozen bit but
> not all-visible bit, which is not a valid state.
>
> >
> > 2) It adds the code changes in heap_force_freeze function to reset the
> > ctid value in a tuple header if it is corrupted.
> >
> > 3) It adds several notes and examples in the documentation stating
> > when and how we need to use the functions provided by this module.
> >
> > Please have a look and let me know for any other concern.
> >
>
> And here are small comments on the heap_surgery.c:
>
> +   /*
> +* Get the offset numbers from the tids belonging to one particular 
> page
> +* and process them one by one.
> +*/
> +   blkno = tids_same_page_fetch_offnums(tids, ntids, &next_start_ptr,
> +offnos);
> +
> +   /* Calculate the number of offsets stored in offnos array. */
> +   noffs = next_start_ptr - curr_start_ptr;
> +
> +   /*
> +* Update the current start pointer so that next time when
> +* tids_same_page_fetch_offnums() is called, we can calculate the 
> number
> +* of offsets present in the offnos array.
> +*/
> +   curr_start_ptr = next_start_ptr;
> +
> +   /* Check whether the block number is valid. */
> +   if (blkno >= nblocks)
> +   {
> +   ereport(NOTICE,
> +   (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> +errmsg("skipping block %u for relation \"%s\"
> because the block number is out of range",
> +   blkno, RelationGetRelationName(rel))));
> +   continue;
> +   }
> +
> +   CHECK_FOR_INTERRUPTS();
>
> I guess it would be better to call CHECK_FOR_INTERRUPTS() at the top
> of the do loop for safety. I think it's unlikely to happen but the
> user might mistakenly specify a lot of wrong block numbers.
>
> 
> +   offnos = (OffsetNumber *) palloc(ntids * sizeof(OffsetNumber));
> +   noffs = curr_start_ptr = next_start_ptr = 0;
> +   nblocks = RelationGetNumberOfBlocks(rel);
> +
> +   do
> +   {
>
> (snip)
>
> +
> +   /*
> +* Get the offset numbers from the tids belonging to one particular 
> page
> +* and process them one by one.
> +*/
> +   blkno = tids_same_page_fetch_offnums(tids, ntids, &next_start_ptr,
> +offnos);
> +
> +   /* Calculate the number of offsets stored in offnos array. */
> +   noffs = next_start_ptr - curr_start_ptr;
> +
>
> (snip)
>
> +   /* No ereport(ERROR) from here until all the changes are logged. */
> +   START_CRIT_SECTION();
> +
> +   for (i = 0; i < noffs; i++)
>
> You copy all offset numbers belonging to the same page to palloc'd
> array, offnos, and iterate it while processing the tuples. I might be
> missing something but I think we can do that without allocating the
> space for offset numbers. Is there any reason for this? I guess we can
> do that by just iterating the sorted tids array.
>

Let me share other comments on the latest version patch:

Some words need to be tagged. For instance, I found the following words:

VACUUM
DISABLE_PAGE_SKIPPING
HEAP_XMIN_FROZEN
HEAP_XMAX_INVALID

---
+test=# select ctid from t1 where xmin = 507;
+ ctid
+---
+ (0,3)
+(1 row)
+
+test=# select heap_force_freeze('t1'::regclass, ARRAY['(0, 3)']::tid[]);
+-[ RECORD 1 ]-+-
+heap_force_freeze |

I think it's better to use a consistent output format. The former uses
the normal format whereas the latter uses the expanded format.

---
+ 
+ 
+  While performing surgery on a damaged relation, we must not be doing anything
+  else on that relation in parallel. This is to ensure that when we are
+  operating on a damaged tuple there is no other transaction trying to modify
+  that tuple.
+ 
+ 

If we prefer to avoid concurrent operations on the target relation, why
don't we use AccessExclusiveLock?

---
+CREATE FUNCTION heap_force_kill(reloid regclass, tids tid[])
+RETURNS VOID
+AS 'MODULE_PATHNAME', 'heap_force_kill'
+LANGUAGE C STRICT;

+CREATE FUNCTION heap_force_freeze(reloid regclass, tids tid[])
+RETURNS VOID
+AS 'MODULE_PATHNAME', 'heap_force_freeze'
+LANGUAGE C STRICT;

I think these functions should be PARALLEL UNSAFE.
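
i.e. something like this (a sketch of the suggested declaration change):

CREATE FUNCTION heap_force_kill(reloid regclass, tids tid[])
RETURNS VOID
AS 'MODULE_PATHNAME', 'heap_force_kill'
LANGUAGE C STRICT PARALLEL UNSAFE;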

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Autovacuum on partitioned table (autoanalyze)

2020-08-25 Thread Daniel Gustafsson
> On 17 Aug 2020, at 08:11, yuzuko  wrote:
> 
> I'm sorry for the late reply.
> 
>> This version seems to fail under Werror which is used in the Travis builds:
>> 
>> autovacuum.c: In function ‘relation_needs_vacanalyze’:
>> autovacuum.c:3117:59: error: ‘reltuples’ may be used uninitialized in this 
>> function [-Werror=maybe-uninitialized]
>>   anlthresh = (float4) anl_base_thresh + anl_scale_factor * reltuples;
>>   ^
>> autovacuum.c:2972:9: note: ‘reltuples’ was declared here
>>  float4  reltuples;  /* pg_class.reltuples */
>> ^
> 
> I attach the latest patch that solves the above Werror.
> Could you please check it again?

This version now passes the tests in the Travis pipeline, as can be seen in the
link below, and is ready to be reviewed in the upcoming commitfest:

http://cfbot.cputube.org/yuzuko-hosoya.html

cheers ./daniel



Re: recovering from "found xmin ... from before relfrozenxid ..."

2020-08-25 Thread Ashutosh Sharma
Hi Masahiko-san,

Thank you for the review. Please check my comments inline below:

On Tue, Aug 25, 2020 at 1:39 PM Masahiko Sawada
 wrote:
>
> On Fri, 21 Aug 2020 at 22:25, Ashutosh Sharma  wrote:
> >
> > Hi Masahiko-san,
> >
> > Please find the updated patch with the following new changes:
> >
>
> Thank you for updating the patch!
>
> > 1) It adds the code changes in heap_force_kill function to clear an
> > all-visible bit on the visibility map corresponding to the page that
> > is marked all-visible. Along the way it also clears PD_ALL_VISIBLE
> > flag on the page header.
>
> I think we need to clear all visibility map bits by using
> VISIBILITYMAP_VALID_BITS. Otherwise, the page has all-frozen bit but
> not all-visible bit, which is not a valid state.
>

Yeah, makes sense, I will do that change in the next version of patch.

> >
> > 2) It adds the code changes in heap_force_freeze function to reset the
> > ctid value in a tuple header if it is corrupted.
> >
> > 3) It adds several notes and examples in the documentation stating
> > when and how we need to use the functions provided by this module.
> >
> > Please have a look and let me know for any other concern.
> >
>
> And here are small comments on the heap_surgery.c:
>
> +   /*
> +* Get the offset numbers from the tids belonging to one particular 
> page
> +* and process them one by one.
> +*/
> +   blkno = tids_same_page_fetch_offnums(tids, ntids, &next_start_ptr,
> +offnos);
> +
> +   /* Calculate the number of offsets stored in offnos array. */
> +   noffs = next_start_ptr - curr_start_ptr;
> +
> +   /*
> +* Update the current start pointer so that next time when
> +* tids_same_page_fetch_offnums() is called, we can calculate the 
> number
> +* of offsets present in the offnos array.
> +*/
> +   curr_start_ptr = next_start_ptr;
> +
> +   /* Check whether the block number is valid. */
> +   if (blkno >= nblocks)
> +   {
> +   ereport(NOTICE,
> +   (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> +errmsg("skipping block %u for relation \"%s\"
> because the block number is out of range",
> +   blkno, RelationGetRelationName(rel))));
> +   continue;
> +   }
> +
> +   CHECK_FOR_INTERRUPTS();
>
> I guess it would be better to call CHECK_FOR_INTERRUPTS() at the top
> of the do loop for safety. I think it's unlikely to happen but the
> user might mistakenly specify a lot of wrong block numbers.
>

Okay, np, will shift it to top of the do loop.

> 
> +   offnos = (OffsetNumber *) palloc(ntids * sizeof(OffsetNumber));
> +   noffs = curr_start_ptr = next_start_ptr = 0;
> +   nblocks = RelationGetNumberOfBlocks(rel);
> +
> +   do
> +   {
>
> (snip)
>
> +
> +   /*
> +* Get the offset numbers from the tids belonging to one particular 
> page
> +* and process them one by one.
> +*/
> +   blkno = tids_same_page_fetch_offnums(tids, ntids, &next_start_ptr,
> +offnos);
> +
> +   /* Calculate the number of offsets stored in offnos array. */
> +   noffs = next_start_ptr - curr_start_ptr;
> +
>
> (snip)
>
> +   /* No ereport(ERROR) from here until all the changes are logged. */
> +   START_CRIT_SECTION();
> +
> +   for (i = 0; i < noffs; i++)
>
> You copy all offset numbers belonging to the same page to palloc'd
> array, offnos, and iterate it while processing the tuples. I might be
> missing something but I think we can do that without allocating the
> space for offset numbers. Is there any reason for this? I guess we can
> do that by just iterating the sorted tids array.
>

Hmmm.. okay, I see your point. I think what you are suggesting here is to
make use of the current and next start pointers to get the tids belonging
to the same page and process them one by one, instead of fetching the
offset numbers of all tids belonging to one page into the offnos array and
then iterating through the offnos array. I think that is probably possible
and I will try to do that in the next version of the patch. If there is
something else that you have in mind, please let me know.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com




Fix a couple of misusages of bms_num_members()

2020-08-25 Thread David Rowley
I noticed today there are a few places where we use bms_num_members()
where we really should be using bms_membership().  These are not bugs;
they're mostly just bad examples to leave lying around, at best, and
a small performance penalty, at worst.
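
To illustrate the kind of change (this is not the actual diff, just an
example of the pattern):

    /* counts every member, even though we only care if there's more than one */
    if (bms_num_members(relids) > 1)
        ...

    /* stops as soon as it knows the answer */
    if (bms_membership(relids) == BMS_MULTIPLE)
        ...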

Unless there are any objections, I plan to push this to master only in
about 10 hours time.

David


fixup_bms_num_members_misusages.patch
Description: Binary data


Re: Fix a couple of misusages of bms_num_members()

2020-08-25 Thread Tomas Vondra

On Wed, Aug 26, 2020 at 12:51:37AM +1200, David Rowley wrote:

I noticed today there are a few places where we use bms_num_members()
where we really should be using bms_membership().  These are not bugs;
they're mostly just bad examples to leave lying around, at best, and
a small performance penalty, at worst.

Unless there are any objections, I plan to push this to master only in
about 10 hours time.



Seems OK to me. Thanks.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Problems with the FSM, heap fillfactor, and temporal locality

2020-08-25 Thread Stephen Frost
Greetings,

* Peter Geoghegan (p...@bowt.ie) wrote:
> On Mon, Aug 24, 2020 at 6:38 AM John Naylor  
> wrote:
> > Other ideas?
> 
> I've been experimenting with changing the way that we enforce heap
> fill factor with calls to heap_insert() (and even heap_update()) that
> happen to occur at a "natural temporal boundary". This works by
> remembering an XID alongside the target block in the relcache when the
> target block is set. When we have an existing target block whose XID
> does not match our backend's current XID (i.e. it's an old XID for the
> backend), then that means we're at one of these boundaries. We require
> that the page has a little more free space before we'll insert on it
> when at a boundary. If we barely have enough space to insert the
> incoming heap tuple, and it's the first of a few tuples the
> transaction will ultimately insert, then we should start early on a
> new page instead of using the last little bit of space (note that the
> "last little bit" of space does not include the space left behind by
> fill factor). The overall effect is that groups of related tuples are
> much less likely to span a heap page boundary unless and until we have
> lots of updates -- though maybe not even then. I think that it's very
> common for transactions to insert a group of 2 - 15 logically related
> tuples into a table at a time.
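
(Restating the mechanism above as very rough pseudocode, just to check my
understanding; all of the names and the slack value here are invented, the
real prototype keeps the XID in the relcache entry alongside the target
block:)

    TransactionId curxid = GetCurrentTransactionId();

    if (targblock_xid != curxid)
    {
        /* first insert by this transaction: a "natural temporal boundary" */
        if (PageGetHeapFreeSpace(page) < len + boundary_slack)
            targetBlock = InvalidBlockNumber;   /* start on a fresh page */
        targblock_xid = curxid;
    }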

This all definitely sounds quite interesting and the idea to look at the
XID to see if we're in the same transaction and therefore likely
inserting a related tuple certainly makes some sense.  While I get that
it might not specifically work with TPC-C, I'm wondering if we
could figure out how to make a multi-tuple INSERT use
heap/table_multi_insert (which seems to only be used by COPY currently,
and internally thanks to the recent work to use it for some catalog
tables) and then consider the size of the entire set of tuples being
INSERT'd when working to find a page, or deciding if we should extend
the relation.

> Roughly speaking, you can think of this as the heapam equivalent of
> the nbtree page split choice logic added by commit fab25024. We ought
> to go to at least a little bit of effort to minimize the number of
> distinct XIDs that are present on each heap page (in the tuple
> headers). We can probably figure out heuristics that result in
> respecting heap fill factor on average, while giving inserts (and even
> non-HOT updates) a little wiggle room when it comes to heap page
> boundaries.

Agreed.

> By applying both of these techniques together (locality/page split
> thing and real atomic ops for fp_next_slot) the prototype patch I'm
> working on mostly restores the system's current ability to reuse space
> (as measured by the final size of relations when everything is done),
> while maintaining most of the performance benefits of not using the
> FSM at all. The code is still pretty rough, though.
> 
> I haven't decided how far to pursue this. It's not as if there are
> that many ways to make TPC-C go 5%+ faster left; it's very
> write-heavy, and stresses many different parts of the system all at
> once. I'm sure that anything like my current prototype patch will be
> controversial, though. Maybe it will be acceptable if we only change
> the behavior for people that explicitly set heap fillfactor.

Getting a 5% improvement is pretty exciting, very cool and seems worth
spending effort on.

Thanks,

Stephen


signature.asc
Description: PGP signature


[PATCH] Resource leaks (src/backend/libpq/hba.c)

2020-08-25 Thread Ranier Vilela
Hi Tom,

Per Coverity.

The function parse_hba_auth_op in src/backend/libpq/hba.c can leak resources
when called by the function parse_hba_line with the LOG and DEBUG3 levels.

The function SplitGUCList (src/bin/pg_dump/dumputils.c) can return FALSE
while the namelist list is not empty and still holds memory allocated by
pg_malloc.

The simplest solution is to free namelist even when calling ereport, since
the level can be LOG or DEBUG3.
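
In other words, something like this (only a sketch of the pattern, with
hypothetical names and message text; the attached patch is the authoritative
change):

    List   *namelist = NIL;

    if (!SplitGUCList(val, ',', &namelist))
    {
        ereport(elevel,
                (errcode(ERRCODE_CONFIG_FILE_ERROR),
                 errmsg("could not parse value \"%s\"", val)));
        list_free(namelist);    /* previously leaked when elevel < ERROR */
        return false;
    }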

regards,
Ranier Vilela

PS. There are two SplitGUCList functions in the codebase.
1. SplitGUCList (src/bin/pg_dump/dumputils.c)
2. SplitGUCList (src/backend/utils/adt/varlena.c)


fix_resource_leak_dba.patch
Description: Binary data


Re: Refactor pg_rewind code and make it work against a standby

2020-08-25 Thread Heikki Linnakangas

On 20/08/2020 11:32, Kyotaro Horiguchi wrote:

0002: Rewording old->target and new->source makes the meaning far
clearer. Moving the decision code into filemap_finalize is
reasonable.

 By the way, some of the rules remain in
 process_source/target_file.  For example, pg_wal when it is a symlink,
 temporary directories, and excluded files.  The number of
 excluded files doesn't seem so large, so the exclusion doesn't seem
 to give much of an advantage.  They seem to me movable to
 filemap_finalize, and we can get rid of the callbacks by doing
 so.  Is there any reason the remaining rules should be in the
 callbacks?


Good idea! I changed the patch that way.


0003: Thomas is propsing sort template. It could be used if committed.

0004:

  The names of many of the functions get an additional word "local",
  but I don't get the meaning clearly.  But that's about linguistic sense
  and I'm not the best judge of that..
  
-rewind_copy_file_range(const char *path, off_t begin, off_t end, bool trunc)

+local_fetch_file_range(rewind_source *source, const char *path, uint64 off,

  The function is actually copying the source range to the target file.  So
  "fetch" might give a somewhat different meaning, but again that's about
  linguistics (omitted..).


Hmm. It is "fetching" the range from the source server, and writing it 
to the target. The term makes more sense with a libpq source. Perhaps 
this function should be called "local_copy_range" or something, but it'd 
also be nice to have "fetch" in the name because the function pointer 
it's assigned to is called "queue_fetch_range".



I'm going to continue reviewing this later.


Thanks! Attached is a new set of patches. The only meaningful change is 
in the 2nd patch, which I modified per your suggestion. Also, I moved 
the logic to decide each file's fate into a new subroutine called 
decide_file_action().


- Heikki
From 1effdaeabb843b40575ae898ec9de57ffc37f301 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Wed, 19 Aug 2020 15:34:35 +0300
Subject: [PATCH v2 1/5] pg_rewind: Move syncTargetDirectory() to file_ops.c

For consistency. All the other low-level functions that operate on the
target directory are in file_ops.c.
---
 src/bin/pg_rewind/file_ops.c  | 19 +++
 src/bin/pg_rewind/file_ops.h  |  1 +
 src/bin/pg_rewind/pg_rewind.c | 22 +-
 src/bin/pg_rewind/pg_rewind.h |  1 +
 4 files changed, 22 insertions(+), 21 deletions(-)

diff --git a/src/bin/pg_rewind/file_ops.c b/src/bin/pg_rewind/file_ops.c
index b3bf091c546..55439db20ba 100644
--- a/src/bin/pg_rewind/file_ops.c
+++ b/src/bin/pg_rewind/file_ops.c
@@ -19,6 +19,7 @@
 #include 
 
 #include "common/file_perm.h"
+#include "common/file_utils.h"
 #include "file_ops.h"
 #include "filemap.h"
 #include "pg_rewind.h"
@@ -266,6 +267,24 @@ remove_target_symlink(const char *path)
  dstpath);
 }
 
+/*
+ * Sync target data directory to ensure that modifications are safely on disk.
+ *
+ * We do this once, for the whole data directory, for performance reasons.  At
+ * the end of pg_rewind's run, the kernel is likely to already have flushed
+ * most dirty buffers to disk.  Additionally fsync_pgdata uses a two-pass
+ * approach (only initiating writeback in the first pass), which often reduces
+ * the overall amount of IO noticeably.
+ */
+void
+sync_target_dir(void)
+{
+	if (!do_sync || dry_run)
+		return;
+
+	fsync_pgdata(datadir_target, PG_VERSION_NUM);
+}
+
 
 /*
  * Read a file into memory. The file to be read is /.
diff --git a/src/bin/pg_rewind/file_ops.h b/src/bin/pg_rewind/file_ops.h
index 025f24141c9..d8466385cf5 100644
--- a/src/bin/pg_rewind/file_ops.h
+++ b/src/bin/pg_rewind/file_ops.h
@@ -19,6 +19,7 @@ extern void remove_target_file(const char *path, bool missing_ok);
 extern void truncate_target_file(const char *path, off_t newsize);
 extern void create_target(file_entry_t *t);
 extern void remove_target(file_entry_t *t);
+extern void sync_target_dir(void);
 
 extern char *slurpFile(const char *datadir, const char *path, size_t *filesize);
 
diff --git a/src/bin/pg_rewind/pg_rewind.c b/src/bin/pg_rewind/pg_rewind.c
index 23fc749e445..c9b9e480c0f 100644
--- a/src/bin/pg_rewind/pg_rewind.c
+++ b/src/bin/pg_rewind/pg_rewind.c
@@ -20,7 +20,6 @@
 #include "catalog/pg_control.h"
 #include "common/controldata_utils.h"
 #include "common/file_perm.h"
-#include "common/file_utils.h"
 #include "common/restricted_token.h"
 #include "common/string.h"
 #include "fe_utils/recovery_gen.h"
@@ -38,7 +37,6 @@ static void createBackupLabel(XLogRecPtr startpoint, TimeLineID starttli,
 
 static void digestControlFile(ControlFileData *ControlFile, char *source,
 			  size_t size);
-static void syncTargetDirectory(void);
 static void getRestoreCommand(const char *argv0);
 static void sanityChecks(void);
 static void findCommonAncestorTimeline(XLogRecPtr *recptr, int *tliIndex);
@@ -455,7 +453,7 @@ main(int argc, char **argv)
 
 	if (showprogress)

Re: passwordcheck: Log cracklib diagnostics

2020-08-25 Thread Laurenz Albe
On Tue, 2020-08-25 at 13:48 +0200, Daniel Gustafsson wrote:
> > On 25 Aug 2020, at 12:20, Peter Eisentraut 
> >  wrote:
> > 
> > A user tried to use the cracklib build-time option of the passwordcheck 
> > module.  This failed, as it turned out because there was no dictionary 
> > installed in the right place, but the error was not
> > properly reported, because the existing code just throws away the error 
> > message from cracklib.  Attached is a patch that changes this by logging 
> > any error message returned from the cracklib call.
> 
> +1 on this, it's also in line with the example documentation from cracklib.
> The returned error is potentially a bit misleading now, as it might claim
> that a strong password is easily cracked if the dictionary fails to load.  Given
> that there is no way to distinguish between the class of returned errors it's
> hard to see how we can do better though.
> 
> While poking at this, we might as well update the docs to point to the right
> URL for CrackLib as it moved from Sourceforge five years ago.  The attached
> diff fixes that.

+1 on both patches.

Yours,
Laurenz Albe
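
For reference, a minimal sketch of the kind of change being discussed:
surface cracklib's own diagnostic instead of discarding it.  This is an
illustration only, not necessarily the committed patch, and it assumes
CRACKLIB_DICTPATH is defined at build time as the module's build option
requires.

#include "postgres.h"

#include <crack.h>

static void
check_with_cracklib(const char *password)
{
	/*
	 * FascistCheck() returns NULL on success, or a message explaining the
	 * failure -- including a failure to load the dictionary.
	 */
	const char *reason = FascistCheck(password, CRACKLIB_DICTPATH);

	if (reason)
		ereport(ERROR,
				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
				 errmsg("password is easily cracked: %s", reason)));
}

As noted above, the message can still be misleading when the real problem is
a missing dictionary, but at least the underlying cause now reaches the log.
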





Move OpenSSL random under USE_OPENSSL_RANDOM

2020-08-25 Thread Daniel Gustafsson
The USE_OPENSSL_RANDOM macro is defined when OpenSSL is used as a randomness
provider, but the implementation of strong randomness is guarded by USE_OPENSSL
in most places.  This is technically the same thing today, but it seems
hygienic to use the appropriate macro in case we ever want to allow OS
randomness together with OpenSSL or something similar (or just make git grep
easier which is my itch to scratch with this).

The attached moves all invocations under the correct guards.  RAND_poll() in
fork_process.c needs to happen for both OpenSSL and OpenSSL random, thus the
check for both.

cheers ./daniel
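
To make the distinction concrete, here is a rough sketch of the intended
guard usage.  The function names are invented for the example and this is
not the attached patch:

#if defined(USE_OPENSSL) || defined(USE_OPENSSL_RANDOM)
#include <openssl/rand.h>
#endif

#include <stdbool.h>
#include <stddef.h>

/* strong randomness: only when OpenSSL is the randomness provider */
static bool
example_strong_random(void *buf, size_t len)
{
#ifdef USE_OPENSSL_RANDOM
	return RAND_bytes((unsigned char *) buf, (int) len) == 1;
#else
	(void) buf;
	(void) len;
	return false;				/* another randomness provider would go here */
#endif
}

/* post-fork reseeding: needed whenever OpenSSL is linked at all */
static void
example_after_fork(void)
{
#if defined(USE_OPENSSL) || defined(USE_OPENSSL_RANDOM)
	RAND_poll();				/* don't share the parent's RNG state */
#endif
}
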



openssl_random_macros.patch
Description: Binary data


Re: Fix for configure error in 9.5/9.6 on macOS 11.0 Big Sur

2020-08-25 Thread Peter Eisentraut

On 2020-08-02 23:18, Tom Lane wrote:

Thomas Gilligan  writes:

Under the next version of macOS (11.0 unreleased beta 3), configuring Postgres 
9.5 and 9.6 fails with



checking test program... ok
checking whether long int is 64 bits... no
checking whether long long int is 64 bits... no
configure: error: Cannot find a working 64-bit integer type.


Hm, could we see the config.log output for this?  I'm not 100% convinced
that you've diagnosed the problem accurately, because it'd imply that
Apple made some fundamentally incompatible changes in libc, which
seems like stirring up trouble for nothing.


It looks like the new compiler errors out on calling undeclared 
functions.  Might be good to see config.log to confirm this.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: renaming configure.in to configure.ac

2020-08-25 Thread Peter Eisentraut

On 2020-07-24 15:23, Tom Lane wrote:

Peter Eisentraut  writes:

On 2020-07-17 10:46, Peter Eisentraut wrote:

v1-0001-Rename-configure.in-to-configure.ac.patch



I have committed that, and I have sent feedback to the Autoconf
developers about our concerns about the slowness of some of the new tests.


The slow C++ feature test has been fixed in Autoconf git.


Sounds good.  Do we want to try Noah's idea of temporarily committing
the remaining changes, to see if the buildfarm is happy?


I think to get value out of this you'd have to compare the config.status 
output files (mainly pg_config.h and Makefile.global) before and after. 
Otherwise you're just testing that the shell can parse the script. 
Perhaps some manual tests on, say, AIX and HP-UX using the native shell 
would be of some value.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: "cert" + clientcert=verify-ca in pg_hba.conf?

2020-08-25 Thread Bruce Momjian
On Tue, Aug 25, 2020 at 03:53:20PM +0900, Kyotaro Horiguchi wrote:
> At Mon, 24 Aug 2020 23:04:51 -0400, Bruce Momjian  wrote in 
> > > > I don't see "no-verify" mentioned anywhere in our docs.
> > > 
> > > no-verify itself is mentioned here.
> > > 
> > > https://www.postgresql.org/docs/13/ssl-tcp.html#SSL-CLIENT-CERTIFICATES
> > 
> > Oh, I see it now, thanks.  Do you have any idea what this part of the
> > docs means?
> > 
> > When clientcert is not specified or is set to
> > no-verify, the server will still verify any presented
> > client certificates against its CA file, if one is configured —
> > but it will not insist that a client certificate be presented.
> 
> Ah.. Indeed.
> 
> Even if clientcert is not set or is set to no-verify, the server checks the
> client certificate against the CA if one is presented. With verify-ca, a
> client certificate must be provided. As a result, no-verify actually fails
> if the client has a certificate that is not backed by the CA.

I think there are a few problems here.  In the docs, it says "will still
verify", but it doesn't say if it verifies the CA, or the CA _and_ the
CN/username.

Second, since it is optional, what value does it have?

> > Why is this useful?
> 
> I agree, but there seems to be an implementation reason for the
> behavior. To identify an hba-line, connection parameters such as the
> user name that are sent over the connection are required.  Thus the
> clientcert option in the to-be-identified hba-line is unknown prior to
> the time the SSL connection is made. So the documentation might need
> amendment. Roughly something like the following?

Well, I realize internally we need a way to indicate clientcert is not
used, but why do we bother exposing that to the user as a named option?

And you are right that the option name 'no-verify' is wrong since it
will verify the CA if it exists, so it is more like 'optionally-verify',
which seems useless from a user interface perspective.

I guess the behavior of no-verify matches our client-side
sslmode=prefer, but at least that has the value of using SSL if
available, which keeps the network traffic from being visible, without
forcing it.  I am not sure what the value of optional certificate
verification is, since verification is all it does.  I guess it should
be called "prefer-verify".

> ===
> When clientcert is not specified or is set
> to no-verify, clients can connect to the server without
> having a client certificate.
> 
> Note: Regardless of the setting of clientcert,
> the connection can fail if a client certificate that cannot be
> verified by the server is stored in the ~/.postgresql directory.
> ===
> 
> By the way, the following table line might need to be changed?
> 
> libpq-ssl.html:
> 
> >  ~/.postgresql/postgresql.crt
> >  client certificate
> -  requested by server
> 
> The file is actually not requested by the server; the client just pushes it
> to the server if one exists, unconditionally.
> 
> +  sent to server

I have just applied this change to all branches, since it is an
independent fix.  Thanks.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2020-08-25 Thread Dilip Kumar
On Tue, Aug 25, 2020 at 6:27 PM Amit Kapila  wrote:
>
> On Tue, Aug 25, 2020 at 10:41 AM Dilip Kumar  wrote:
> >
> > On Tue, Aug 25, 2020 at 9:31 AM Amit Kapila  wrote:
> > >
> > >
> > > I think the existing design is superior, as it allows the flexibility
> > > to create transaction files in different temp_tablespaces, which is
> > > quite important to consider given that the files will be created only
> > > for large transactions. Once we fix the sharedfileset for a worker, all
> > > the files will be created in the temp_tablespaces chosen at the time
> > > the apply worker first creates it, even if the setting changes later
> > > (the user can change its value and then reload the config, which I
> > > think will affect the worker settings as well). This all can happen
> > > because we set the tablespaces at the time of
> > > SharedFileSetInit.
> >
> > Yeah, I agree with this point,  that if we use the single shared
> > fileset then it will always use the same tablespace for all the
> > streaming transactions.  And, we might get the benefit of concurrent
> > I/O if we use different tablespaces as we are not immediately flushing
> > the files to the disk.
> >
>
> Okay, so let's retain the original approach then. I have made a few
> cosmetic modifications in the first two patches which include updating
> docs, comments, slightly modify the commit message, and change the
> code to match the nearby code. One change which you might have a
> different opinion is below:
>
> + case WAIT_EVENT_LOGICAL_CHANGES_READ:
> + event_name = "ReorderLogicalChangesRead";
> + break;
> + case WAIT_EVENT_LOGICAL_CHANGES_WRITE:
> + event_name = "ReorderLogicalChangesWrite";
> + break;
> + case WAIT_EVENT_LOGICAL_SUBXACT_READ:
> + event_name = "ReorderLogicalSubxactRead";
> + break;
> + case WAIT_EVENT_LOGICAL_SUBXACT_WRITE:
> + event_name = "ReorderLogicalSubxactWrite";
> + break;
>
> Why do we want these event names to start with Reorder*? I think
> these are used on the subscriber side, so there is no need for the word
> Reorder, and I have removed it in the attached patch. I am planning
> to push the first patch (v53-0001-Extend-the-BufFile-interface) in
> this series tomorrow unless you have any comments on it.

Your changes in 0001 and 0002 look fine to me.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




[PATCH] Dereference null return value (NULL_RETURNS) (src/backend/commands/statscmds.c)

2020-08-25 Thread Ranier Vilela
Hi Tom,

Per Coverity.

SearchSysCache1 can return NULL, and the function AlterStatistics misses a
check of that return value, which allows a NULL pointer dereference in
heap_modify_tuple.

There is also room for improvement: avoid calling SearchSysCache1 and
table_open at all if the user "is not the owner of the existing statistics
object".

regards,
Ranier Vilela
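
For readers following along, the standard idiom for the missing check looks
roughly like the sketch below; the variable and function names are
illustrative and this is not the attached patch.

#include "postgres.h"

#include "utils/syscache.h"

static HeapTuple
get_statext_tuple_checked(Oid stxoid)
{
	HeapTuple	oldtup = SearchSysCache1(STATEXTOID, ObjectIdGetDatum(stxoid));

	if (!HeapTupleIsValid(oldtup))
		elog(ERROR, "cache lookup failed for statistics object %u", stxoid);

	return oldtup;				/* caller must ReleaseSysCache() when done */
}
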


fix_dereference_null_statscmds.patch
Description: Binary data


Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-08-25 Thread Andres Freund
Hi,

On 2020-08-25 20:48:37 +1200, David Rowley wrote:
> On Tue, 25 Aug 2020 at 08:26, Andres Freund  wrote:
> > While I'm against introducing a separate node for the caching, I'm *not*
> > against displaying a different node type when caching is
> > present. E.g. it'd be perfectly reasonable from my POV to have a 'Cached
> > Nested Loop' join and a plain 'Nested Loop' node in the above node. I'd
> > probably still want to display the 'Cache Key' similar to your example,
> > but I don't see how it'd be better to display it with one more
> > intermediary node.
> 
> ...Well, this is difficult... For the record, in case anyone missed
> it, I'm pretty set on being against doing any node overloading for
> this.  I think it's a pretty horrid modularity violation regardless of
> what text appears in EXPLAIN. I think if we merge these nodes then we
> may as well go further and merge in other simple nodes like LIMIT.

Huh? That doesn't make any sense. LIMIT is applicable to every single
node type with the exception of hash. The caching you talk about is
applicable only to node types that parametrize their sub-nodes, of which
there are exactly two instances.

Limit doesn't shuttle through huge amounts of tuples normally. What you
talk about does.



> Also, just in case anyone is misunderstanding this Andres' argument.
> It's entirely based on the performance impact of having an additional
> node.

Not entirely, no. It's also just that it doesn't make sense to have two
nodes setting parameters that then half magically picked up by a special
subsidiary node type and used as a cache key. This is pseudo modularity,
not real modularity. And makes it harder to display useful information
in explain etc. And makes it harder to e.g. clear the cache in cases we
know that there's no further use of the current cache. At least without
piercing the abstraction veil.


> However, given the correct planner choice, there will never be
> a gross slowdown due to having the extra node.

There'll be a significant reduction in the increase in performance.


> I understand that you've voiced your feelings about this, but what I
> want to know is, how strongly do you feel about overloading the node?
> Will you stand in my way if I want to push ahead with the separate
> node?  Will anyone else?

I feel pretty darn strongly about this. If there's plenty people on your
side I'll not stand in your way, but I think this is a bad design based on
pretty flimsy reasons.

Greetings,

Andres Freund




Re: Continuing instability in insert-conflict-specconflict test

2020-08-25 Thread Tom Lane
I wrote:
> I've spent the day fooling around with a re-implementation of
> isolationtester that waits for all its controlled sessions to quiesce
> (either wait for client input, or block on a lock held by another
> session) before moving on to the next step.  That was not a feasible
> approach before we had the wait_event infrastructure, but it's
> seeming like it might be workable now.  Still have a few issues to
> sort out though ...

I wasted a good deal of time on this idea, and eventually concluded
that it's a dead end, because there is an unremovable race condition.
Namely, that even if the isolationtester's observer backend has
observed that test session X has quiesced according to its
wait_event_info, it is possible for the report of that fact to arrive
at the isolationtester client process before test session X's output
does.

It's quite obvious how that might happen if the isolationtester is
on a different machine than the PG server --- just imagine a dropped
packet in X's output that has to be retransmitted.  You might think
it shouldn't happen within a single machine, but I'm seeing results
that I cannot explain any other way (on an 8-core RHEL8 box).
It appears to not be particularly rare, either.

> Andres Freund  writes:
>> ISTM the issue at hand isn't so much that X expects something to be
>> printed by Y before it terminates, but that it expects the next step to
>> not be executed before Y unlocks. If I understand the wrong output
>> correctly, what happens is that "controller_print_speculative_locks" is
>> executed, even though s1 hasn't yet acquired the next lock.

The problem as I'm now understanding it is that
insert-conflict-specconflict.spec issues multiple commands in sequence
to its "controller" session, and expects that NOTICE outputs from a
different test session will arrive at a determinate point in that
sequence.  In practice that's not guaranteed, because (a) the other
test session might not send the NOTICE soon enough --- as my modified
specfile proves --- and (b) even if the NOTICE is timely sent, the
kernel will not guarantee timely receipt.  We could fix (a) by
introducing some explicit interlock between the controller and test
sessions, but (b) is a killer.

I think what we have to do to salvage this test is to get rid of the
use of NOTICE outputs, and instead have the test functions insert
log records into some table, which we can inspect after the fact
to verify that things happened as we expect.

regards, tom lane
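
A hypothetical sketch of that table-based approach (all names invented for
the example): each test function records what it did, and a final step reads
the log back in a deterministic order instead of relying on when NOTICE
messages happen to arrive.

CREATE TABLE spec_test_log (id serial PRIMARY KEY, sess text, event text);

CREATE FUNCTION log_event(sess text, event text) RETURNS void
LANGUAGE sql AS
$$ INSERT INTO spec_test_log (sess, event) VALUES ($1, $2) $$;

-- at the end of the permutation, the observed ordering can be checked
SELECT sess, event FROM spec_test_log ORDER BY id;
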




[PATCH] Fix Uninitialized scalar variable (UNINIT) (src/backend/access/heap/heapam_handler.c)

2020-08-25 Thread Ranier Vilela
Hi Tom,

Per Coverity.

The variable root_offsets is read at line 1641, but, at this point,
the content is unknown, so it is impossible to be sure it works correctly.

regards,
Ranier Vilela


fix_uninitialized_variable_heapam_handler.patch
Description: Binary data


Re: renaming configure.in to configure.ac

2020-08-25 Thread Tom Lane
Peter Eisentraut  writes:
> On 2020-07-24 15:23, Tom Lane wrote:
>> Sounds good.  Do we want to try Noah's idea of temporarily committing
>> the remaining changes, to see if the buildfarm is happy?

> I think to get value out of this you'd have to compare the config.status 
> output files (mainly pg_config.h and Makefile.global) before and after. 
> Otherwise you're just testing that the shell can parse the script. 
> Perhaps some manual tests on, say, AIX and HP-UX using the native shell 
> would be of some value.

I already did that on assorted boxes, using the patches you previously
posted [1].  Do you think there's value in re-doing it manually,
rather than just having at it with the buildfarm?

(I did not try to test whether the configure script itself could be
regenerated on an ancient platform; I doubt we care.)

regards, tom lane

[1] https://www.postgresql.org/message-id/30379.1595117752%40sss.pgh.pa.us




Re: some unused parameters cleanup

2020-08-25 Thread Tom Lane
Peter Eisentraut  writes:
> Here is a series of patches to remove some unused function parameters. 
> In each case, the need for them was removed by some other code changes 
> over time but the unusedness was not noticed.  I have included a 
> reference to when they became unused in each case.

For some of these, there's an argument for keeping the unused parameter
for consistency with sibling functions that do use it.  Not sure how
important that is, though.

regards, tom lane




Re: Continuing instability in insert-conflict-specconflict test

2020-08-25 Thread Andrew Dunstan


On 8/24/20 4:42 PM, Andres Freund wrote:
>
> This test is really hairy, which isn't great. But until we have a proper
> framework for controlling server side execution, I am not sure how we
> better can achieve test coverage for this stuff. And there've been bugs,
> so it's worth testing.
>


What would the framework look like?


cheers


andrew


-- 

Andrew Dunstanhttps://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Out-of-bounds access (ARRAY_VS_SINGLETON) (src/backend/access/nbtree/nbtdedup.c)

2020-08-25 Thread Ranier Vilela
Hi,

Per Coverity.

ARRAY vs SINGLETON

The variable htids is accessed like an array, but it is a simple pointer,
which "might corrupt or misinterpret adjacent memory locations."

at line 723:
/* Form standard non-pivot tuple */
itup->t_info &= ~INDEX_ALT_TID_MASK;
htids = &itup->t_tid;

1. Here htids is a SINGLETON?

So:

At line 723:
htids[ui++] = *BTreeTupleGetPostingN(origtuple, i);

2. Yet htids is accessed as an array?

And it is accessed at positions 0 and 1, according to (nhtids == 1):
Assert(ui == nhtids);

So htids[1] is destroying something at this memory position.

regards,
Ranier Vilela


Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-08-25 Thread Andy Fan
On Tue, Aug 25, 2020 at 11:53 PM Andres Freund  wrote:

> Hi,
>
> On 2020-08-25 20:48:37 +1200, David Rowley wrote:
> > On Tue, 25 Aug 2020 at 08:26, Andres Freund  wrote:
> > > While I'm against introducing a separate node for the caching, I'm
> *not*
> > > against displaying a different node type when caching is
> > > present. E.g. it'd be perfectly reasonable from my POV to have a
> 'Cached
> > > Nested Loop' join and a plain 'Nested Loop' node in the above node. I'd
> > > probably still want to display the 'Cache Key' similar to your example,
> > > but I don't see how it'd be better to display it with one more
> > > intermediary node.
> >
> > ...Well, this is difficult... For the record, in case anyone missed
> > it, I'm pretty set on being against doing any node overloading for
> > this.  I think it's a pretty horrid modularity violation regardless of
> > what text appears in EXPLAIN. I think if we merge these nodes then we
> > may as well go further and merge in other simple nodes like LIMIT.
>
> Huh? That doesn't make any sense. LIMIT is applicable to every single
> node type with the exception of hash. The caching you talk about is
> applicable only to node types that parametrize their sub-nodes, of which
> there are exactly two instances.
>
> Limit doesn't shuttle through huge amounts of tuples normally. What you
> talk about does.
>
>
>
> > Also, just in case anyone is misunderstanding this Andres' argument.
> > It's entirely based on the performance impact of having an additional
> > node.
>
> Not entirely, no. It's also just that it doesn't make sense to have two
> nodes setting parameters that then half magically picked up by a special
> subsidiary node type and used as a cache key. This is pseudo modularity,
> not real modularity. And makes it harder to display useful information
> in explain etc. And makes it harder to e.g. clear the cache in cases we
> know that there's no further use of the current cache. At least without
> piercing the abstraction veil.
>
>
> > However, given the correct planner choice, there will never be
> > a gross slowdown due to having the extra node.
>
> There'll be a significant reduction in the increase in performance.


If this is a key blocking factor for this topic, I'd like to do a simple
hack to put the cache function into the subplan node, then do some tests to
show the real difference.  But it would be better to decide how much
difference can be thought of as a big difference.  And, for education
purposes, I'd like to understand where these differences come from.  To my
current knowledge, my basic idea is that it saves some function calls?


>

> > I understand that you've voiced your feelings about this, but what I
> > want to know is, how strongly do you feel about overloading the node?
> > Will you stand in my way if I want to push ahead with the separate
> > node?  Will anyone else?
>
> I feel pretty darn strongly about this. If there's plenty people on your
> side I'll not stand in your way, but I think this is a bad design based on
> pretty flimsy reasons.
>
>
Nice to see the different opinions from two great guys and interesting to
see how this can be resolved at last:)

-- 
Best Regards
Andy Fan


Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-08-25 Thread Andy Fan
On Tue, Aug 25, 2020 at 11:53 PM Andres Freund  wrote:

> Hi,
>
> On 2020-08-25 20:48:37 +1200, David Rowley wrote:
> > On Tue, 25 Aug 2020 at 08:26, Andres Freund  wrote:
> > > While I'm against introducing a separate node for the caching, I'm
> *not*
> > > against displaying a different node type when caching is
> > > present. E.g. it'd be perfectly reasonable from my POV to have a
> 'Cached
> > > Nested Loop' join and a plain 'Nested Loop' node in the above node. I'd
> > > probably still want to display the 'Cache Key' similar to your example,
> > > but I don't see how it'd be better to display it with one more
> > > intermediary node.
> >
> > ...Well, this is difficult... For the record, in case anyone missed
> > it, I'm pretty set on being against doing any node overloading for
> > this.  I think it's a pretty horrid modularity violation regardless of
> > what text appears in EXPLAIN. I think if we merge these nodes then we
> > may as well go further and merge in other simple nodes like LIMIT.
>
> Huh? That doesn't make any sense. LIMIT is applicable to every single
> node type with the exception of hash. The caching you talk about is
> applicable only to node types that parametrize their sub-nodes, of which
> there are exactly two instances.
>
> Limit doesn't shuttle through huge amounts of tuples normally. What you
> talk about does.
>
>
>
> > Also, just in case anyone is misunderstanding this Andres' argument.
> > It's entirely based on the performance impact of having an additional
> > node.
>
> Not entirely, no. It's also just that it doesn't make sense to have two
> nodes setting parameters that then half magically picked up by a special
> subsidiary node type and used as a cache key. This is pseudo modularity,
> not real modularity. And makes it harder to display useful information
> in explain etc. And makes it harder to e.g. clear the cache in cases we
> know that there's no further use of the current cache. At least without
> piercing the abstraction veil.
>
>
Sorry that I missed this when I replied to the last thread.  I understand
this, and I remain neutral about it.


> > However, given the correct planner choice, there will never be
> > a gross slowdown due to having the extra node.
>
> There'll be a significant reduction in the increase in performance.
>
>
> > I understand that you've voiced your feelings about this, but what I
> > want to know is, how strongly do you feel about overloading the node?
> > Will you stand in my way if I want to push ahead with the separate
> > node?  Will anyone else?
>
> I feel pretty darn strongly about this. If there's plenty people on your
> side I'll not stand in your way, but I think this is a bad design based on
> pretty flimsy reasons.
>
> Greetings,
>
> Andres Freund
>
>
>

-- 
Best Regards
Andy Fan


Re: some unused parameters cleanup

2020-08-25 Thread Bruce Momjian
On Tue, Aug 25, 2020 at 12:59:31PM -0400, Tom Lane wrote:
> Peter Eisentraut  writes:
> > Here is a series of patches to remove some unused function parameters. 
> > In each case, the need for them was removed by some other code changes 
> > over time but the unusedness was not noticed.  I have included a 
> > reference to when they became unused in each case.
> 
> For some of these, there's an argument for keeping the unused parameter
> for consistency with sibling functions that do use it.  Not sure how
> important that is, though.

I think if they are kept for that reason, we should document that so we
know not to revisit this issue for them.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: Re: [HACKERS] Custom compression methods

2020-08-25 Thread Robert Haas
On Mon, Aug 24, 2020 at 2:12 AM Dilip Kumar  wrote:
> IIUC,  the main reason for using this flag is for taking the decision
> whether we need any detoasting for this tuple.  For example, if we are
> rewriting the table because the compression method is changed then if
> HEAP_HASCUSTOMCOMPRESSED bit is not set in the tuple header and tuple
> length, not tup->t_len > TOAST_TUPLE_THRESHOLD then we don't need to
> call heap_toast_insert_or_update function for this tuple.  Whereas if
> this flag is set then we need to because we might need to uncompress
> and compress back using a different compression method.  The same is
> the case with INSERT into SELECT * FROM.

This doesn't really seem worth it to me. I don't see how we can
justify burning an on-disk bit just to save a little bit of overhead
during a rare maintenance operation. If there's a performance problem
here we need to look for another way of mitigating it. Slowing CLUSTER
and/or VACUUM FULL down by a large amount for this feature would be
unacceptable, but is that really a problem? And if so, can we solve it
without requiring this bit?

> > > something, but I'd really strongly suggest looking for a way to get
> > > rid of this. It also invents the concept of a TupleDesc flag, and the
> > > flag invented is TD_ATTR_CUSTOM_COMPRESSED; I'm not sure I see why we
> > > need that, either.
>
> This is also used in a similar way as the above but for the target
> table, i.e. if the target table has the custom compressed attribute
> then maybe we can not directly insert the tuple because it might have
> compressed data which are compressed using the default compression
> methods.

I think this is just an in-memory flag, which is much less precious
than an on-disk bit. However, I still wonder whether it's really the
right design. I think that if we offer lz4 we may well want to make it
the default eventually, or perhaps even right away. If that ends up
causing this flag to get set on every tuple all the time, then it
won't really optimize anything.

> I have already extracted these 2 patches from the main patch set.
> But, in these patches, I am still storing the am_oid in the toast
> header.  I am not sure can we get rid of that at least for these 2
> patches?  But, then wherever we try to uncompress the tuple we need to
> know the tuple descriptor to get the am_oid but I think that is not
> possible in all the cases.  Am I missing something here?

I think we should instead use the high bits of the toast size word for
patches #1-#4, as discussed upthread.
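
For illustration, the bit-packing idea amounts to something like the sketch
below.  The macro and function names are invented here, and the real layout
would live in the varlena/TOAST headers rather than in standalone code:
sizes fit in 30 bits, which leaves the top 2 bits of the 32-bit word free to
identify one of up to four compression methods without widening the on-disk
format.

#include "postgres.h"

#define EXAMPLE_TOAST_SIZE_MASK		0x3FFFFFFFU
#define EXAMPLE_TOAST_CMID_SHIFT	30

static inline uint32
example_toast_pack(uint32 rawsize, uint32 cmid)
{
	return (rawsize & EXAMPLE_TOAST_SIZE_MASK) |
		(cmid << EXAMPLE_TOAST_CMID_SHIFT);
}

static inline uint32
example_toast_size(uint32 word)
{
	return word & EXAMPLE_TOAST_SIZE_MASK;
}

static inline uint32
example_toast_cmid(uint32 word)
{
	return word >> EXAMPLE_TOAST_CMID_SHIFT;
}
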

> > > Patch #3. Add support for changing the compression method associated
> > > with a column, forcing a table rewrite.
> > >
> > > Patch #4. Add support for PRESERVE, so that you can change the
> > > compression method associated with a column without forcing a table
> > > rewrite, by including the old method in the PRESERVE list, or with a
> > > rewrite, by not including it in the PRESERVE list.
>
> Does this make sense to have Patch #3 and Patch #4, without having
> Patch #5? I mean why do we need to support rewrite or preserve unless
> we have the customer compression methods right? because the build-in
> compression method can not be dropped so why do we need to preserve?

I think that patch #3 makes sense because somebody might have a table
that is currently compressed with pglz and they want to switch to lz4,
and I think patch #4 also makes sense because they might want to start
using lz4 for future data but not force a rewrite to get rid of all
the pglz data they've already got. Those options are valuable as soon
as there is more than one possible compression algorithm, even if
they're all built in. Now, as I said upthread, it's also true that you
could do #5 before #3 and #4. I don't think that's insane. But I
prefer it in the other order, because I think having #5 without #3 and
#4 wouldn't be too much fun for users.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: some unused parameters cleanup

2020-08-25 Thread Andreas Karlsson

On 8/25/20 7:42 PM, Bruce Momjian wrote:

On Tue, Aug 25, 2020 at 12:59:31PM -0400, Tom Lane wrote:

Peter Eisentraut  writes:

Here is a series of patches to remove some unused function parameters.
In each case, the need for them was removed by some other code changes
over time but the unusedness was not noticed.  I have included a
reference to when they became unused in each case.


For some of these, there's an argument for keeping the unused parameter
for consistency with sibling functions that do use it.  Not sure how
important that is, though.


I think if they are kept for that reason, we should document that so we
know not to revisit this issue for them.


+1

That way we can avoid new people discovering the same unused parameters 
and then submitting patches for them.


Andreas





Re: LWLockAcquire and LockBuffer mode argument

2020-08-25 Thread Robert Haas
On Mon, Aug 24, 2020 at 6:35 PM Andres Freund  wrote:
> Thoughts?

This is likely to cause a certain amount of annoyance to many
PostgreSQL developers, but if you have evidence that it will improve
performance significantly, I think it's very reasonable to do it
anyway. However, if we do it all in a backward-compatible way as you
propose, then we're likely to keep reintroducing code that does it the
old way for a really long time. I'm not sure that actually makes a lot
of sense. It might be better to just bite the bullet and make a hard
break.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Out-of-bounds access (ARRAY_VS_SINGLETON) (src/backend/access/nbtree/nbtdedup.c)

2020-08-25 Thread Peter Geoghegan
On Tue, Aug 25, 2020 at 10:15 AM Ranier Vilela  wrote:
> If variable htids is accessed like array, but is a simple pointer, can be
> "This might corrupt or misinterpret adjacent memory locations."

This exact Coverity complaint has already been discussed, and marked
as a false positive on the community's Coverity installation.

-- 
Peter Geoghegan




Re: More tests with USING INDEX replident and dropped indexes

2020-08-25 Thread Rahila

Hi,



I couldn't test the patch as it does not apply cleanly on master.

The CF bot is green, and I can apply v2 cleanly FWIW:
http://commitfest.cputube.org/michael-paquier.html


Sorry, I didn't apply correctly.  The tests pass for me.  In addition, I
tested with partitioned tables.  It works as expected and makes the
REPLICA IDENTITY 'n' for the partitions as well when an index on a
partitioned table is dropped.



Thank you,




Re: LWLockAcquire and LockBuffer mode argument

2020-08-25 Thread Andres Freund
Hi,

On 2020-08-25 13:59:35 -0400, Robert Haas wrote:
> On Mon, Aug 24, 2020 at 6:35 PM Andres Freund  wrote:
> > Thoughts?
> 
> This is likely to cause a certain amount of annoyance to many
> PostgreSQL developers, but if you have evidence that it will improve
> performance significantly, I think it's very reasonable to do it
> anyway.

I don't think it'll be a "significant" performance benefit directly. It
appears to be measurable, but I think to reach significant performance
improvements it'll take a while and it'll come from profilers and other
tools working better.

> However, if we do it all in a backward-compatible way as you propose,
> then we're likely to keep reintroducing code that does it the old way
> for a really long time. I'm not sure that actually makes a lot of
> sense. It might be better to just bite the bullet and make a hard
> break.

It seems easy enough to slap a compiler "enforced" deprecation warning
on the new compat version, in master only. Seems unnecessary to make
life immediately harder for extensions authors desiring cross-version
compatibility.

Greetings,

Andres Freund




Re: recovering from "found xmin ... from before relfrozenxid ..."

2020-08-25 Thread Robert Haas
On Tue, Aug 25, 2020 at 8:17 AM Masahiko Sawada
 wrote:
> + 
> + 
> +  While performing surgery on a damaged relation, we must not be doing 
> anything
> +  else on that relation in parallel. This is to ensure that when we are
> +  operating on a damaged tuple there is no other transaction trying to modify
> +  that tuple.
> + 
> + 
>
> If we prefer to avoid concurrent operations on the target relation why
> don't we use AccessExclusiveLock?

I disagree with the content of the note. It's up to the user whether
to perform any concurrent operations on the target relation, and in
many cases it would be fine to do so. Users who can afford to take the
table off-line to repair the problem don't really need this tool in
the first place.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: LWLockAcquire and LockBuffer mode argument

2020-08-25 Thread Robert Haas
On Tue, Aug 25, 2020 at 2:17 PM Andres Freund  wrote:
> It seems easy enough to slap a compiler "enforced" deprecation warning
> on the new compat version, in master only. Seems unnecessary to make
> life immediately harder for extensions authors desiring cross-version
> compatibility.

I don't know exactly how you'd go about implementing that, but I am
not against compatibility. I *am* against coding rules that require a
lot of manual enforcement.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: LWLockAcquire and LockBuffer mode argument

2020-08-25 Thread Andres Freund
Hi,

On 2020-08-25 14:22:28 -0400, Robert Haas wrote:
> On Tue, Aug 25, 2020 at 2:17 PM Andres Freund  wrote:
> > It seems easy enough to slap a compiler "enforced" deprecation warning
> > on the new compat version, in master only. Seems unnecessary to make
> > life immediately harder for extensions authors desiring cross-version
> > compatibility.
> 
> I don't know exactly how you'd go about implementing that, but I am
> not against compatibility. I *am* against coding rules that require a
> lot of manual enforcement.

#if I_AM_GCC_OR_CLANG
#define pg_attribute_deprecated __attribute__((deprecated))
#elif I_AM_MSVC
#define pg_attribute_deprecated __declspec(deprecated)
#else
#define pg_attribute_deprecated
#endif

Greetings,

Andres Freund
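
As a concrete, purely illustrative example of how that could be applied to a
compatibility wrapper (the wrapper name below is made up, and storage/lwlock.h
is assumed to be included):

/* old-style entry point, kept only for extension compatibility */
pg_attribute_deprecated
extern bool LWLockAcquireCompat(LWLock *lock, LWLockMode mode);

Any extension still calling the old-signature function would then get a
compile-time deprecation warning on master, while released branches stay
untouched.
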




Re: LWLockAcquire and LockBuffer mode argument

2020-08-25 Thread Peter Geoghegan
On Mon, Aug 24, 2020 at 3:35 PM Andres Freund  wrote:
> To avoid unnecessary backward compat pains it seems best to first
> introduce compat wrappers using the current signature, and then
> subsequently replace in-core callers with the direct calls.

I like the idea of doing this, purely to make profiler output easier
to interpret.

Passing a shared-or-exclusive flag is kind of a natural thing to do
within code like _bt_search(), where we sometimes want to
exclusive-lock the leaf level page but not the internal pages that we
descend through first. Fortunately we can handle the flag inside the
existing nbtree wrapper functions quite easily -- the recently added
_bt_lockbuf() can test the flag directly. We already have
nbtree-private flags (BT_READ and BT_WRITE) that we can continue to
use after the old interface is fully deprecated.

More generally, it probably is kind of natural to have a flag like
BUFFER_LOCK_SHARE/BUFFER_LOCK_EXCLUSIVE (though not like
BUFFER_LOCK_UNLOCK) within index access methods. But I think that
there are several good reasons to add something equivalent to
_bt_lockbuf() to all index access methods.

-- 
Peter Geoghegan




Re: Missing "Up" navigation link between parts and doc root?

2020-08-25 Thread Bruce Momjian
On Sat, Jul  4, 2020 at 08:47:53AM +0200, Fabien COELHO wrote:
> 
> Hello Peter,
> 
> > The original stylesheets explicitly go out of their way to do it that
> > way. We can easily fix that by removing that special case.  See attached
> > patch.
> > 
> > That patch only fixes it for the header.  To fix it for the footer as
> > well, we'd first need to import the navfooter template to be able to
> > customize it.
> 
> Thanks for the patch, which applies cleanly, doc compiles, works for me with
> w3m.
> 
> > Not a big problem though.
> 
> Nope, just mildly irritating for quite a long time:-) So I'd go for back
> patching if it applies cleanly.

Can we get Peter's patch for this applied soon?  Thanks.  Should I apply
it?

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: [PATCH] Fix Uninitialized scalar variable (UNINIT) (src/backend/access/heap/heapam_handler.c)

2020-08-25 Thread Alvaro Herrera
On 2020-Aug-25, Ranier Vilela wrote:

> The variable root_offsets is read at line 1641, but, at this point,
> the content is unknown, so it is impossible to be sure it works correctly.

Surely it is set by heap_get_root_tuples() in line 1347?  The root_blkno
variable is used exclusively to know whether root_offsets has been
initialized for the current block.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH] Fix Uninitialized scalar variable (UNINIT) (src/backend/access/heap/heapam_handler.c)

2020-08-25 Thread Ranier Vilela
On Tue, Aug 25, 2020 at 6:06 PM Alvaro Herrera <
alvhe...@2ndquadrant.com> wrote:

> On 2020-Aug-25, Ranier Vilela wrote:
>
> > The variable root_offsets is read at line 1641, but, at this point,
> > the content is unknown, so it is impossible to be sure it works correctly.
>
> Surely it is set by heap_get_root_tuples() in line 1347?  The root_blkno
> variable is used exclusively to know whether root_offsets has been
> initialized for the current block.
>
Hi Álvaro,

20. Condition hscan->rs_cblock != root_blkno, taking false branch.

If the variable hscan->rs_cblock is InvalidBlockNumber, the test that
protects root_offsets can fail.

The var root_blkno only is checked at line 1853.

regards,
Ranier Vilela


Re: [PATCH] Fix Uninitialized scalar variable (UNINIT) (src/backend/access/heap/heapam_handler.c)

2020-08-25 Thread Alvaro Herrera
On 2020-Aug-25, Ranier Vilela wrote:

> If the variable hscan->rs_cblock is InvalidBlockNumber, the test that
> protects root_offsets can fail.

When does that happen?

> The var root_blkno only is checked at line 1853.

That's a different function.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Fix a couple of misuages of bms_num_members()

2020-08-25 Thread David Rowley
On Wed, 26 Aug 2020 at 01:18, Tomas Vondra  wrote:
>
> On Wed, Aug 26, 2020 at 12:51:37AM +1200, David Rowley wrote:
> >I noticed today there are a few places where we use bms_num_members()
> >where we really should be using bms_membership().  These are not bugs,
> >they're mostly just bad examples to leave laying around, at best, and
> >a small performance penalty, at worst.
> >
> >Unless there are any objections, I plan to push this to master only in
> >about 10 hours time.
> >
>
> Seems OK to me. Thanks.

Thanks for having a look.  Pushed.

David
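
For anyone unfamiliar with the distinction: bms_membership() only has to look
far enough to classify a set as empty, a singleton, or multiple members,
while bms_num_members() counts every member.  A rough illustration, not code
taken from the patch:

#include "postgres.h"

#include "nodes/bitmapset.h"

/* preferred: stops as soon as a second member is seen */
static bool
has_multiple_members(const Bitmapset *relids)
{
	return bms_membership(relids) == BMS_MULTIPLE;
}

/* same result, but walks the whole set just to produce a count */
static bool
has_multiple_members_slow(const Bitmapset *relids)
{
	return bms_num_members(relids) > 1;
}
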




Re: [PATCH] Fix Uninitialized scalar variable (UNINIT) (src/backend/access/heap/heapam_handler.c)

2020-08-25 Thread Ranier Vilela
On Tue, Aug 25, 2020 at 7:45 PM Alvaro Herrera <
alvhe...@2ndquadrant.com> wrote:

> On 2020-Aug-25, Ranier Vilela wrote:
>
> > If the variable hscan->rs_cblock is InvalidBlockNumber the test can
> > protect root_offsets fail.
>
> When does that happen?
>
At first pass into the while loop?
hscan->rs_cblock is InvalidBlockNumber, what happens?


> > The var root_blkno only is checked at line 1853.
>
> That's a different function.
>
My mistake. Find editor.

regards,
Ranier Vilela


Re: Strange behavior with polygon and NaN

2020-08-25 Thread Bruce Momjian


I can confirm that this two-month old email report still produces
different results with indexes on/off in git master, which I don't think
is ever correct behavior.

---

On Wed, Jun 24, 2020 at 03:11:03PM -0700, Jesse Zhang wrote:
> Hi hackers,
> 
> While working with Chris Hajas on merging Postgres 12 with Greenplum
> Database we stumbled upon the following strange behavior in the geometry
> type polygon:
> 
> -- >8 
> 
> CREATE TEMP TABLE foo (p point);
> CREATE INDEX ON foo USING gist(p);
> 
> INSERT INTO foo VALUES ('0,0'), ('1,1'), ('NaN,NaN');
> 
> SELECT $q$
> SELECT * FROM foo WHERE p <@ polygon '(0,0), (0, 100), (100, 100), (100, 0)'
> $q$ AS qry \gset
> 
> BEGIN;
> SAVEPOINT yolo;
> SET LOCAL enable_seqscan TO off;
> :qry;
> 
> ROLLBACK TO SAVEPOINT yolo;
> SET LOCAL enable_indexscan TO off;
> SET LOCAL enable_bitmapscan TO off;
> :qry;
> 
> -- 8< 
> 
> If you run the above repro SQL in HEAD (and 12, and likely all older
> versions), you get the following output:
> 
> CREATE TABLE
> CREATE INDEX
> INSERT 0 3
> BEGIN
> SAVEPOINT
> SET
>p
> ---
>  (0,0)
>  (1,1)
> (2 rows)
> 
> ROLLBACK
> SET
> SET
>  p
> ---
>  (0,0)
>  (1,1)
>  (NaN,NaN)
> (3 rows)
> 
> 
> At first glance, you'd think this is the gist AM's bad, but on a second
> thought, something else is strange here. The following query returns
> true:
> 
> SELECT point '(NaN, NaN)' <@ polygon '(0,0), (0, 100), (100, 100), (100, 0)'
> 
> The above behavior of the "contained in" operator is surprising, and
> it's probably not what the GiST AM is expecting. I took a look at
> point_inside() in geo_ops.c, and it doesn't seem well equipped to handle
> NaN. Similary ill-equipped is dist_ppoly_internal() which underlies the
> distnace operator for polygon. It gives the following interesting
> output:
> 
> SELECT *, c <-> polygon '(0,0),(0,100),(100,100),(100,0)' as distance
> FROM (
>   SELECT circle(point(100 * i, 'NaN'), 50) AS c
>   FROM generate_series(-2, 4) i
> ) t(c)
> ORDER BY 2;
> 
> c| distance
> -+--
>  <(-200,NaN),50> |0
>  <(-100,NaN),50> |0
>  <(0,NaN),50>|0
>  <(100,NaN),50>  |0
>  <(200,NaN),50>  |  NaN
>  <(300,NaN),50>  |  NaN
>  <(400,NaN),50>  |  NaN
> (7 rows)
> 
> Should they all be NaN? Am I alone in thinking the index is right but
> the operators are wrong? Or should we call the indexes wrong here?
> 
> Cheers,
> Jesse and Chris
> 
> 

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: [PATCH] Fix Uninitialized scalar variable (UNINIT) (src/backend/access/heap/heapam_handler.c)

2020-08-25 Thread Ranier Vilela
On Tue, Aug 25, 2020 at 7:54 PM Ranier Vilela
wrote:

> On Tue, Aug 25, 2020 at 7:45 PM Alvaro Herrera <
> alvhe...@2ndquadrant.com> wrote:
>
>> On 2020-Aug-25, Ranier Vilela wrote:
>>
>> > If the variable hscan->rs_cblock is InvalidBlockNumber, the test that
>> > protects root_offsets can fail.
>>
>> When does that happen?
>>
> At first pass into the while loop?
> hscan->rs_cblock is InvalidBlockNumber, what happens?
>
> Other things.
1. Even if heap_get_root_tuples at line 1347 is called,
does it fill all of root_offsets?
Is root_offsets[offnum - 1] safe at this point (line 1641), or is it garbage?

2. hscan->rs_cbuf is constant?
if (hscan->rs_cblock != root_blkno)
{
Page page = BufferGetPage(hscan->rs_cbuf);

LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_SHARE);
heap_get_root_tuples(page, root_offsets);
LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_UNLOCK);

root_blkno = hscan->rs_cblock;
}

Can this be moved outside the while loop?
Am I wrong, or does hscan not change?

regards,
Ranier Vilela


Re: [PATCH] Fix Uninitialized scalar variable (UNINIT) (src/backend/access/heap/heapam_handler.c)

2020-08-25 Thread Ranier Vilela
On Tue, Aug 25, 2020 at 8:13 PM Alvaro Herrera <
alvhe...@2ndquadrant.com> wrote:

> On 2020-Aug-25, Ranier Vilela wrote:
>
> > On Tue, Aug 25, 2020 at 7:45 PM Alvaro Herrera <
> > alvhe...@2ndquadrant.com> wrote:
> >
> > > On 2020-Aug-25, Ranier Vilela wrote:
> > >
> > > > If the variable hscan->rs_cblock is InvalidBlockNumber, the test that
> > > > protects root_offsets can fail.
> > >
> > > When does that happen?
> >
> > At first pass into the while loop?
> > hscan->rs_cblock is InvalidBlockNumber, what happens?
>
> No, it is set when the page is read.
>
And is it guaranteed that rs_cblock is not InvalidBlockNumber when the
page is read?

Ranier Vilela


Re: [PATCH] Fix Uninitialized scalar variable (UNINIT) (src/backend/access/heap/heapam_handler.c)

2020-08-25 Thread Alvaro Herrera
On 2020-Aug-25, Ranier Vilela wrote:

> And is it guaranteed that rs_cblock is not InvalidBlockNumber when the
> page is read?

It could be InvalidBlockNumber if sufficient neutrinos hit the memory
bank and happen to set all the bits in the block number.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH] Fix Uninitialized scalar variable (UNINIT) (src/backend/access/heap/heapam_handler.c)

2020-08-25 Thread Alvaro Herrera
On 2020-Aug-25, Ranier Vilela wrote:

> Haha, I think it's enough for me.
> I believe that PostgreSQL will not run on the ISS yet.

Actually, I believe there are some satellites that run Postgres -- not
100% sure about this.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH] Fix Uninitialized scalar variable (UNINIT) (src/backend/access/heap/heapam_handler.c)

2020-08-25 Thread Alvaro Herrera
On 2020-Aug-25, Ranier Vilela wrote:

> 1. Even if heap_get_root_tuples at line 1347 is called,
> does it fill all of root_offsets?

Yes -- read the comments there.

> 2. hscan->rs_cbuf is constant?
> if (hscan->rs_cblock != root_blkno)

It is the buffer that contains the given block.  Those two things move
in unison.  See heapgettup and heapgettup_pagemode.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Problems with the FSM, heap fillfactor, and temporal locality

2020-08-25 Thread Peter Geoghegan
On Tue, Aug 25, 2020 at 6:21 AM Stephen Frost  wrote:
> This all definitely sounds quite interesting and the idea to look at the
> XID to see if we're in the same transaction and therefore likely
> inserting a related tuple certainly makes some sense.  While I get that
> it might not specifically work with TPC-C, I'm wondering about if we
> could figure out how to make a multi-tuple INSERT use
> heap/table_multi_insert (which seems to only be used by COPY currently,
> and internally thanks to the recent work to use it for some catalog
> tables) and then consider the size of the entire set of tuples being
> INSERT'd when working to find a page, or deciding if we should extend
> the relation.

There are probably quite a variety of ways in which we can capture
locality, and I'm sure that I'm only scratching the surface right now.
I agree that table_multi_insert could definitely be one of them.

John said something about concentrating garbage in certain pages
up-thread. I also wonder if there is some visibility + freeze map
angle on this.

What I see with the benchmark is that the order_line table (the
largest table by quite some margin, and one that grows indefinitely)
does not make much use of the visibility map during VACUUM -- even
though it's the kind of table + workload that you'd hope and expect
would make good use of it if you were looking at it in a real world
situation. Each tuple is only inserted once and later updated once, so
this is a case where we really ought to do better. The logs show that
VACUUM/autovacuum dirties lots of pages, probably due to fragmentation
from free space management (though there are undoubtedly other
factors).

The best "locality orientated" reference guide to TPC-C that I've been
able to find is "A Modeling Study of the TPC-C Benchmark", which was
published in 1993 by NASA (shortly after the introduction of TPC-C).
You can get it from:

https://apps.dtic.mil/dtic/tr/fulltext/u2/a264793.pdf  (Unfortunately
this reproduction is a low quality photocopy -- ACM members can get a
clear copy.)

If you think about the TPC-C workload at a high level, and Postgres
internals stuff at a low level, and then run the benchmark, you'll
find various ways in which we don't live up to our potential. The big
picture is that the "physical database" is not close enough to the
"logical database", especially over time and after a lot of churn.
This causes problems all over the place, that look like nothing in
particular in profiles.

It's not that TPC-C is unsympathetic to Postgres in any of the usual
ways -- there are very few UPDATEs that affect indexed columns, which
are not really a factor at all. There are also no transactions that
run longer than 2 seconds (any more than ~50ms per execution is
exceptional, in fact). We already do a significant amount of the
necessary garbage collection opportunistically (by pruning) --
probably the vast majority, in fact. In particular, HOT pruning works
well, since fill factor has been tuned. It just doesn't work as well
as you'd hope, in that it cannot stem the tide of fragmentation. And
not just because of heapam's use of the FSM.

If we implemented a simple differential heap tuple compression scheme
within HOT chains (though not across HOT chains/unrelated tuples),
then we'd probably do much better -- we could keep the same logical
tuple on the same heap page much more often, maybe always. For
example, "stock" table is a major source of FPIs, and I bet that this
is greatly amplified by our failure to keep versions of the same
frequently updated tuple together. We can only fit ~25 stock tuples on
each heap page (with heap fill factor at 95, the BenchmarkSQL
default), so individual tuples are ~320 bytes (including tuple
header). If we found a way to store the changed columns for successor
tuples within a HOT chain, then we would do much better -- without
changing the HOT invariant (or anything else that's truly scary). If
our scheme worked by abusing the representation that we use for NULL
values in the successor/HOT tuples (which is not terribly space
efficient), then we could still store about 6 more versions of each
stock tuple on the same page -- the actual changed columns are
typically much much smaller than the unchanged columns. Our 23/24 byte
tuple header is usually small potatoes compared to storing unchanged
values several times.

As I said, the HOT optimization (and opportunistic pruning) already
work well with this benchmark. But I think that they'd work a lot
better if we could just temporarily absorb a few extra versions on the
heap page, so we have enough breathing room to prune before the page
"truly fills to capacity". It could help in roughly the same way that
deduplication now helps in nbtree indexes with "version churn".

I'm also reminded of the nbtree optimization I prototyped recently,
which more or less prevented all unique index bloat provided there is
no long running transaction:

https://postgr.es/m/CAH2-Wzm+maE3apHB8NOtmM=p-do65j2v5gzawcoeeu

Re: [PATCH] Fix Uninitialized scalar variable (UNINIT) (src/backend/access/heap/heapam_handler.c)

2020-08-25 Thread Ranier Vilela
On Tue, Aug 25, 2020 at 8:20 PM Alvaro Herrera <
alvhe...@2ndquadrant.com> wrote:

> On 2020-Aug-25, Ranier Vilela wrote:
>
> > And is it guaranteed that rs_cblock is not InvalidBlockNumber when the
> > page is read?
>
> It could be InvalidBlockNumber if sufficient neutrinos hit the memory
> bank and happen to set all the bits in the block number.
>
Haha, I think it's enough for me.
I believe that PostgreSQL will not run on the ISS yet.

Ranier Vilela


Re: [PATCH] Fix Uninitialized scalar variable (UNINIT) (src/backend/access/heap/heapam_handler.c)

2020-08-25 Thread Alvaro Herrera
On 2020-Aug-25, Ranier Vilela wrote:

> On Tue, Aug 25, 2020 at 7:45 PM Alvaro Herrera <
> alvhe...@2ndquadrant.com> wrote:
> 
> > On 2020-Aug-25, Ranier Vilela wrote:
> >
> > > If the variable hscan->rs_cblock is InvalidBlockNumber, the test that
> > > protects root_offsets can fail.
> >
> > When does that happen?
>
> At first pass into the while loop?
> hscan->rs_cblock is InvalidBlockNumber, what happens?

No, it is set when the page is read.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-08-25 Thread David Rowley
On Wed, 26 Aug 2020 at 05:18, Andy Fan  wrote:
>
>
> On Tue, Aug 25, 2020 at 11:53 PM Andres Freund  wrote:
>>
>> On 2020-08-25 20:48:37 +1200, David Rowley wrote:
>> > Also, just in case anyone is misunderstanding this Andres' argument.
>> > It's entirely based on the performance impact of having an additional
>> > node.
>>
>> Not entirely, no. It's also just that it doesn't make sense to have two
>> nodes setting parameters that then half magically picked up by a special
>> subsidiary node type and used as a cache key. This is pseudo modularity,
>> not real modularity. And makes it harder to display useful information
>> in explain etc. And makes it harder to e.g. clear the cache in cases we
>> know that there's no further use of the current cache. At least without
>> piercing the abstraction veil.
>>
>>
>> > However, given the correct planner choice, there will never be
>> > a gross slowdown due to having the extra node.
>>
>> There'll be a significant reduction in the increase in performance.
>
>
> If this is a key blocking factor for this topic, I'd like to do a simple
> hack to put the cache function into the subplan node, then do some tests to
> show the real difference.  But it would be better to decide how much
> difference can be thought of as a big difference.  And, for education
> purposes, I'd like to understand where these differences come from.  To my
> current knowledge, my basic idea is that it saves some function calls?

If testing this, the cache hit ratio will be pretty key to the
results. You'd notice the overhead much less with a larger cache hit
ratio since you're not pulling the tuple from as deeply nested a node.
I'm unsure how you'd determine what a good cache hit ratio to test
with would be. The lower the expected cache hit ratio, the higher
the cost of the Result Cache node will be, so the planner has less
chance of choosing to use it.  Maybe some experimentation will find a
case where the planner picks a Result Cache plan with a low hit ratio
that can be tested.

Say you find a case with the hit ratio of 90%.  Going by [1] I found
pulling a tuple through an additional node to cost about 12
nanoseconds on an intel 4712HQ CPU.  With a hit ratio of 90% we'll
only pull 10% of tuples through the additional node, so that's about
1.2 nanoseconds per tuple, or 1.2 milliseconds per million tuples. It
might become hard to measure above the noise. More costly inner scans
will have the planner choose to Result Cache with lower estimated hit
ratios, but in that case, pulling the tuple through the additional
node during a cache miss will be less noticeable due to the more
costly inner side of the join.

Likely you could test the overhead only in theory without going to the
trouble of adapting the code to make SubPlan and Nested Loop do the
caching internally.  If you just modify ExecResultCache() to have it
simply return its subnode, then measure the performance with and
without enable_resultcache, you should get an idea of the per-tuple
overhead of pulling the tuple through the additional node on your CPU.
After you know that number, you could put the code back to what the
patches have and then experiment with a number of cases to find a case
that chooses Result Cache and gets a low hit ratio.
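
To be concrete, by "simply return its subnode" I mean neutering the node
so that the only remaining cost is the extra hop through it.  Assuming the
patch's node follows the usual ExecProcNode convention with its child
hanging off outerPlanState, a rough sketch (not the patch's actual code)
would be:

static TupleTableSlot *
ExecResultCache(PlanState *pstate)
{
    /* Bypass all caching logic; just forward tuples from the child node. */
    return ExecProcNode(outerPlanState(pstate));
}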


For example, from the plan I used in the initial email on this thread:

 ->  Index Only Scan using lookup_a_idx on lookup l
(actual time=0.002..0.011 rows=100 loops=1000)
 Index Cond: (a = hk.thousand)
 Heap Fetches: 0
 Planning Time: 0.113 ms
 Execution Time: 1876.741 ms

I don't have the exact per tuple overhead on the machine I ran that
on, but it's an AMD 3990x CPU, so I'll guess the overhead is about 8
nanoseconds per tuple, given I found it to be 12 nanoseconds on a 2014
CPU.  If that's right, then the overhead is something like 8 * 100
(rows) * 1000 (loops) = 800,000 nanoseconds = 0.8 milliseconds. If I
compare that to the execution time of the query, it's about 0.04%.

I imagine we'll need to find something with a much worse hit ratio so
we can actually measure the overhead.

David

[1] 
https://www.postgresql.org/message-id/CAKJS1f9UXdk6ZYyqbJnjFO9a9hyHKGW7B%3DZRh-rxy9qxfPA5Gw%40mail.gmail.com




Re: [PATCH] Fix Uninitialized scalar variable (UNINIT) (src/backend/access/heap/heapam_handler.c)

2020-08-25 Thread Ranier Vilela
On Tue, Aug 25, 2020 at 8:29 PM Alvaro Herrera <alvhe...@2ndquadrant.com> wrote:

> On 2020-Aug-25, Ranier Vilela wrote:
>
> > kkk, I think it's enough for me.
> > I believe that PostgreSQL will not run on the ISS yet.
>
> Actually, I believe there are some satellites that run Postgres -- not
> 100% sure about this.
>
Yeah, ESA uses it:
https://resources.2ndquadrant.com/european-space-agency-case-study-download

In fact, Postgres is to be congratulated.
Guess which one didn't have any bugs?
https://changochen.github.io/publication/squirrel_ccs2020.pdf
"Sqirrel has successfully detected 63 bugs from tested DBMS,including 51
bugs from SQLite, 7 from MySQL, and 5 from MariaDB."

Ranier Vilela


Re: some unused parameters cleanup

2020-08-25 Thread Fujii Masao




On 2020/08/26 2:50, Andreas Karlsson wrote:

On 8/25/20 7:42 PM, Bruce Momjian wrote:

On Tue, Aug 25, 2020 at 12:59:31PM -0400, Tom Lane wrote:

Peter Eisentraut  writes:

Here is a series of patches to remove some unused function parameters.
In each case, the need for them was removed by some other code changes
over time but the unusedness was not noticed.  I have included a
reference to when they became unused in each case.


For some of these, there's an argument for keeping the unused parameter
for consistency with sibling functions that do use it.  Not sure how
important that is, though.


I think if they are kept for that reason, we should document that so we
know not to revisit this issue for them.

+1

That way we can avoid new people discovering the same unused parameters and 
then submitting patches for them.


I agree that some parameters were kept for that reason,
but ISTM that some were also kept just accidentally.
For example, regarding the unused parameter "encoding" that the 0010 patch
tries to remove, commit f0d6f20278 got rid of the use of "encoding"
from generate_normalized_query() but ISTM that it just forgot to
remove that parameter.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: recovering from "found xmin ... from before relfrozenxid ..."

2020-08-25 Thread Ashutosh Sharma
On Tue, Aug 25, 2020 at 11:51 PM Robert Haas  wrote:
>
> On Tue, Aug 25, 2020 at 8:17 AM Masahiko Sawada
>  wrote:
> > + 
> > + 
> > +  While performing surgery on a damaged relation, we must not be doing 
> > anything
> > +  else on that relation in parallel. This is to ensure that when we are
> > +  operating on a damaged tuple there is no other transaction trying to 
> > modify
> > +  that tuple.
> > + 
> > + 
> >
> > If we prefer to avoid concurrent operations on the target relation why
> > don't we use AccessExclusiveLock?
>
> I disagree with the content of the note. It's up to the user whether
> to perform any concurrent operations on the target relation, and in
> many cases it would be fine to do so. Users who can afford to take the
> table off-line to repair the problem don't really need this tool in
> the first place.
>

The only reason I added this note was to ensure that we do not revive
a tuple that is deleted but not yet vacuumed. There is one
corner-case scenario, reported by you in [1], where you explained how
vacuum can report a "found xmin ... from before relfrozenxid ..."
sort of error for deleted tuples. As per the explanation provided
there, it can happen when there are multiple transactions operating on
the same tuple. However, I think we can take care of this scenario by
making some code changes in heap_force_freeze to identify such deleted
tuples and skip them. So, yeah, I will make the code changes to handle
this and
remove the note added in the documentation. Thank you Robert and
Masahiko-san for pointing this out.
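
Just to make the intent concrete, the kind of check I have in mind is
roughly the following (only a sketch, not the actual patch code; a complete
check would also need to consider lock-only and multixact xmax values):

    /*
     * Inside the per-tuple loop of heap_force_freeze(), with 'tuphdr'
     * being the header of the tuple under consideration.
     */
    if (!(tuphdr->t_infomask & HEAP_XMAX_INVALID) &&
        !HEAP_XMAX_IS_LOCKED_ONLY(tuphdr->t_infomask) &&
        TransactionIdIsValid(HeapTupleHeaderGetRawXmax(tuphdr)))
        continue;   /* looks deleted but not yet vacuumed; skip it */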

[1] - 
https://www.postgresql.org/message-id/CA%2BTgmobfJ8CkabKJZ-1FGfvbSz%2Bb8bBX807Y6hHEtVfzVe%2Bg6A%40mail.gmail.com

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com




Re: Creating a function for exposing memory usage of backend process

2020-08-25 Thread Fujii Masao




On 2020/08/25 11:39, Fujii Masao wrote:



On 2020/08/24 21:56, torikoshia wrote:

On 2020-08-24 13:13, Fujii Masao wrote:

On 2020/08/24 13:01, torikoshia wrote:

On 2020-08-22 21:18, Michael Paquier wrote:

Thanks for reviewing!


On Fri, Aug 21, 2020 at 11:27:06PM +0900, torikoshia wrote:

OK. Added a regression test on sysviews.sql.
(0001-Added-a-regression-test-for-pg_backend_memory_contex.patch)

Fujii-san gave us an example, but I added more simple one considering
the simplicity of other tests on that.


What you have sent in 0001 looks fine to me.  A small test is much
better than nothing.


+1

But as I proposed upthread, what about a bit complicated test as follows,
e.g., to confirm that the internal logic for level works expectedly?

 SELECT name, ident, parent, level, total_bytes >= free_bytes FROM
pg_backend_memory_contexts WHERE level = 0;


OK!
Attached an updated patch.


Thanks for updating the patch! Looks good to me.
Barring any objection, I will commit this patch.









Added a patch for relocating the codes to mcxtfuncs.c.
(patches/0001-Rellocated-the-codes-for-pg_backend_memory_contexts-.patch)


Thanks for the patch! Looks good to me.
Barring any objection, I will commit this patch at first.




The same code is moved around line-by-line.


Of course, this restriction makes pg_backend_memory_contexts hard to use
when the user of the target session is not granted pg_monitor because the
scope of this view is session local.

In this case, I imagine additional operations something like temporarily
granting pg_monitor to that user.


Hmm.  I am not completely sure either that pg_monitor is the best fit
here, because this view provides information about a bunch of internal
structures.  Something that could easily be done though is to revoke
the access from public, and then users could just set up GRANT
permissions post-initdb, with pg_monitor as one possible choice. This
is the safest path by default, and this stuff is of a caliber similar
to pg_shmem_allocations in terms of internal contents.


I think this is a better way than what I did in
0001-Rellocated-the-codes-for-pg_backend_memory_contexts-.patch.


You mean 0001-Restrict-the-access-to-pg_backend_memory_contexts-to.patch?


Oops, I meant 0001-Restrict-the-access-to-pg_backend_memory_contexts-to.patch.


This patch also looks good to me. Thanks!


I pushed the proposed three patches. Thanks!

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: [PATCH] Resource leaks (src/backend/libpq/hba.c)

2020-08-25 Thread Kyotaro Horiguchi
At Tue, 25 Aug 2020 10:20:07 -0300, Ranier Vilela  wrote 
in 
> Hi Tom,
> 
> Per Coverity.
> 
> The function parse_hba_auth_opt at (src/backend/libpq/hba.c) allows resource
> leaks when called by the function parse_hba_line with log levels LOG and
> DEBUG3. The function SplitGUCList (src/bin/pg_dump/dumputils.c), even when
> returning FALSE, can leave the namelist list non-empty, with memory
> allocated by pg_malloc.

As you know, there are two implementations of the function. One, which
uses pg_malloc, is used in pg_dump, and the returned char *namelist is
always pg_free'd after use. The other returns a List, and the
returned list is reclaimed by MemoryContextDelete at its callers
(concretely load_hba and fill_hba_view).  Indeed they share the same
name but have different signatures, so the two are statically
distinguishable, but Coverity seems to fail to do so. You may need to
avoid feeding the whole source tree to Coverity at once.
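
For reference, the backend-side pattern is roughly the following (a
simplified sketch of the common idiom, not the exact load_hba() code):

    MemoryContext hbacxt = AllocSetContextCreate(CurrentMemoryContext,
                                                 "hba parser context",
                                                 ALLOCSET_SMALL_SIZES);
    MemoryContext oldcxt = MemoryContextSwitchTo(hbacxt);

    /*
     * Parse the file here; every list built by SplitGUCList() and friends
     * is palloc'd in hbacxt, so no per-list list_free() is needed on
     * error paths.
     */

    MemoryContextSwitchTo(oldcxt);
    MemoryContextDelete(hbacxt);    /* releases everything at once */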

Anyway this is a very common style in the PostgreSQL code so I
recommend to verify the outcome from such tools against the actual
code.

> The simplest solution is to free namelist even when calling ereport, since the
> level can be LOG or DEBUG3.

So we don't need to do anything there. Rather we can remove the
existing list_free(parsed_servers) in parse_hba_auth_opt.

> regards,
> Ranier Vilela
> 
> PS. There are two SplitGUCList functions in the codebase.
> 1. SplitGUCList (src/bin/pg_dump/dumputils.c)
> 2. SplitGUCList (src/backend/utils/adt/varlena.c)

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [PATCH] Resource leaks (src/backend/libpq/hba.c)

2020-08-25 Thread Ranier Vilela
On Tue, Aug 25, 2020 at 11:02 PM Kyotaro Horiguchi <horikyota@gmail.com> wrote:

> At Tue, 25 Aug 2020 10:20:07 -0300, Ranier Vilela 
> wrote in
> > Hi Tom,
> >
> > Per Coverity.
> >
> > The function parse_hba_auth_opt at (src/backend/libpq/hba.c) allows resource
> > leaks when called by the function parse_hba_line with log levels LOG and
> > DEBUG3. The function SplitGUCList (src/bin/pg_dump/dumputils.c), even when
> > returning FALSE, can leave the namelist list non-empty, with memory
> > allocated by pg_malloc.
>
> As you know, there are two implementations of the function. One, which
> uses pg_malloc, is used in pg_dump, and the returned char *namelist is
> always pg_free'd after use. The other returns a List, and the
> returned list is reclaimed by MemoryContextDelete at its callers
> (concretely load_hba and fill_hba_view).  Indeed they share the same
> name but have different signatures, so the two are statically
> distinguishable, but Coverity seems to fail to do so. You may need to
> avoid feeding the whole source tree to Coverity at once.
>
Yes, thanks for the hint.

>
> Anyway this is a very common style in the PostgreSQL code so I
> recommend to verify the outcome from such tools against the actual
> code.
>
 Ok.


> > The simplest solution is to free namelist even when calling ereport, since
> > the level can be LOG or DEBUG3.
>
> So we don't need to do anything there. Rather we can remove the
> existing list_free(parsed_servers) in parse_hba_auth_opt.
>
That would be good; the existing call was confusing.

Many thanks for the explanation.

Ranier Vilela


Re: "cert" + clientcert=verify-ca in pg_hba.conf?

2020-08-25 Thread Kyotaro Horiguchi
At Tue, 25 Aug 2020 10:04:44 -0400, Bruce Momjian  wrote in 
> On Tue, Aug 25, 2020 at 03:53:20PM +0900, Kyotaro Horiguchi wrote:
> > At Mon, 24 Aug 2020 23:04:51 -0400, Bruce Momjian  wrote 
> > in 
> > > > > I don't see "no-verify" mentioned anywhere in our docs.
> > > > 
> > > > no-verify itself is mentioned here.
> > > > 
> > > > https://www.postgresql.org/docs/13/ssl-tcp.html#SSL-CLIENT-CERTIFICATES
> > > 
> > > Oh, I see it now, thanks.  Do you have any idea what this part of the
> > > docs means?
> > > 
> > >   When clientcert is not specified or is set to
> > >   no-verify, the server will still verify any presented
> > >   client certificates against its CA file, if one is configured —
> > >   but it will not insist that a client certificate be presented.
> > 
> > Ah.. Indeed.
> > 
> > Even if clientcert is not set or is set to no-verify, it checks the client
> > certificate against the CA, if any. If verify-ca, a client certificate
> > must be provided. As a result, no-verify actually fails if the client
> > has a certificate that is not backed by the CA.
> 
> I think there are a few problems here.  In the docs, it says "will still
> verify", but it doesn't say if it verifies the CA, or the CA _and_ the
> CN/username.

It verifies only the CA.

> Second, since it is optional, what value does it have?
> 
> > > Why is this useful?
> > 
> > I agree, but there seems to be an implementation reason for the
> > behavior. To identify an hba-line, some connection parameters like
> > user name and others sent over a connection is required.  Thus the
> > clientcert option in the to-be-identified hba-line is unknown prior to
> > the time SSL connection is made. So the documentation might need
> > amendment. Roughly something like the following?
> 
> Well, I realize internally we need a way to indicate clientcert is not
> used, but why do we bother exposing that to the user as a named option?

Because we think we need a named value for every alternative,
including the default value?

> And you are right that the option name 'no-verify' is wrong since it
> will verify the CA if it exists, so it more like 'optionally-verify',
> which seems useless from a user interface perspective.
> 
> I guess the behavior of no-verify matches our client-side
> sslmode=prefer, but at least that has the value of using SSL if
> available, which prevents user-visible network traffic, but doesn't
> force it, but I am not sure what the value of optional certificate
> verification is, since verification is all it does.  I guess it should
> be called "prefer-verify".

The point of no-verify is to allow the absence of a client
certificate. It is similar to "prefer" in the sense that it allows an
SSL connection to be unavailable. (In a similar way to
"prefer", we could "fall back" to a "no client cert" SSL connection
after a verification failure, but I think it's not worth doing.)

"prefer-verify" seems right in that sense. But I'm not sure we may
break backward compatibility for the reason.

> > ===
> > When clientcert is not specified or is set
> > tono-verify, clients can connect to server without
> > having a client certificate.
> > 
> > Note: Regardless of the setting of clientcert,
> > connection can end with failure if a client certificate that cannot be
> > verified by the server is stored in the ~/.postgresql directory.
> > ===
> > 
> > By the way, the following table line might need to be changed?
> > 
> > libpq-ssl.html:
> > 
> > >  ~/.postgresql/postgresql.crt
> > >  client certificate
> > -  requested by server
> > 
> > The file is actually not requested by the server; the client just sends it
> > to the server, if it exists, unconditionally.
> > 
> > +  sent to server
> 
> I have just applied this change to all branches, since it is an
> independent fix.  Thanks.

Thanks.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: "cert" + clientcert=verify-ca in pg_hba.conf?

2020-08-25 Thread Bruce Momjian
On Wed, Aug 26, 2020 at 11:41:39AM +0900, Kyotaro Horiguchi wrote:
> At Tue, 25 Aug 2020 10:04:44 -0400, Bruce Momjian  wrote in 
> > I think there are a few problems here.  In the docs, it says "will still
> > verify", but it doesn't say if it verifies the CA, or the CA _and_ the
> > CN/username.
> 
> It verifies only the CA.

OK, that will need to be clarified.

> > Second, since it is optional, what value does it have?
> > 
> > > > Why is this useful?
> > > 
> > > I agree, but there seems to be an implementation reason for the
> > > behavior. To identify an hba-line, some connection parameters like
> > > user name and others sent over a connection is required.  Thus the
> > > clientcert option in the to-be-identified hba-line is unknown prior to
> > > the time SSL connection is made. So the documentation might need
> > > amendment. Roughly something like the following?
> > 
> > Well, I realize internally we need a way to indicate clientcert is not
> > used, but why do we bother exposing that to the user as a named option?
> 
> Because we think we need a named value for every alternative,
> including the default value?

Well, not putting clientcert at all gives the default behavior, so why
have clientcert=no-verify?

> > And you are right that the option name 'no-verify' is wrong since it
> > will verify the CA if it exists, so it more like 'optionally-verify',
> > which seems useless from a user interface perspective.
> > 
> > I guess the behavior of no-verify matches our client-side
> > sslmode=prefer, but at least that has the value of using SSL if
> > available, which prevents user-visible network traffic, but doesn't
> > force it, but I am not sure what the value of optional certificate
> > verification is, since verification is all it does.  I guess it should
> > be called "prefer-verify".
> 
> The point of no-verify is to allow the absence of a client
> certificate. It is similar to "prefer" in the sense that it allows an
> SSL connection to be unavailable. (In a similar way to
> "prefer", we could "fall back" to a "no client cert" SSL connection
> after a verification failure, but I think it's not worth doing.)

Well, sslmode=prefer gives encryption without identification. 
clientcert=no-verify has no value because it is just an optional CA
check, and optional authentication is useless.  It
is like saying you can type in the password if you want, and we will
check it, or you can just not type in the password.

> "prefer-verify" seems right in that sense. But I'm not sure we may
> break backward compatibility for the reason.

True, but right now it is inaccurate, so I think it just needs to be fixed
or removed and documented in the PG 14 release notes.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: display offset along with block number in vacuum errors

2020-08-25 Thread Mahendra Singh Thalor
On Thu, 20 Aug 2020 at 17:42, Amit Kapila  wrote:
>
> On Thu, Aug 20, 2020 at 12:32 PM Amit Kapila  wrote:
> >
> > On Thu, Aug 20, 2020 at 12:18 PM Masahiko Sawada
> >  wrote:
> > >
> > > On Thu, 20 Aug 2020 at 14:01, Amit Kapila  wrote:
> > > >
> > > > On Wed, Aug 19, 2020 at 12:54 PM Masahiko Sawada
> > > >  wrote:
> > > >
> > > > Here, we can notice that for the index, we are getting context
> > > > information but not for the heap. The reason is that in
> > > > vacuum_error_callback, we are not printing additional information for
> > > > phases VACUUM_ERRCB_PHASE_SCAN_HEAP and VACUUM_ERRCB_PHASE_VACUUM_HEAP
> > > > when block number is invalid. If we want to cover the 'info' messages
> > > > then won't it be better if we print a message in those phases even
> > > > block number is invalid (something like 'while scanning relation
> > > > \"%s.%s\"")
> > >
> > > Yeah, there is an inconsistency. I agree to print the message even
> > > when the block number is invalid.
> > >
> >
> > Okay, I will update this and send this patch and rebased patch to
> > display offsets later today or tomorrow.
> >
>
> Attached are both the patches. The first one is to improve existing
> error context information, so I think we should back-patch to 13. The
> second one is to add additional vacuum error context information, so
> that is for only HEAD. Does that make sense? Also, let me know if you
> have any more comments.

Thanks Amit for updating the patch. All changes in v7-02 look fine to me.

>
> --
> With Regards,
> Amit Kapila.



-- 
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com




Re: Handing off SLRU fsyncs to the checkpointer

2020-08-25 Thread Thomas Munro
On Tue, Aug 25, 2020 at 9:16 PM Jakub Wartak  wrote:
> I just wanted to help test this patch (defer SLRU fsyncs during recovery)
> and also the faster compactify_tuples() patch [2], as both are related to the WAL
> recovery performance I'm interested in. This is my first message to
> this mailing list, so please let me know if I should adjust my testing
> style or formatting.

Hi Jakub,

Thanks very much for these results!

> - Handing SLRU sync work over to the checkpointer: in my testing it 
> accelerates WAL recovery performance on slower / higher latency storage by ~20%

Wow.  Those fsyncs must have had fairly high latency (presumably due
to queuing behind other write back activity).

> - Faster sort in compactify_tuples(): in my testing it accelerates WAL 
> recovery performance for HOT updates also by ~20%

Nice.

> In the last case, the top of the profile is still related to the
> sorting, but as I understand it, in a much more optimal way:
>
> 26.68%  postgres  postgres[.] qsort_itemoff
> ---qsort_itemoff
>|--14.17%--qsort_itemoff
>|  |--10.96%--compactify_tuples
>|  |  PageRepairFragmentation
>|  |  heap2_redo
>|  |  StartupXLOG
>|   --3.21%--qsort_itemoff
>|  --3.10%--compactify_tuples
>|PageRepairFragmentation
>|heap2_redo
>|StartupXLOG
> --12.51%--compactify_tuples
>   PageRepairFragmentation
>   heap2_redo
>   StartupXLOG

I wonder if there is something higher level that could be done to
reduce the amount of compaction work required in the first place, but
in the meantime I'm very happy if we can improve the situation so much
with such a microscopic improvement that might eventually benefit
other sorting stuff...

>  8.38%  postgres  libc-2.17.so[.] __memmove_ssse3_back
> ---__memmove_ssse3_back
>compactify_tuples
>PageRepairFragmentation
>heap2_redo

Hmm, I wonder if this bit could go a teensy bit faster by moving as many
adjacent tuples as you can in one go rather than moving them one at a
time...
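
Something like this batching is what I have in mind (just a sketch with
made-up types, not the actual compactify_tuples() code; updating each line
pointer's lp_off is left out, and it assumes the items are processed in an
order where a move can never clobber a tuple that hasn't been relocated
yet):

#include <stdint.h>
#include <string.h>

typedef struct
{
    uint16_t    old_off;    /* tuple offset in the old page layout */
    uint16_t    new_off;    /* tuple offset in the compacted layout */
    uint16_t    len;        /* MAXALIGN'd tuple length */
} ItemMove;

static void
move_adjacent_runs(char *page, ItemMove *items, int nitems)
{
    int         i = 0;

    while (i < nitems)
    {
        int         j = i + 1;
        size_t      runlen;

        /* extend the run while the next tuple is adjacent in both layouts */
        while (j < nitems &&
               items[j].old_off == items[j - 1].old_off + items[j - 1].len &&
               items[j].new_off == items[j - 1].new_off + items[j - 1].len)
            j++;

        /* one memmove() covers the whole run; memmove() handles overlap */
        runlen = (size_t) (items[j - 1].new_off + items[j - 1].len) -
                 items[i].new_off;
        memmove(page + items[i].new_off, page + items[i].old_off, runlen);
        i = j;
    }
}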

> The append-only bottleneck appears to be limited by syscalls/s due to small 
> block size even with everything in FS cache (but not in shared buffers, 
> please compare with TEST1 as there was no such bottleneck at all):
>
> 29.62%  postgres  [kernel.kallsyms]   [k] copy_user_enhanced_fast_string
> ---copy_user_enhanced_fast_string
>|--17.98%--copyin
> [..]
>|  __pwrite_nocancel
>|  FileWrite
>|  mdwrite
>|  FlushBuffer
>|  ReadBuffer_common
>|  ReadBufferWithoutRelcache
>|  XLogReadBufferExtended
>|  XLogReadBufferForRedoExtended
>|   --17.57%--btree_xlog_insert

To move these writes out of recovery's way, we should probably just
run the bgwriter process during crash recovery.  I'm going to look
into that.

The other thing is of course the checkpointer process, and our
end-of-recovery checkpoint.  I was going to suggest it should be
optional and not done by the recovery process itself, which is why
some earlier numbers I shared didn't include the end-of-recovery
checkpoint, but then I realised it complicated the numbers for this
little patch and, anyway, it'd be a good idea to open that can of
worms separately...

>| btree_redo
>| StartupXLOG
>|
> --11.64%--copyout
> [..]
>   __pread_nocancel
>--11.44%--FileRead
>  mdread
>  ReadBuffer_common
>  ReadBufferWithoutRelcache
>  XLogReadBufferExtended
>  XLogReadBufferForRedoExtended

For these reads, the solution should be WAL prefetching, but the patch
I shared for that (and will be updating soon) is just one piece of the
puzzle, and as it stands it actually *increases* the number of
syscalls by adding some posix_fadvise() calls, so ... erm, for an
all-in-kernel-cache-already workload like what you profiled there it
can only make things worse on that front.  But... when combined with
Andres's work-in-progress AIO stuff, a whole bunch of reads can be
submitted with a single system call ahead of time and then the results
are delivered directly into our buffers.

Re: some unused parameters cleanup

2020-08-25 Thread Peter Eisentraut

On 2020-08-25 18:59, Tom Lane wrote:

Peter Eisentraut  writes:

Here is a series of patches to remove some unused function parameters.
In each case, the need for them was removed by some other code changes
over time but the unusedness was not noticed.  I have included a
reference to when they became unused in each case.


For some of these, there's an argument for keeping the unused parameter
for consistency with sibling functions that do use it.  Not sure how
important that is, though.


I had meant to exclude cases like this from this patch set.  If you see 
a case like this in *this* patch set, please point it out.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Handing off SLRU fsyncs to the checkpointer

2020-08-25 Thread Andres Freund
Hi,

On 2020-08-26 15:58:14 +1200, Thomas Munro wrote:
> > --12.51%--compactify_tuples
> >   PageRepairFragmentation
> >   heap2_redo
> >   StartupXLOG
> 
> I wonder if there is something higher level that could be done to
> reduce the amount of compaction work required in the first place, but
> in the meantime I'm very happy if we can improve the situation so much
> with such a microscopic improvement that might eventually benefit
> other sorting stuff...

Another approach could be to not perform any sorting during recovery,
instead including enough information in the WAL record to avoid doing a
full blown PageRepairFragmentation during recovery.

Greetings,

Andres Freund




Re: display offset along with block number in vacuum errors

2020-08-25 Thread Amit Kapila
On Wed, Aug 26, 2020 at 8:54 AM Mahendra Singh Thalor
 wrote:
>
> On Thu, 20 Aug 2020 at 17:42, Amit Kapila  wrote:
> >
> > Attached are both the patches. The first one is to improve existing
> > error context information, so I think we should back-patch to 13. The
> > second one is to add additional vacuum error context information, so
> > that is for only HEAD. Does that make sense? Also, let me know if you
> > have any more comments.
>
> Thanks Amit for updating the patch. All changes in v7-02 look fine to me.
>

Okay, pushed v7-02 as well. I have marked the entry for this in CF as committed.

-- 
With Regards,
Amit Kapila.




Re: display offset along with block number in vacuum errors

2020-08-25 Thread Masahiko Sawada
On Wed, 26 Aug 2020 at 15:07, Amit Kapila  wrote:
>
> On Wed, Aug 26, 2020 at 8:54 AM Mahendra Singh Thalor
>  wrote:
> >
> > On Thu, 20 Aug 2020 at 17:42, Amit Kapila  wrote:
> > >
> > > Attached are both the patches. The first one is to improve existing
> > > error context information, so I think we should back-patch to 13. The
> > > second one is to add additional vacuum error context information, so
> > > that is for only HEAD. Does that make sense? Also, let me know if you
> > > have any more comments.
> >
> > Thanks Amit for updating the patch. All changes in v7-02 look fine to me.
> >
>
> Okay, pushed v7-02 as well. I have marked the entry for this in CF as 
> committed.

Thank you!


Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services