Re: proposal: type info support functions for functions that use "any" type
On Wed, Jan 15, 2020 at 11:04 Pavel Stehule wrote:
>
> On Tue, Jan 14, 2020 at 22:09 Tom Lane wrote:
>> Pavel Stehule writes:
>>> [ parser-support-function-with-demo-20191128.patch ]
>>
>> TBH, I'm still not convinced that this is a good idea. Restricting
>> the support function to only change the function's return type is
>> safer than the original proposal, but it's still not terribly safe.
>> If you change the support function's algorithm in any way, how do
>> you know whether you've broken existing stored queries? If the
>> support function consults external resources to make its choice
>> (perhaps checking the existence of a cast), where could we record
>> that the query depends on the existence of that cast? There'd be
>> no visible trace of that in the query parsetree.
>
> This risk is real and cannot be simply solved without more complications.
>
> Could a solution be to limit and enforce this functionality to
> extensions that are initialized from shared_preload_libraries or
> local_preload_libraries?
>
> When we detect that the support function comes from a dynamically
> loaded extension, we can raise an error. That is not great for
> upgrades, but I expect an upgrade of this kind of extension to look
> much like a Postgres upgrade - the restart can cover both.
>
>> I'm also still not convinced that this idea allows doing anything
>> that can't be done just as well with polymorphism. It would be a
>> really bad idea for the support function to be examining the values
>> of the arguments (else what happens when they're not constants?).
>> So all you can do is look at their types, and then it seems like
>> the things you can usefully do are pretty much like polymorphism,
>> i.e. select some one of the input types, or a related type such
>> as an array type or element type. If there are gaps in what you
>> can express with polymorphism, I'd much rather spend effort on
>> improving that facility than in adding something that is only
>> accessible to advanced C coders. (Yes, I know I've been slacking
>> on reviewing [1].)
>
> For my purposes the critical information is the type. I don't need to
> work with constants, but I can imagine that some API for working with
> constant values could be nice.
>
> Yes, I can solve a lot of things with patch [1], but not all of them,
> and this patch is shorter and almost trivial.

All this discussion is motivated by my work on the Orafce extension -
https://github.com/orafce/orafce - where, unfortunately, implementing the
"decode" functions is not possible with patch [1]. I currently have 55
instances of the "decode" function and I am sure I don't cover all cases.
With this patch (polymorphism on steroids :)) I can implement them very
simply and quickly, along with other similar functions. The patch was
very simple, so I thought, maybe wrongly, that it was an acceptable
approach. Our polymorphism is strong, and when I design code natively for
Postgres it is perfect. But it doesn't allow implementing some simple
functions that are used in other databases. With this small patch I can
cover almost all situations, and very simply. I don't want to increase
the complexity of the polymorphism rules further - [1] is the maximum we
can implement with acceptable cost - but this generic system is sometimes
not enough. I welcome any design for how this problem could be solved.
Any ideas?

>> Lastly, I still think that this patch doesn't begin to address
>> all the places that would have to know about the feature. There's
>> a lot of places that know about polymorphism --- if this is
>> polymorphism on steroids, which it is, then why don't all of those
>> places need to be touched?
>
> I am sorry, I don't understand the last sentence.
>
>> On the whole I think we should reject this idea.
>>
>> regards, tom lane
>>
>> [1] https://commitfest.postgresql.org/26/1911/
Re: Expose lock group leader pid in pg_stat_activity
On Thu, Jan 16, 2020 at 04:27:27PM +0900, Michael Paquier wrote:
> While looking at the code, I think that we could refactor things a bit
> for raw_wait_event, wait_event_type and wait_event, which have some
> duplicated code for backend and auxiliary processes. What about
> filling in the wait event data after fetching the PGPROC entry, and
> also filling in leader_pid for auxiliary processes? This does not
> matter now, and perhaps it never will, but that would make the code
> much more consistent.

And actually, the way you are looking at the leader's PID is visibly
incorrect and inconsistent, because the patch takes no shared LWLock on
the leader using LockHashPartitionLockByProc() followed by
LWLockAcquire(), no? That's incorrect because it would be perfectly
possible to crash with this code between the moment you check whether
lockGroupLeader is NULL and the moment you look at lockGroupLeader->pid,
if a process being stopped in-between removes itself from its lock group
in ProcKill(). That's also inconsistent because it would be perfectly
possible to finish with an incorrect view of the data while scanning all
the backend entries, like a leader set to NULL with workers still
pointing to the leader, or workers incorrectly marked as NULL. The second
one may not be a problem, but the first one could be confusing.
--
Michael
Re: Expose lock group leader pid in pg_stat_activity
On Fri, Dec 27, 2019 at 10:15:33AM +0100, Julien Rouhaud wrote:
> I think that not using "parallel" to name this field will help to
> avoid confusion if the lock group infrastructure is eventually used
> for something else, but that's only true if indeed we explain what a
> lock group is.

As you already pointed out, src/backend/storage/lmgr/README includes a
full description of this stuff under the section "Group Locking". So I
agree that the patch ought to document your new field in a much better
way. Avoiding the term "group locking" is even better, so as not to
confuse the reader, because this term is not present in the docs at all.

> The leader_pid is NULL for processes not involved in parallel query.
> When a process wants to cooperate with parallel workers, it becomes a
> lock group leader, which means that this field will be valued to its
> own pid. When a parallel worker starts up, this field will be valued
> with the leader pid.

The first sentence is good to have. Now instead of "lock group leader",
I think that we had better use "parallel group leader" as in other parts
of the docs (see wait events for example). Then we just need to say that
if leader_pid has the same value as pg_stat_activity.pid, then we have a
group leader. If not, then it is a parallel worker process initially
spawned by the leader whose PID is leader_pid (when executing Gather,
Gather Merge, the soon-to-be parallel vacuum, or a parallel btree build,
but this does not need a mention in the docs). There could be an
argument as well for having leader_pid set to NULL for a leader, but
that would be inconsistent with what the PGPROC entry reports for the
backend.

While looking at the code, I think that we could refactor things a bit
for raw_wait_event, wait_event_type and wait_event, which have some
duplicated code for backend and auxiliary processes. What about filling
in the wait event data after fetching the PGPROC entry, and also filling
in leader_pid for auxiliary processes? This does not matter now, and
perhaps it never will, but that would make the code much more
consistent.
--
Michael
Re: [Proposal] Global temporary tables
On 15.01.2020 16:10, 曾文旌(义从) wrote:

> > I do not see a principle difference here with the scenario where 50
> > sessions create a (local) temp table, populate it with a GB of data,
> > and create an index for it.
>
> I think the problem is that when one session completes the creation of
> the index on the GTT, it will trigger the other sessions to build
> their own local index of the GTT at a centralized time. This will
> consume a lot of hardware resources (CPU, IO, memory) in a short time,
> and the database service may even become slow, because 50 sessions are
> building the index. I think this is not what we expected.

First of all, creating an index for a GTT in one session doesn't
immediately initiate building indexes in all other sessions. Indexes are
built on demand. If a session is not using this GTT any more, then the
index will not be built for it at all. And if the GTT really is actively
used by all sessions, then building the index and using it to construct
an optimal execution plan is better than continuing to use a sequential
scan and reading all GTT data from disk.

And as I already mentioned, I do not see any principle difference in
resource consumption compared with the current usage of local temp
tables. If we have many sessions, each creating a temp table, populating
it with data, and building an index for it, then we will observe the
same CPU utilization and memory consumption as in the case of using a
GTT and creating an index for it.

Sorry, but I am still not convinced by your and Tomas's arguments. Yes,
building a GTT index may cause high memory consumption
(maintenance_work_mem * n_backends). But such consumption can be
observed without GTTs as well, and it has to be taken into account when
choosing a value for maintenance_work_mem. From my point of view it is
much more important to make the behavior of GTTs as compatible with
normal tables as possible. Also, from a database administration point of
view, the necessity of restarting sessions to make them use new indexes
seems very strange and inconvenient. Alternatively, a DBA can address
the problem of high memory consumption by adjusting
maintenance_work_mem, so this solution is more flexible.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: Append with naive multiplexing of FDWs
Thank you very much for the testing of the patch, Ahsan!

At Wed, 15 Jan 2020 15:41:04 -0500, Bruce Momjian wrote in
> On Tue, Jan 14, 2020 at 02:37:48PM +0500, Ahsan Hadi wrote:
> > Summary
> > The patch is pretty good; it works well when there is little data
> > sent back to the parent node. The patch doesn't provide parallel FDW
> > scan; it ensures that child nodes can send data to the parent in
> > parallel, but the parent can only sequentially process the data from
> > the data nodes.

"Parallel scan" at the moment means multiple workers fetching unique
blocks from *one* table in an arbitrated manner. In this sense "parallel
FDW scan" would mean multiple local workers fetching unique bundles of
tuples from *one* foreign table, which means it runs on a single
session. That doesn't offer an advantage. If parallel query processing
worked in a worker-per-table mode, especially on partitioned tables,
maybe the current FDW would work without much modification. But I
believe asynchronous append over foreign tables in a single process is
far more resource-efficient and moderately faster than parallel append.

> > Provided there is no performance degradation for non-FDW append
> > queries, I would recommend considering this patch as an interim
> > solution while we are waiting for parallel FDW scan.
>
> Wow, these are very impressive results!

Thanks.

--
Kyotaro Horiguchi
NTT Open Source Software Center
Re: SlabCheck leaks memory into TopMemoryContext
Hi,

On 2020-01-16 01:25:00 -0500, Tom Lane wrote:
> Andres Freund writes:
> > On 2020-01-16 00:09:53 -0500, Tom Lane wrote:
> >> It's basically assuming that the memory management mechanism is sane,
> >> which makes the whole thing fundamentally circular, even if it's
> >> relying on some other context to be sane. Is there a way to do the
> >> checking without extra allocations?
>
> > Probably not easily.
>
> In the AllocSet code, we don't hesitate to expend extra space per-chunk
> for debug support:
>
> typedef struct AllocChunkData
> {
> ...
> #ifdef MEMORY_CONTEXT_CHECKING
> 	Size		requested_size;
> #endif
> ...
>
> I don't see a pressing reason why SlabContext couldn't do something
> similar, either per-chunk or per-context, whatever makes sense.

Well, what I suggested upthread was to just have two globals like

	int slabcheck_freechunks_size;
	bool *slabcheck_freechunks;

and realloc that to the larger size in SlabContextCreate() if necessary,
based on the computed chunksPerBlock? I thought you were asking whether
any additional memory could just be avoided...

Greetings,

Andres Freund
Re: SlabCheck leaks memory into TopMemoryContext
Andres Freund writes:
> On 2020-01-16 00:09:53 -0500, Tom Lane wrote:
>> It's basically assuming that the memory management mechanism is sane,
>> which makes the whole thing fundamentally circular, even if it's
>> relying on some other context to be sane. Is there a way to do the
>> checking without extra allocations?

> Probably not easily.

In the AllocSet code, we don't hesitate to expend extra space per-chunk
for debug support:

typedef struct AllocChunkData
{
...
#ifdef MEMORY_CONTEXT_CHECKING
	Size		requested_size;
#endif
...

I don't see a pressing reason why SlabContext couldn't do something
similar, either per-chunk or per-context, whatever makes sense.

			regards, tom lane
Re: SlabCheck leaks memory into TopMemoryContext
Hi,

On 2020-01-16 00:09:53 -0500, Tom Lane wrote:
> Andres Freund writes:
> > I just noticed that having a slab context around in an assertion build
> > leads to performance degrading and memory usage going up. A bit of
> > poking revealed that SlabCheck() doesn't free the freechunks it
> > allocates.
>
> > It's on its own obviously trivial to fix.
>
> It seems like having a context check function do new allocations
> is something to be avoided in the first place.

Yea, that's why I was wondering about doing the allocation during
context creation, for the largest size necessary of any context:

	/* bitmap of free chunks on a block */
	freechunks = palloc(slab->chunksPerBlock * sizeof(bool));

or at the very least using malloc(), rather than another context.

> It's basically assuming that the memory management mechanism is sane,
> which makes the whole thing fundamentally circular, even if it's
> relying on some other context to be sane. Is there a way to do the
> checking without extra allocations?

Probably not easily. I was wondering if the bitmap could be made denser
and allocated on the stack. It could actually use one bit for each
tracked chunk, instead of one byte. We would have to put in a clear
lower limit on the allowed chunk size, in relation to the block size, to
stay in a reasonable range.

Greetings,

Andres Freund
Re: relcache sometimes initially ignores has_generated_stored
At Wed, 15 Jan 2020 10:11:05 -0800, Andres Freund wrote in
> Hi,
>
> I think it's probably not relevant, but it confused me for a moment
> that RelationBuildTupleDesc() might set constr->has_generated_stored to
> true, but then throw away the constraint at the end, because nothing
> matches the
>
> 	/*
> 	 * Set up constraint/default info
> 	 */
> 	if (has_not_null || ndef > 0 ||
> 		attrmiss || relation->rd_rel->relchecks)
>
> test, i.e. there are no defaults.

It was as follows before 16828d5c02.

-	if (constr->has_not_null || ndef > 0 || relation->rd_rel->relchecks)

At that time TupleConstr had only the members defval, check and
has_not_null, other than subsidiary members. The condition apparently
checked all of the members. Then the commit added attrmiss to the
condition, since the corresponding member had been added to TupleConstr.

+	if (constr->has_not_null || ndef > 0 ||
+		attrmiss || relation->rd_rel->relchecks)

Later, fc22b6623b introduced has_generated_stored to TupleConstr but
didn't add the corresponding check.

> A quick assert confirms we do indeed pfree() constr in cases where
> has_generated_stored == true.
>
> I suspect that's just an intermediate catalog, however, e.g. when
> DefineRelation() does
> 	heap_create_with_catalog();
> 	CommandCounterIncrement();
> 	relation_open();
> 	AddRelationNewConstraints().
>
> It does still strike me as not great that we can get a different
> relcache entry, even if transient, depending on whether there are other
> reasons to create a TupleConstr. Say a NOT NULL column.
>
> I'm inclined to think we should just also check has_generated_stored in
> the if quoted above?

I agree with that. We could have a local boolean "has_any_constraint" to
merge them, but that would be overkill.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
Re: remove some STATUS_* symbols
On Sat, Jan 11, 2020 at 08:14:17AM +0100, Peter Eisentraut wrote:
> OK, pushed as it was then.

Thanks, that looks fine. I am still not sure whether the second patch,
which adds an enum via ProcWaitStatus, improves the code readability
though, so my take would be to discard it for now. Perhaps others think
differently, I don't know.
--
Michael
Re: TRUNCATE on foreign tables
On Wed, Jan 15, 2020 at 11:33:07PM +0900, Kohei KaiGai wrote:
> On Wed, Jan 15, 2020 at 17:11 Michael Paquier wrote:
>> I have done a quick read through the patch. You have modified the
>> patch to pass down to the callback a list of relation OIDs to execute
>> one command for all, and there are tests for FKs, so that coverage
>> looks fine.
>>
>> Regression tests are failing with this patch:
>> -- TRUNCATE doesn't work on foreign tables, either directly or
>> recursively
>> TRUNCATE ft2; -- ERROR
>> -ERROR: "ft2" is not a table
>> +ERROR: foreign-data wrapper "dummy" has no handler
>> You visibly just need to update the output, because no handlers are
>> available for truncate in this case.
>
> What error message is better in this case? It does not print "ft2"
> anywhere, so the user may not notice that "ft2" is the source of the
> error. How about 'foreign table "ft2" does not support truncate'?

It sounds to me that this message is kind of right. This FDW "dummy"
does not have any FDW handler at all, so it complains about that.
Having no support for TRUNCATE is something that may happen after that.
Actually, this error message from your patch, used for an FDW which has
a handler but no TRUNCATE support, could be reworked:

+	if (!fdwroutine->ExecForeignTruncate)
+		ereport(ERROR,
+				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
+				 errmsg("\"%s\" is not a supported foreign table",
+						relname)));

Something like "Foreign-data wrapper \"%s\" does not support TRUNCATE"?

>> + frels_list is a list of foreign tables that are
>> + connected to a particular foreign server; thus, these foreign tables
>> + should have identical foreign server ID
>> The list is built by the backend code, so that has to be true.
>>
>> + foreach (lc, frels_list)
>> + {
>> +     Relation	frel = lfirst(lc);
>> +     Oid		frel_oid = RelationGetRelid(frel);
>> +
>> +     if (server_id == GetForeignServerIdByRelId(frel_oid))
>> +     {
>> +         frels_list = foreach_delete_current(frels_list, lc);
>> +         curr_frels = lappend(curr_frels, frel);
>> +     }
>> + }
>> Wouldn't it be better to fill in a hash table for each server with a
>> list of relations?
>
> It's just a matter of preference. A temporary hash table with
> server-id and a list of foreign tables is an idea. Let me try to
> revise.

Thanks. It would not matter much for relations without inheritance
children, but when truncating a partition tree with many foreign tables
using various FDWs, that could matter performance-wise.
--
Michael
Re: [HACKERS] WAL logging problem in 9.4.3?
All the known defects are fixed.

At Wed, 25 Dec 2019 16:15:21 -0800, Noah Misch wrote in
> === Defect 3: storage.c checks size decrease of MAIN_FORKNUM only
>
> storage.c tracks only MAIN_FORKNUM in pendingsync->max_truncated. Is it
> possible for MAIN_FORKNUM to have a net size increase while FSM_FORKNUM
> has a net size decrease? I haven't tested, but this sequence seems
> possible:
>
> TRUNCATE
>   reduces MAIN_FORKNUM from 100 blocks to 0 blocks
>   reduces FSM_FORKNUM from 3 blocks to 0 blocks
> COPY
>   raises MAIN_FORKNUM from 0 blocks to 110 blocks
>   does not change FSM_FORKNUM
> COMMIT
>   should fsync, but wrongly chooses log_newpage_range() approach
>
> If that's indeed a problem, beside the obvious option of tracking every
> fork's max_truncated, we could convert max_truncated to a bool and use
> fsync anytime the relation experienced an mdtruncate(). (While
> FSM_FORKNUM is not critical for database operations, the choice to
> subject it to checksums entails protecting it here.) If that's not a
> problem, would you explain?

That causes a page-load failure, since the FSM can offer a nonexistent
heap block, and that failure leads to an ERROR for an SQL statement.
It's not critical, but surely a problem. I'd like to take the bool
option, because an insert-after-truncate sequence rarely happens. That
case is not the main target of the optimization, so it is enough for us
to make sure that the operation doesn't lead to such errors.

The attached is the nm30 patch followed by the three fix patches for the
three defects. The new member "RelationData.isremoved" is renamed to
"isdropped" in this version.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

>From 3a811c76874ae3c596e138369766ad00888c572c Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi
Date: Thu, 21 Nov 2019 15:28:06 +0900
Subject: [PATCH v32 1/4] Rework WAL-skipping optimization

While wal_level=minimal we omit WAL-logging for certain operations on
relfilenodes that are created in the current transaction.
The files are fsynced at commit. The machinery accelerates bulk-insertion operations but it fails in certain sequence of operations and a crash just after commit may leave broken table files. This patch overhauls the machinery so that WAL-loggings on all operations are omitted for such relfilenodes. This patch also introduces a new feature that small files are emitted as a WAL record instead of syncing. The new GUC variable wal_skip_threshold controls the threshold. --- doc/src/sgml/config.sgml| 43 ++- doc/src/sgml/perform.sgml | 47 +-- src/backend/access/common/toast_internals.c | 4 +- src/backend/access/gist/gistutil.c | 31 +- src/backend/access/gist/gistxlog.c | 21 ++ src/backend/access/heap/heapam.c| 45 +-- src/backend/access/heap/heapam_handler.c| 22 +- src/backend/access/heap/rewriteheap.c | 21 +- src/backend/access/nbtree/nbtsort.c | 41 +-- src/backend/access/rmgrdesc/gistdesc.c | 5 + src/backend/access/transam/README | 45 ++- src/backend/access/transam/xact.c | 15 + src/backend/access/transam/xloginsert.c | 10 +- src/backend/access/transam/xlogutils.c | 18 +- src/backend/catalog/heap.c | 4 + src/backend/catalog/storage.c | 248 - src/backend/commands/cluster.c | 12 +- src/backend/commands/copy.c | 58 +-- src/backend/commands/createas.c | 11 +- src/backend/commands/matview.c | 12 +- src/backend/commands/tablecmds.c| 11 +- src/backend/storage/buffer/bufmgr.c | 125 ++- src/backend/storage/lmgr/lock.c | 12 + src/backend/storage/smgr/md.c | 36 +- src/backend/storage/smgr/smgr.c | 35 ++ src/backend/utils/cache/relcache.c | 159 +++-- src/backend/utils/misc/guc.c| 13 + src/include/access/gist_private.h | 2 + src/include/access/gistxlog.h | 1 + src/include/access/heapam.h | 3 - src/include/access/rewriteheap.h| 2 +- src/include/access/tableam.h| 15 +- src/include/catalog/storage.h | 6 + src/include/storage/bufmgr.h| 4 + src/include/storage/lock.h | 3 + src/include/storage/smgr.h | 1 + src/include/utils/rel.h | 57 ++- src/include/utils/relcache.h| 8 +- 
src/test/recovery/t/018_wal_optimize.pl | 374 src/test/regress/expected/alter_table.out | 6 + src/test/regress/sql/alter_table.sql| 7 + 41 files changed, 1242 insertions(+), 351 deletions(-) create mode 100644 src/test/recovery/t/018_wal_optimize.pl diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 5d45b6f7cb..0e7a0bc0ee 100644 --- a/doc/src/sgml/config.sgml +
Re: [HACKERS] Block level parallel vacuum
On Thu, Jan 16, 2020 at 10:11 AM Mahendra Singh Thalor wrote:
>
> On Thu, 16 Jan 2020 at 08:22, Amit Kapila wrote:
> >
> > > 2.
> > > I checked the time taken by the vacuum.sql test. Execution time is
> > > almost the same with and without the v45 patch.
> > >
> > > Without v45 patch:
> > > Run1) vacuum ... ok 701 ms
> > > Run2) vacuum ... ok 549 ms
> > > Run3) vacuum ... ok 559 ms
> > > Run4) vacuum ... ok 480 ms
> > >
> > > With v45 patch:
> > > Run1) vacuum ... ok 842 ms
> > > Run2) vacuum ... ok 808 ms
> > > Run3) vacuum ... ok 774 ms
> > > Run4) vacuum ... ok 792 ms
> >
> > I see some variance in the results; have you run with autovacuum off?
> > I was expecting that this might speed up some cases where parallel
> > vacuum is used by default.
>
> I think this is an expected difference in timing, because we are
> adding some vacuum-related tests. I am not starting the server
> manually (that is, I am starting the server with only the default
> settings).

Can you test once with autovacuum = off? Autovacuum leads to variability
in test timing.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: SlabCheck leaks memory into TopMemoryContext
Andres Freund writes:
> I just noticed that having a slab context around in an assertion build
> leads to performance degrading and memory usage going up. A bit of
> poking revealed that SlabCheck() doesn't free the freechunks it
> allocates.

> It's on its own obviously trivial to fix.

It seems like having a context check function do new allocations is
something to be avoided in the first place. It's basically assuming
that the memory management mechanism is sane, which makes the whole
thing fundamentally circular, even if it's relying on some other
context to be sane. Is there a way to do the checking without extra
allocations?

			regards, tom lane
Re: Minimal logical decoding on standbys
On Fri, 10 Jan 2020 at 17:50, Rahila Syed wrote:
>
> Hi Amit,
>
> Can you please rebase the patches as they don't apply on latest master?

Thanks for notifying. Attached is the rebased version.
SlabCheck leaks memory into TopMemoryContext
Hi Tomas,

I just noticed that having a slab context around in an assertion build
leads to performance degrading and memory usage going up. A bit of
poking revealed that SlabCheck() doesn't free the freechunks it
allocates.

It's on its own obviously trivial to fix. But there's also this note at
the top:

 * NOTE: report errors as WARNING, *not* ERROR or FATAL. Otherwise you'll
 * find yourself in an infinite loop when trouble occurs, because this
 * routine will be entered again when elog cleanup tries to release memory!

which seems to be violated by doing:

	/* bitmap of free chunks on a block */
	freechunks = palloc(slab->chunksPerBlock * sizeof(bool));

I guess it'd be better to fall back to malloc() here, and handle the
allocation failure gracefully? Or perhaps we can just allocate
something persistently during SlabCreate()?

Greetings,

Andres Freund
Re: [HACKERS] Block level parallel vacuum
On Thu, 16 Jan 2020 at 08:22, Amit Kapila wrote: > > On Thu, Jan 16, 2020 at 1:02 AM Mahendra Singh Thalor > wrote: > > > > On Wed, 15 Jan 2020 at 19:31, Mahendra Singh Thalor > > wrote: > > > > > > On Wed, 15 Jan 2020 at 19:04, Mahendra Singh Thalor > > > wrote: > > > > > > > > > > > > I reviewed v48 patch and below are some comments. > > > > > > > > 1. > > > > +* based on the number of indexes. -1 indicates a parallel vacuum > > > > is > > > > > > > > I think, above should be like "-1 indicates that parallel vacuum is" > > > > > > I am not an expert in this matter, but I am not sure if your > suggestion is correct. I thought an article is required here, but I > could be wrong. Can you please clarify? > > > > > 2. > > > > +/* Variables for cost-based parallel vacuum */ > > > > > > > > At the end of comment, there is 2 spaces. I think, it should be only 1 > > > > space. > > > > > > > > 3. > > > > I think, we should add a test case for parallel option(when degree is > > > > not specified). > > > > Ex: > > > > postgres=# VACUUM (PARALLEL) tmp; > > > > ERROR: parallel option requires a value between 0 and 1024 > > > > LINE 1: VACUUM (PARALLEL) tmp; > > > > ^ > > > > postgres=# > > > > > > > > Because above error is added in this parallel patch, so we should have > > > > test case for this to increase code coverage. > > > > > > I thought about it but was not sure to add a test for it. We might > not want to add a test for each and every case as that will increase > the number and time of tests without a significant advantage. Now > that you have pointed this, I can add a test for it unless someone > else thinks otherwise. > > > > > 1. > > With v45 patch, compute_parallel_delay is never called so function hit > > is zero. I think, we can add some delay options into vacuum.sql test > > to hit function. > > > > But how can we meaningfully test the functionality of the delay? It > would be tricky to come up with a portable test that can always > produce consistent results. 
> > > 2.
> > > I checked the time taken by the vacuum.sql test. Execution time is
> > > almost the same with and without the v45 patch.
> > >
> > > Without v45 patch:
> > > Run1) vacuum ... ok 701 ms
> > > Run2) vacuum ... ok 549 ms
> > > Run3) vacuum ... ok 559 ms
> > > Run4) vacuum ... ok 480 ms
> > >
> > > With v45 patch:
> > > Run1) vacuum ... ok 842 ms
> > > Run2) vacuum ... ok 808 ms
> > > Run3) vacuum ... ok 774 ms
> > > Run4) vacuum ... ok 792 ms
> >
> > I see some variance in the results; have you run with autovacuum off?
> > I was expecting that this might speed up some cases where parallel
> > vacuum is used by default.

I think this is an expected difference in timing, because we are adding
some vacuum-related tests. I am not starting the server manually (that
is, I am starting the server with only the default settings). If we
start the server with the default settings, then the vacuum-related
test cases will not go parallel, because the index relations are very
small, so we will not do a parallel vacuum.

--
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com
Re: Assert failure due to "drop schema pg_temp_3 cascade" for temporary tables and \d+ is not showing any info after drooping temp table schema
On Fri, Jan 10, 2020 at 08:07:48PM +0900, Michael Paquier wrote:
> Thinking more about it, this has a race condition if a temporary
> schema is removed after collecting the OIDs in the drop phase. So the
> updated attached is actually much more conservative and does not need
> an update of the log message, without giving up on the improvements
> done in v11~. In 9.4~10, the code of the second phase relies on
> GetTempNamespaceBackendId(), which causes an orphaned relation to not
> be dropped in the event of a missing namespace. I'll just leave that
> alone for a couple of days now.

And back on that one, I still prefer the solution in the attached, which
skips any relation whose namespace has gone missing, as the intention of
246a6c87 was only to allow orphaned temp relations to be dropped by
autovacuum when a backend slot is connected but not yet using its own
temp namespace. If we want the drop of temp relations to work properly,
more thought is needed regarding the storage part, and I am not actually
sure that it is autovacuum's job to handle that better. Any thoughts?
--
Michael

diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index f0e40e36af..9eb8132a37 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -2071,9 +2071,11 @@ do_autovacuum(void)
 		{
 			/*
 			 * We just ignore it if the owning backend is still active and
-			 * using the temporary schema.
+			 * using the temporary schema. If the namespace does not exist
+			 * ignore the entry.
 			 */
-			if (!isTempNamespaceInUse(classForm->relnamespace))
+			if (!isTempNamespaceInUse(classForm->relnamespace) &&
+				get_namespace_name(classForm->relnamespace) != NULL)
 			{
 				/*
 				 * The table seems to be orphaned -- although it might be that
@@ -2202,6 +2204,7 @@ do_autovacuum(void)
 		Oid			relid = lfirst_oid(cell);
 		Form_pg_class classForm;
 		ObjectAddress object;
+		char	   *nspname;

 		/*
 		 * Check for user-requested abort.
@@ -2243,7 +2246,15 @@ do_autovacuum(void)
 			continue;
 		}

-		if (isTempNamespaceInUse(classForm->relnamespace))
+		nspname = get_namespace_name(classForm->relnamespace);
+
+		/*
+		 * Nothing to do for a relation with a missing namespace. This
+		 * check is the same as above when building the list of orphan
+		 * relations.
+		 */
+		if (isTempNamespaceInUse(classForm->relnamespace) ||
+			nspname == NULL)
 		{
 			UnlockRelationOid(relid, AccessExclusiveLock);
 			continue;
@@ -2253,8 +2264,7 @@ do_autovacuum(void)
 		ereport(LOG,
 				(errmsg("autovacuum: dropping orphan temp table \"%s.%s.%s\"",
 						get_database_name(MyDatabaseId),
-						get_namespace_name(classForm->relnamespace),
-						NameStr(classForm->relname))));
+						nspname, NameStr(classForm->relname))));

 		object.classId = RelationRelationId;
 		object.objectId = relid;
Re: Implementing Incremental View Maintenance
Aggregate operations on a user-defined type cannot be specified (commit e150d964df7e3aeb768e4bae35d15764f8abd284) A SELECT statement using the MIN() and MAX() functions can be executed on a user-defined type column that implements the aggregate functions MIN() and MAX(). However, if the same SELECT statement is specified in the AS clause of CREATE INCREMENTAL MATERIALIZED VIEW, the following error will occur. ``` SELECT MIN(data) data_min, MAX(data) data_max FROM foo; data_min | data_max --+-- 1/3 | 2/3 (1 row) CREATE INCREMENTAL MATERIALIZED VIEW foo_min_imv AS SELECT MIN(data) data_min FROM foo; psql:extension-agg.sql:14: ERROR: aggregate function min is not supported CREATE INCREMENTAL MATERIALIZED VIEW foo_max_imv AS SELECT MAX(data) data_max FROM foo; psql:extension-agg.sql:15: ERROR: aggregate function max is not supported ``` Are queries that include aggregate operations on user-defined types not supported by INCREMENTAL MATERIALIZED VIEW? An execution example is shown below. ``` [ec2-user@ip-10-0-1-10 ivm]$ cat extension-agg.sql -- -- pg_fraction: https://github.com/nuko-yokohama/pg_fraction -- DROP EXTENSION IF EXISTS pg_fraction CASCADE; DROP TABLE IF EXISTS foo CASCADE; CREATE EXTENSION IF NOT EXISTS pg_fraction; \dx \dT+ fraction CREATE TABLE foo (id int, data fraction); INSERT INTO foo (id, data) VALUES (1,'2/3'),(2,'1/3'),(3,'1/2'); SELECT MIN(data) data_min, MAX(data) data_max FROM foo; CREATE INCREMENTAL MATERIALIZED VIEW foo_min_imv AS SELECT MIN(data) data_min FROM foo; CREATE INCREMENTAL MATERIALIZED VIEW foo_max_imv AS SELECT MAX(data) data_max FROM foo; SELECT MIN(id) id_min, MAX(id) id_max FROM foo; CREATE INCREMENTAL MATERIALIZED VIEW foo_id_imv AS SELECT MIN(id) id_min, MAX(id) id_max FROM foo; ``` Best regards. On Thu, Dec 27, 2018 at 21:57, Yugo Nagata wrote: > Hi, > > I would like to implement Incremental View Maintenance (IVM) on > PostgreSQL.
> IVM is a technique to maintain materialized views which computes and > applies > only the incremental changes to the materialized views rather than > recomputing the contents as the current REFRESH command does. > > I had a presentation on our PoC implementation of IVM at PGConf.eu 2018 > [1]. > Our implementation uses row OIDs to compute deltas for materialized > views. > The basic idea is that if we have information about which rows in base > tables > are contributing to generate a certain row in a matview then we can > identify > the affected rows when a base table is updated. This is based on an idea of > Dr. Masunaga [2], who is a member of our group, and inspired by the ID-based > approach [3]. > > In our implementation, the mapping of the row OIDs of the materialized view > and the base tables is stored in an "OID map". When a base relation is > modified, > an AFTER trigger is executed and the delta is recorded in delta tables using > the transition table feature. The actual update of the matview is triggered > by the REFRESH command with the INCREMENTALLY option. > > However, we realized problems with our implementation. First, WITH OIDS will > be removed as of PG12, so OIDs are no longer available. Besides this, it > would > be hard to implement this since it needs many changes to executor nodes to > collect base tables' OIDs during query execution. Also, the cost of > maintaining > the OID map would be high. > > For these reasons, we started to think about implementing IVM without relying on > OIDs > and did a bit more surveying. > > We also looked at Kevin Grittner's discussion [4] on incremental matview > maintenance. In this discussion, Kevin proposed to use the counting algorithm > [5] > to handle projection views (using DISTINCT) properly. This algorithm needs > an > additional system column, count_t, in materialized views and the delta tables > of > base tables. > > However, the discussion about IVM has now stopped, so we would like to > restart and > make progress on this.
> > > Through our PoC implementation and surveys, I think we need to consider at > least > the following for implementing IVM. > > 1. How to extract changes on base tables > > I think there would be at least two approaches for it. > > - Using transition tables in AFTER triggers > - Extracting changes from WAL using logical decoding > > In our PoC implementation, we used AFTER triggers and transition tables, > but using > logical decoding might be better from the standpoint of the performance of base > table > modification. > > If we can represent a change of UPDATE on a base table as query-like > rather than > OLD and NEW, it may be possible to update the materialized view directly > instead > of performing delete & insert. > > > 2. How to compute the delta to be applied to materialized views > > Essentially, IVM is based on relational algebra. Theoretically, changes on > base > tables are represented as deltas on this, like "R <- R + dR", and the > delta on > the materialized view is computed from the base table deltas based on "change > propagation equations". For implementation, we have to derive the
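The "change propagation equations" mentioned here are well covered in the IVM literature; as a reference point (these are textbook rules, not taken from the PoC patch), the insertion-delta rules for selection, projection, and join can be written as:

```latex
% Textbook change-propagation rules for insertion deltas (\Delta R, \Delta S):
\Delta\bigl(\sigma_p(R)\bigr) = \sigma_p(\Delta R)
\Delta\bigl(\pi_A(R)\bigr)    = \pi_A(\Delta R) \quad \text{(tracking multiplicities, e.g.\ a count\_t column)}
\Delta(R \bowtie S) = (\Delta R \bowtie S) \;\cup\; (R \bowtie \Delta S) \;\cup\; (\Delta R \bowtie \Delta S)
```

The join rule follows directly from expanding (R + ΔR) ⋈ (S + ΔS) and subtracting R ⋈ S; the counting algorithm's count_t column is what makes the projection rule correct under duplicates.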
Re: isTempNamespaceInUse() is incorrect with its handling of MyBackendId
On Tue, Jan 14, 2020 at 07:23:19AM +0900, Michael Paquier wrote: > Yes, I'd rather keep this routine in its simplest shape for now. If > the optimization makes sense, though in most cases it won't because it > just helps sessions to detect faster their own temp schema, then let's > do it. I'll let this patch aside for a couple of days to let others > comment on it, and if there are no objections, I'll commit the fix. > Thanks for the lookup! For the archive's sake: this has been fixed with ac5bdf6, down to 11. -- Michael
Re: Reorderbuffer crash during recovery
On Tue, Dec 31, 2019 at 11:35 AM vignesh C wrote: > > On Mon, Dec 30, 2019 at 11:17 AM Amit Kapila wrote: > > > > On Fri, Dec 27, 2019 at 8:37 PM Alvaro Herrera > > wrote: > > > > > > On 2019-Dec-27, vignesh C wrote: > > > > > > > I felt Amit's solution also solves the problem. Attached patch has the > > > > fix based on the solution proposed. > > > > Thoughts? > > > > > > This seems a sensible fix to me, though I didn't try to reproduce the > > > failure. > > > > > > > @@ -2472,6 +2457,7 @@ ReorderBufferSerializeTXN(ReorderBuffer *rb, > > > > ReorderBufferTXN *txn) > > > > } > > > > > > > > ReorderBufferSerializeChange(rb, txn, fd, change); > > > > + txn->final_lsn = change->lsn; > > > > dlist_delete(&change->node); > > > > ReorderBufferReturnChange(rb, change); > > > > > > Should this be done inside ReorderBufferSerializeChange itself, instead > > > of in its caller? > > > > > > > makes sense. But, I think we should add a comment specifying the > > reason why it is important to set final_lsn while serializing the > > change. > > Fixed > > > > Also, would it be sane to verify that the TXN > > > doesn't already have a newer final_lsn? Maybe as an Assert. > > > > > > > I don't think this is a good idea because we update the final_lsn with > > commit_lsn in ReorderBufferCommit after which we can try to serialize > > the remaining changes. Instead, we should update it only if the > > change_lsn value is greater than final_lsn. > > > > Fixed. > Thanks Alvaro & Amit for your suggestions. I have made the changes > based on your suggestions. Please find the updated patch for the same. > I have also verified the patch in back branches. A separate patch was > required for the Release-10 branch, attached as > 0001-Reorder-buffer-crash-while-aborting-old-transactions-REL_10.patch. > Thoughts? One minor comment. Otherwise, the patch looks fine to me.
+ /* + * We set final_lsn on a transaction when we decode its commit or abort + * record, but we never see those records for crashed transactions. To + * ensure cleanup of these transactions, set final_lsn to that of their + * last change; this causes ReorderBufferRestoreCleanup to do the right + * thing. Final_lsn would have been set with commit_lsn earlier when we + * decode it commit, no need to update in that case + */ + if (txn->final_lsn < change->lsn) + txn->final_lsn = change->lsn; /decode it commit,/decode its commit, -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: making the backend's json parser work in frontend code
On Wed, Jan 15, 2020 at 09:39:13PM -0500, Robert Haas wrote: > On Wed, Jan 15, 2020 at 6:40 PM Andres Freund wrote: >> It's not obvious why the better approach here wouldn't be to just have a >> very simple ereport replacement, that needs to be explicitly included >> from frontend code. It'd not be meaningfully harder, imo, and it'd >> require fewer adaptations, and it'd look more familiar. > > I agree that it's far from obvious that the hacks in the patch are > best; to the contrary, they are hacks. That said, I feel that the > semantics of throwing an error are not very well-defined in a > front-end environment. I mean, in a backend context, throwing an error > is going to abort the current transaction, with all that this implies. > If the frontend equivalent is to do nothing and hope for the best, I > doubt it will survive anything more than the simplest use cases. This > is one of the reasons I've been very reluctant to go down this > whole path in the first place. Error handling is a well-defined concept in the backend. If connected to a database, you know that a session has to roll back any existing activity, etc. The clients have to be more flexible, because how an error should be handled depends a lot on how the tool is designed and how it should react to an error. So the backend code in charge of logging an error does the best it can: it throws an error, then lets the caller decide what to do with it. I agree with the feeling that having a simple replacement for ereport() in the frontend would be nice, as that would mean less code churn in the parts shared by backend and frontend. > Not sure how. pg_mblen() and PQmblen() are both existing interfaces, > and they're not compatible with each other. I guess we could make > PQmblen() available to backend code, but given that the function name > implies an origin in libpq, that seems wicked confusing. Well, the problem here is the encoding part, and the code looks at the same table pg_wchar_table[] at the end, so this needs some thought.
On top of that, on the client side we don't know exactly what kind of encoding is available (this led for example to several assumptions/hiccups behind the implementation of SCRAM, as it requires UTF-8 per its RFC, when working on the libpq part). -- Michael
Re: Setting min/max TLS protocol in clientside libpq
On Wed, Jan 15, 2020 at 02:58:09PM +0900, Michael Paquier wrote: > On Tue, Jan 14, 2020 at 11:01:00PM +0100, Daniel Gustafsson wrote: >> Files renamed to match existing naming convention, the rest of the patch left >> unchanged. > > [previous review] One thing I remembered after sleeping on it is that we can split the patch into two parts: the refactoring pieces and the addition of the options for libpq. The previous review mostly impacts the libpq part, and the split is straight-forward, so attached is a patch for only the refactoring pieces with some fixes and tweaks. I have tested it with and without OpenSSL, using 1.0.2 and 1.1.0 on Linux and Windows (MSVC). Those tests have allowed me to find an error in the previous patch that I missed: the new files openssl.h and protocol_openssl.c still declared SSL_CTX_set_min/max_proto_version as static functions, so compilation was broken when trying to use OpenSSL <= 1.0.2. If that looks fine, I would like to get that part committed first. Daniel, any thoughts? 
-- Michael diff --git a/src/include/common/openssl.h b/src/include/common/openssl.h new file mode 100644 index 00..47fa129994 --- /dev/null +++ b/src/include/common/openssl.h @@ -0,0 +1,28 @@ +/*- + * + * openssl.h + * OpenSSL supporting functionality shared between frontend and backend + * + * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group + * Portions Copyright (c) 1994, Regents of the University of California + * + * IDENTIFICATION + * src/include/common/openssl.h + * + *- + */ +#ifndef COMMON_OPENSSL_H +#define COMMON_OPENSSL_H + +#ifdef USE_OPENSSL +#include + +/* src/common/protocol_openssl.c */ +#ifndef SSL_CTX_set_min_proto_version +extern int SSL_CTX_set_min_proto_version(SSL_CTX *ctx, int version); +extern int SSL_CTX_set_max_proto_version(SSL_CTX *ctx, int version); +#endif + +#endif + +#endif /* COMMON_OPENSSL_H */ diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c index 62f1fcab2b..0cc59f1be1 100644 --- a/src/backend/libpq/be-secure-openssl.c +++ b/src/backend/libpq/be-secure-openssl.c @@ -36,6 +36,7 @@ #include #endif +#include "common/openssl.h" #include "libpq/libpq.h" #include "miscadmin.h" #include "pgstat.h" @@ -69,11 +70,6 @@ static bool ssl_is_server_start; static int ssl_protocol_version_to_openssl(int v, const char *guc_name, int loglevel); -#ifndef SSL_CTX_set_min_proto_version -static int SSL_CTX_set_min_proto_version(SSL_CTX *ctx, int version); -static int SSL_CTX_set_max_proto_version(SSL_CTX *ctx, int version); -#endif - /* */ /* Public interface */ @@ -1314,96 +1310,3 @@ ssl_protocol_version_to_openssl(int v, const char *guc_name, int loglevel) GetConfigOption(guc_name, false, false; return -1; } - -/* - * Replacements for APIs present in newer versions of OpenSSL - */ -#ifndef SSL_CTX_set_min_proto_version - -/* - * OpenSSL versions that support TLS 1.3 shouldn't get here because they - * already have these functions. 
So we don't have to keep updating the below - * code for every new TLS version, and eventually it can go away. But let's - * just check this to make sure ... - */ -#ifdef TLS1_3_VERSION -#error OpenSSL version mismatch -#endif - -static int -SSL_CTX_set_min_proto_version(SSL_CTX *ctx, int version) -{ - int ssl_options = SSL_OP_NO_SSLv2 | SSL_OP_NO_SSLv3; - - if (version > TLS1_VERSION) - ssl_options |= SSL_OP_NO_TLSv1; - /* - * Some OpenSSL versions define TLS*_VERSION macros but not the - * corresponding SSL_OP_NO_* macro, so in those cases we have to return - * unsuccessfully here. - */ -#ifdef TLS1_1_VERSION - if (version > TLS1_1_VERSION) - { -#ifdef SSL_OP_NO_TLSv1_1 - ssl_options |= SSL_OP_NO_TLSv1_1; -#else - return 0; -#endif - } -#endif -#ifdef TLS1_2_VERSION - if (version > TLS1_2_VERSION) - { -#ifdef SSL_OP_NO_TLSv1_2 - ssl_options |= SSL_OP_NO_TLSv1_2; -#else - return 0; -#endif - } -#endif - - SSL_CTX_set_options(ctx, ssl_options); - - return 1; /* success */ -} - -static int -SSL_CTX_set_max_proto_version(SSL_CTX *ctx, int version) -{ - int ssl_options = 0; - - AssertArg(version != 0); - - /* - * Some OpenSSL versions define TLS*_VERSION macros but not the - * corresponding SSL_OP_NO_* macro, so in those cases we have to return - * unsuccessfully here. - */ -#ifdef TLS1_1_VERSION - if (version < TLS1_1_VERSION) - { -#ifdef SSL_OP_NO_TLSv1_1 - ssl_options |= SSL_OP_NO_TLSv1_1; -#else - return 0; -#endif - } -#endif -#ifdef TLS1_2_VERSION - if (version < TLS1_2_VERSION) - { -#ifdef SSL_OP_NO_TLSv1_2 - ssl_options |= SSL_OP_NO_TLSv1_2; -#else - return 0; -#endif - } -#endif - - SSL_CTX_set_options(ctx, ssl_options); - - return 1; /* success */ -} - -#endif /* !SSL_CTX_set_min_
Re: aggregate crash
Hi, On 2020-01-15 12:47:47 -0800, Andres Freund wrote: > FWIW, I'm working on narrowing it down to something small. I can > reliably trigger the bug, and I understand the mechanics, I > think. Interestingly enough the reproducer currently only triggers on > v12, not on v11 and before. That's just happenstance due to allocation changes in plpgsql, though. The attached small reproducer, for me, reliably triggers crashes on 10 - master. It's hard to hit intentionally, because plpgsql does a datumCopy() to its non-null return value, which means that to hit the bug, one needs different numbers of allocations between setting up the transition value with transvalueisnull = true, transvalue = 0xsomepointer (because plpgsql doesn't copy NULLs), and the transition output with transvalueisnull = false, transvalue = 0xsomepointer. Which is necessary to trigger the bug, as it's then not reparented into a long lived enough context. To be then freed/accessed for the next group input value. I think this is too finnicky to actually keep as a regression test. The bug, in a way, exists all the way back, but it's a bit harder to create NULL values where the datum component isn't 0. 
To fix this, I suggest we, in all branches, do the equivalent of adding something like: diff --git i/src/backend/executor/execExprInterp.c w/src/backend/executor/execExprInterp.c index 790380051be..3260a63ac6b 100644 --- i/src/backend/executor/execExprInterp.c +++ w/src/backend/executor/execExprInterp.c @@ -4199,6 +4199,12 @@ ExecAggTransReparent(AggState *aggstate, AggStatePerTrans pertrans, pertrans->transtypeByVal, pertrans->transtypeLen); } +else +{ +/* ensure datum component is 0 for NULL transition values */ +newValue = (Datum) 0; +} + if (!oldValueIsNull) { if (DatumIsReadWriteExpandedObject(oldValue, and a comment explaining why it's (now) safe to rely on datum comparisons for if (DatumGetPointer(newVal) != DatumGetPointer(pergroup->transValue)) I don't think it makes sense to add something like it to the byval case - there are plenty of other ways a function returning != 0 with fcinfo->isnull == true can cause such values to exist. And that's longstanding. A separate question is whether it's worth adding code to e.g. EEO_CASE(EEOP_FUNCEXPR_STRICT) also resetting *op->resvalue to (Datum) 0. I personally don't think ensuring the datum is always 0 when isnull is true is all that helpful, if we can't guarantee it everywhere. So I'm a bit loath to add cycles to places that don't need it, and are hot. Regards, Andres
Re: pgindent && weirdness
On Wed, Jan 15, 2020 at 11:30 AM Tom Lane wrote: > Alvaro Herrera writes: > > I just ran pgindent over some patch, and noticed that this hunk ended up > > in my working tree: > > > - if (IsA(leftop, Var) && IsA(rightop, Const)) > > + if (IsA(leftop, Var) &&IsA(rightop, Const)) > > Yeah, it's been doing that for decades. I think the triggering > factor is the typedef name (Var, here) preceding the &&. > > It'd be nice to fix properly, but I've tended to take the path > of least resistance by breaking such lines to avoid the ugliness: > > if (IsA(leftop, Var) && > IsA(rightop, Const)) I am on vacation away from the Internet this week but somehow saw this on my phone and couldn't stop myself from peeking at pg_bsd_indent again. Yeah, "(Var)" (where Var is a known typename) causes it to think that any following operator must be unary. One way to fix that in the cases Alvaro is referring to is to override the setting so that && (and likewise ||) are never considered to be unary, though I haven't tested this much and there are surely other ways to achieve this: diff --git a/lexi.c b/lexi.c index d43723c..6de3227 100644 --- a/lexi.c +++ b/lexi.c @@ -655,6 +655,12 @@ stop_lit: unary_delim = state->last_u_d; break; } + + /* && and || are never unary */ + if ((token[0] == '&' && *buf_ptr == '&') || + (token[0] == '|' && *buf_ptr == '|')) + state->last_u_d = false; + while (*(e_token - 1) == *buf_ptr || *buf_ptr == '=') { /* * handle ||, &&, etc, and also things as in int *i The problem with that is that && sometimes *should* be formatted like a unary operator: when it's part of the nonstandard GCC computed goto syntax.
Re: [HACKERS] Block level parallel vacuum
On Thu, Jan 16, 2020 at 1:02 AM Mahendra Singh Thalor wrote: > > On Wed, 15 Jan 2020 at 19:31, Mahendra Singh Thalor > wrote: > > > > On Wed, 15 Jan 2020 at 19:04, Mahendra Singh Thalor > > wrote: > > > > > > > > > I reviewed v48 patch and below are some comments. > > > > > > 1. > > > +* based on the number of indexes. -1 indicates a parallel vacuum is > > > > > > I think, above should be like "-1 indicates that parallel vacuum is" > > > I am not an expert in this matter, but I am not sure if your suggestion is correct. I thought an article is required here, but I could be wrong. Can you please clarify? > > > 2. > > > +/* Variables for cost-based parallel vacuum */ > > > > > > At the end of comment, there is 2 spaces. I think, it should be only 1 > > > space. > > > > > > 3. > > > I think, we should add a test case for parallel option(when degree is not > > > specified). > > > Ex: > > > postgres=# VACUUM (PARALLEL) tmp; > > > ERROR: parallel option requires a value between 0 and 1024 > > > LINE 1: VACUUM (PARALLEL) tmp; > > > ^ > > > postgres=# > > > > > > Because above error is added in this parallel patch, so we should have > > > test case for this to increase code coverage. > > > I thought about it but was not sure to add a test for it. We might not want to add a test for each and every case as that will increase the number and time of tests without a significant advantage. Now that you have pointed this, I can add a test for it unless someone else thinks otherwise. > > 1. > With v45 patch, compute_parallel_delay is never called so function hit > is zero. I think, we can add some delay options into vacuum.sql test > to hit function. > But how can we meaningfully test the functionality of the delay? It would be tricky to come up with a portable test that can always produce consistent results. > 2. > I checked time taken by vacuum.sql test. Execution time is almost same > with and without v45 patch. > > Without v45 patch: > Run1) vacuum ... 
ok 701 ms > Run2) vacuum ... ok 549 ms > Run3) vacuum ... ok 559 ms > Run4) vacuum ... ok 480 ms > > With v45 patch: > Run1) vacuum ... ok 842 ms > Run2) vacuum ... ok 808 ms > Run3) vacuum ... ok 774 ms > Run4) vacuum ... ok 792 ms > I see some variance in the results; have you run with autovacuum off? I was expecting that this might speed up some cases where parallel vacuum is used by default. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: making the backend's json parser work in frontend code
On Wed, Jan 15, 2020 at 6:40 PM Andres Freund wrote: > It's not obvious why the better approach here wouldn't be to just have a > very simple ereport replacement, that needs to be explicitly included > from frontend code. It'd not be meaningfully harder, imo, and it'd > require fewer adaptations, and it'd look more familiar. I agree that it's far from obvious that the hacks in the patch are best; to the contrary, they are hacks. That said, I feel that the semantics of throwing an error are not very well-defined in a front-end environment. I mean, in a backend context, throwing an error is going to abort the current transaction, with all that this implies. If the frontend equivalent is to do nothing and hope for the best, I doubt it will survive anything more than the simplest use cases. This is one of the reasons I've been very reluctant to go down this whole path in the first place. > > +#ifdef FRONTEND > > + lex->token_terminator = s + > > PQmblen(s, PG_UTF8); > > +#else > > lex->token_terminator = s + > > pg_mblen(s); > > +#endif > > If we were to go this way, it seems like the ifdef should rather be in a > helper function, rather than all over. Sure... like I said, this is just to illustrate the problem. > It seems like it should be > unproblematic to have a common interface for both frontend/backend? Not sure how. pg_mblen() and PQmblen() are both existing interfaces, and they're not compatible with each other. I guess we could make PQmblen() available to backend code, but given that the function name implies an origin in libpq, that seems wicked confusing. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: src/test/recovery regression failure on bionic
On Thu, Jan 16, 2020 at 2:10 AM Christoph Berg wrote: > > Re: Amit Kapila 2020-01-09 > > > The point is that we know what is going wrong on sidewinder on back > > branches. However, we still don't know what is going wrong with tern > > and mandrill on v10 [1][2] where the log is: > > Fwiw, the problem on bionic disappeared yesterday with the build > triggered by "Revert test added by commit d207038053". > Thanks for the update. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: Improve errors when setting incorrect bounds for SSL protocols
On Wed, Jan 15, 2020 at 06:34:39PM +0100, Daniel Gustafsson wrote: > This is pretty much exactly the patch I was intending to write for this, so +1 > from me. Thanks for the review. Let's wait a couple of days to see if others have objections or more comments about this patch, but I'd like to fix the issue and backpatch down to 12 where the parameters have been introduced. -- Michael
Re: [PATCH v1] pg_ls_tmpdir to show directories
On Wed, Jan 15, 2020 at 11:21:36AM +0100, Fabien COELHO wrote: > I'm trying to think about how to get rid of the strange structure and hacks, > and the arbitrary looking size 2 array. > > Also the recursion is one step, but I'm not sure why, ISTM it could/should > go on always? Because tmpfiles only go one level deep. > Looking at the code, ISTM that relying on a stack/list would be much cleaner > and easier to understand. The code could look like: I'm willing to change the implementation, but only after there's an agreement about the desired behavior (extra column, one level, etc). Justin
Re: Use compiler intrinsics for bit ops in hash
On Tue, Jan 14, 2020 at 2:09 PM David Fetter wrote: > > The changes in hash AM and SIMPLEHASH do look like a net positive > > improvement. My biggest cringe might be in pg_bitutils: > > > > 1. Is ceil_log2_64 dead code? > > Let's call it nascent code. I suspect there are places it could go, if > I look for them. Also, it seemed silly to have one without the other. > While not absolutely required, I'd like us to find at least one place and start using it. (Clang also nags at me when we have unused functions). > On Tue, Jan 14, 2020 at 12:21:41PM -0800, Jesse Zhang wrote: > > 4. It seems like you *really* would like an operation like LZCNT in x86 > > (first appearing in Haswell) that is well defined on zero input. ISTM > > the alternatives are: > > > >a) Special case 1. That seems straightforward, but the branching cost > >on a seemingly unlikely condition seems to be a lot of performance > >loss > > > >b) Use architecture specific intrinsic (and possibly with CPUID > >shenanigans) like __builtin_ia32_lzcnt_u64 on x86 and use the CLZ > >intrinsic elsewhere. The CLZ GCC intrinsic seems to map to > >instructions that are well defined on zero in most ISA's other than > >x86, so maybe we can get away with special-casing x86? i. We can detect the LZCNT instruction by checking one of the "extended feature" (EAX=0x80000001) bits using CPUID. Unlike the "basic features" (EAX=1), extended feature flags have been more vendor-specific, but fortunately it seems that the feature bit for LZCNT is the same [1][2]. ii. We'll most likely still need to provide a fallback implementation for processors that don't have LZCNT (either because they are from a different vendor, or an older Intel/AMD processor). I wonder if simply checking for 1 is "good enough". Maybe a micro benchmark is in order? > Is there some way to tilt the tools so that this happens? We have a couple options here: 1. Use a separate object (a la our SSE 4.2 implementation of CRC).
On Clang and GCC (I don't have MSVC at hand), -mabm or -mlzcnt should cause __builtin_clz to generate the LZCNT instruction, which is well defined on zero input. The default configuration would translate __builtin_clz to code that subtracts BSR from the width of the input, but BSR leaves the destination undefined on zero input. 2. (My least favorite) use inline asm (a la our popcount implementation). > b) seems much more attractive. Is there some way to tilt the tools so > that this happens? What should I be reading up on? The enclosed references hopefully are good places to start. Let me know if you have more ideas. Cheers, Jesse References: [1] "How to detect New Instruction support in the 4th generation Intel® Core™ processor family" https://software.intel.com/en-us/articles/how-to-detect-new-instruction-support-in-the-4th-generation-intel-core-processor-family [2] "Bit Manipulation Instruction Sets" https://en.wikipedia.org/wiki/Bit_Manipulation_Instruction_Sets
Re: making the backend's json parser work in frontend code
Hi, On 2020-01-15 16:02:49 -0500, Robert Haas wrote: > The discussion on the backup manifest thread has gotten bogged down on > the issue of the format that should be used to store the backup > manifest file. I want something simple and ad-hoc; David Steele and > Stephen Frost prefer JSON. That is problematic because our JSON parser > does not work in frontend code, and I want to be able to validate a > backup against its manifest, which involves being able to parse the > manifest from frontend code. The latest development over there is that > David Steele has posted the JSON parser that he wrote for pgbackrest > with an offer to try to adapt it for use in front-end PostgreSQL code, > an offer which I genuinely appreciate. I'll write more about that over > on that thread. I'm not sure where I come down between using json and a simple ad-hoc format, when the dependency for the former is making the existing json parser work in the frontend. But if the alternative is to add a second json parser, it very clearly shifts towards using an ad-hoc format. Having to maintain a simple ad-hoc parser is a lot less technical debt than having a second full blown json parser. Imo even when an external project or three also has to have that simple parser. If the alternative were to use that newly proposed json parser to *replace* the backend one too, the story would again be different. > 0001 moves wchar.c from src/backend/utils/mb to src/common. Unless I'm > missing something, this seems like an overdue cleanup. It's long been > the case that wchar.c is actually compiled and linked into both > frontend and backend code. Commit > 60f11b87a2349985230c08616fa8a34ffde934c8 added code into src/common > that depends on wchar.c being available, but didn't actually make > wchar.c part of src/common, which seems like an odd decision: the > functions in the library are dependent on code that is not part of any > library but whose source files get copied around where needed. Eh? Cool. 
> 0002 does some basic header cleanup to make it possible to include the > existing header file jsonapi.h in frontend code. The state of the JSON > headers today looks generally poor. There seems not to have been much > attempt to get the prototypes for a given source file, say foo.c, into > a header file with the same name, say foo.h. Also, dependencies > between various header files seem to be have added somewhat freely. > This patch does not come close to fixing all that, but I consider it a > modest down payment on a cleanup that probably ought to be taken > further. Yea, this seems like a necessary cleanup (or well, maybe the start of it). > 0003 splits json.c into two files, json.c and jsonapi.c. All the > lexing and parsing stuff (whose prototypes are in jsonapi.h) goes into > jsonapi.c, while the stuff that pertains to the 'json' data type > remains in json.c. This also seems like a good cleanup, because to me, > at least, it's not a great idea to mix together code that is used by > both the json and jsonb data types as well as other things in the > system that want to generate or parse json together with things that > are specific to the 'json' data type. +1 > On the other hand, 0004, 0005, and 0006 are charitably described as > experimental or WIP. 0004 and 0005 hack up jsonapi.c so that it can > still be compiled even if #include "postgres.h" is changed to #include > "postgres-fe.h" and 0006 moves it into src/common. Note that I say > that they make it compile, not work. It's not just untested; it's > definitely broken. But it gives a feeling for what the remaining > obstacles to making this code available in a frontend environment are. > Since I wrote my very first email complaining about the difficulty of > making the backend's JSON parser work in a frontend environment, one > obstacle has been knocked down: StringInfo is now available in > front-end code (commit 26aaf97b683d6258c098859e6b1268e1f5da242f). 
> The remaining problems (that I know about) have to do with error reporting
> and multibyte character support; a read of the patches is suggested
> for those wanting further details.

> From d05e1fc82a51cb583a0367e72b1afc0de561dd00 Mon Sep 17 00:00:00 2001
> From: Robert Haas
> Date: Wed, 15 Jan 2020 10:36:52 -0500
> Subject: [PATCH 4/6] Introduce json_error() macro.
>
> ---
>  src/backend/utils/adt/jsonapi.c | 221 +---
>  1 file changed, 90 insertions(+), 131 deletions(-)
>
> diff --git a/src/backend/utils/adt/jsonapi.c b/src/backend/utils/adt/jsonapi.c
> index fc8af9f861..20f7f0f7ac 100644
> --- a/src/backend/utils/adt/jsonapi.c
> +++ b/src/backend/utils/adt/jsonapi.c
> @@ -17,6 +17,9 @@
>  #include "miscadmin.h"
>  #include "utils/jsonapi.h"
>
> +#define json_error(rest) \
> +	ereport(ERROR, (rest, report_json_context(lex)))
> +

It's not obvious why the better approach here wouldn't be to just have a very simple ereport replacement, that needs to be explicitly included from frontend
Enabling B-Tree deduplication by default
There are some outstanding questions about how B-Tree deduplication [1] should be configured, and whether or not it should be enabled by default. I'm starting this new thread in the hopes of generating discussion on these high level questions. The commit message of the latest version of the patch has a reasonable summary of its overall design, that might be worth reviewing before reading on: https://www.postgresql.org/message-id/CAH2-Wz=Tr6mxMsKRmv_=9-05_o9qwqozq8gwerv2dxs6+y3...@mail.gmail.com (If you want to go deeper than that, check out the nbtree README changes.) B-Tree deduplication comes in two basic varieties: * Unique index deduplication. * Non-unique index deduplication. Each variety works in essentially the same way. However, they differ in how and when deduplication is actually applied. We can infer quite a lot about versioning within a unique index, and we know that version churn is the only problem that we should try to address -- it's not really about space efficiency, since we know that there won't be any duplicates in the long run. Also, workloads that have a lot of version churn in unique indexes are generally quite reliant on LP_DEAD bit setting within _bt_check_unique() (this is not to be confused with the similar kill_prior_tuple LP_DEAD bit optimization) -- we need to be sensitive to that. Despite the fact that there are many more similarities than differences here, I would like to present each variety as a different thing to users (actually, I don't really want users to have to think about unique index deduplication at all). I believe that the best path forward for users is to make deduplication in non-unique indexes a user-visible feature with documented knobs (a GUC and an index storage parameter), while leaving unique index deduplication as an internal thing that is barely documented in the sgml user docs (I'll have a paragraph about it in the B-Tree internals chapter). 
Unique index deduplication won't care about the GUC, and will only trigger a deduplication pass when the incoming tuple is definitely a duplicate of an existing tuple (this signals version churn cheaply but reliably). The index storage parameter will be respected with unique indexes, but only as a debugging option -- our presumption is that nobody will want to disable deduplication in unique indexes, since leaving it on has virtually no downside (because we know exactly when to trigger it, and when not to). Unique index deduplication is an internal thing that users benefit from, but aren't really aware of, much like opportunistic deletion of LP_DEAD items. (A secondary benefit of this approach is that we don't have to have an awkward section in the documentation that explains why deduplication in unique indexes isn't actually an oxymoron.)

Thoughts on presenting unique index deduplication as a separate internal-only optimization, one that doesn't care about the GUC?

I also want to talk about a related but separate topic. I propose that the deduplicate_btree_items GUC (which only affects non-unique indexes) be turned on by default in Postgres 13. This can be reviewed at the end of the beta period, just like it was with parallelism and with JIT. Note that this means that we'd opportunistically perform a deduplication pass at the point that we usually have to split the page, even though in general there is no reason to think that that will work out. (We have no better way of applying deduplication than "try it and see", unless it's a unique index that has the version churn trigger heuristic that I just described.) Append-only tables with multiple non-unique indexes that happen to have only unique key values will pay a cost without seeing a benefit. I believe that I saw a 3% throughput cost when I assessed this using something called the "insert benchmark" [2] a couple of months ago (this is maintained by Mark Callaghan, the Facebook MyRocks guy).
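For reference, under this proposal the user-facing knobs for non-unique indexes might look roughly like the following. The GUC name is the one proposed in this thread; the storage parameter name is hypothetical, since the thread mentions such a parameter without naming it:

```sql
-- GUC proposed in the thread; under this proposal it would default to on in v13
SET deduplicate_btree_items = off;

-- hypothetical per-index override via a storage parameter
CREATE INDEX foo_idx ON foo (bar) WITH (deduplicate_items = off);
```

For unique indexes, per the above, neither knob would normally matter: deduplication there is triggered internally and only when an incoming tuple is definitely a duplicate.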
I need to do more work on quantifying the cost with a recent version of the patch, especially the cost of only having one LP_DEAD bit for an entire posting list tuple, but in general I think that this is likely to be worth it. Consider these points in favor of enabling deduplication by default: * This insert-only workload is something that Postgres does particularly well with -- append-only tables are reported to have greater throughput in Postgres than in other comparable RDBMSs. Presumably this is because we don't need UNDO logs, and don't do index key locking in the same way (apparently even Oracle has locks in indexes that protect logical transaction state). We are paying a small cost by enabling deduplication by default, but it is paid in an area where we already do particularly well. * Deduplication can be turned off at any time. nbtree posting list tuples are really not like the ones in GIN, in that we're not maintaining them over time. A deduplication pass fundamentally works by generating an alternative physical representation for the same logical contents --
Re: making the backend's json parser work in frontend code
Robert Haas writes: > ... However, I decided to spend today doing some further > investigation of an alternative approach, namely making the backend's > existing JSON parser work in frontend code as well. I did not solve > all the problems there, but I did come up with some patches which I > think would be worth committing on independent grounds, and I think > the whole series is worth posting. So here goes. In general, if we can possibly get to having one JSON parser in src/common, that seems like an obviously better place to be than having two JSON parsers. So I'm encouraged that it might be feasible after all. > 0001 moves wchar.c from src/backend/utils/mb to src/common. Unless I'm > missing something, this seems like an overdue cleanup. FWIW, I've been wanting to do that for awhile. I've not studied your patch, but +1 for the idea. We might also need to take a hard look at mbutils.c to see if any of that code can/should move. > Since I wrote my very first email complaining about the difficulty of > making the backend's JSON parser work in a frontend environment, one > obstacle has been knocked down: StringInfo is now available in > front-end code (commit 26aaf97b683d6258c098859e6b1268e1f5da242f). The > remaining problems (that I know about) have to do with error reporting > and multibyte character support; a read of the patches is suggested > for those wanting further details. The patch I just posted at <2863.1579127...@sss.pgh.pa.us> probably affects this in small ways, but not anything major. regards, tom lane
Re: Unicode escapes with any backend encoding
I wrote: > Andrew Dunstan writes: >> Perhaps I expressed myself badly. What I meant was that we should keep >> the json and text escape rules in sync, as they are now. Since we're >> changing the text rules to allow resolvable non-ascii unicode escapes >> in non-utf8 locales, we should do the same for json. > Got it. I'll make the patch do that in a little bit. OK, here's v2, which brings JSONB into the fold and also makes some effort to produce an accurate error cursor for invalid Unicode escapes. As it's set up, we only pay the extra cost of setting up an error context callback when we're actually processing a Unicode escape, so I think that's an acceptable cost. (It's not much of a cost, anyway.) The callback support added here is pretty much a straight copy-and-paste of the existing functions setup_parser_errposition_callback() and friends. That's slightly annoying --- we could perhaps merge those into one. But I didn't see a good common header to put such a thing into, so I just did it like this. Another note is that we could use the additional scanner infrastructure to produce more accurate error pointers for other cases where we're whining about a bad escape sequence, or some other sub-part of a lexical token. I think that'd likely be a good idea, since the existing cursor placement at the start of the token isn't too helpful if e.g. you're dealing with a very long string constant. But to keep this focused, I only touched the behavior for Unicode escapes. The rest could be done as a separate patch. This also mops up after 7f380c59 by making use of the new pg_wchar.c exports is_utf16_surrogate_first() etc everyplace that they're relevant (which is just the JSON code I was touching anyway, as it happens). I also made a bit of an effort to ensure test coverage of all the code touched in that patch and this one. 
regards, tom lane diff --git a/doc/src/sgml/json.sgml b/doc/src/sgml/json.sgml index 6ff8751..0f0d0c6 100644 --- a/doc/src/sgml/json.sgml +++ b/doc/src/sgml/json.sgml @@ -61,8 +61,8 @@ - PostgreSQL allows only one character set - encoding per database. It is therefore not possible for the JSON + RFC 7159 specifies that JSON strings should be encoded in UTF8. + It is therefore not possible for the JSON types to conform rigidly to the JSON specification unless the database encoding is UTF8. Attempts to directly include characters that cannot be represented in the database encoding will fail; conversely, @@ -77,13 +77,13 @@ regardless of the database encoding, and are checked only for syntactic correctness (that is, that four hex digits follow \u). However, the input function for jsonb is stricter: it disallows - Unicode escapes for non-ASCII characters (those above U+007F) - unless the database encoding is UTF8. The jsonb type also + Unicode escapes for characters that cannot be represented in the database + encoding. The jsonb type also rejects \u (because that cannot be represented in PostgreSQL's text type), and it insists that any use of Unicode surrogate pairs to designate characters outside the Unicode Basic Multilingual Plane be correct. Valid Unicode escapes - are converted to the equivalent ASCII or UTF8 character for storage; + are converted to the equivalent single character for storage; this includes folding surrogate pairs into a single character. @@ -96,9 +96,8 @@ not jsonb. The fact that the json input function does not make these checks may be considered a historical artifact, although it does allow for simple storage (without processing) of JSON Unicode - escapes in a non-UTF8 database encoding. In general, it is best to - avoid mixing Unicode escapes in JSON with a non-UTF8 database encoding, - if possible. + escapes in a database encoding that does not support the represented + characters. 
@@ -144,8 +143,8 @@ string text -\u is disallowed, as are non-ASCII Unicode - escapes if database encoding is not UTF8 +\u is disallowed, as are Unicode escapes + representing characters not available in the database encoding number diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml index c908e0b..e134877 100644 --- a/doc/src/sgml/syntax.sgml +++ b/doc/src/sgml/syntax.sgml @@ -189,6 +189,23 @@ UPDATE "my_table" SET "a" = 5; ampersands. The length limitation still applies. + +Quoting an identifier also makes it case-sensitive, whereas +unquoted names are always folded to lower case. For example, the +identifiers FOO, foo, and +"foo" are considered the same by +PostgreSQL, but +"Foo" and "FOO" are +different from these three and each other. (The folding of +unquoted names to lower case in PostgreSQL is +incompatible with the SQL standard, which says that unquoted names +should be folded to upper case.
making the backend's json parser work in frontend code
The discussion on the backup manifest thread has gotten bogged down on the issue of the format that should be used to store the backup manifest file. I want something simple and ad-hoc; David Steele and Stephen Frost prefer JSON. That is problematic because our JSON parser does not work in frontend code, and I want to be able to validate a backup against its manifest, which involves being able to parse the manifest from frontend code. The latest development over there is that David Steele has posted the JSON parser that he wrote for pgbackrest with an offer to try to adapt it for use in front-end PostgreSQL code, an offer which I genuinely appreciate. I'll write more about that over on that thread. However, I decided to spend today doing some further investigation of an alternative approach, namely making the backend's existing JSON parser work in frontend code as well. I did not solve all the problems there, but I did come up with some patches which I think would be worth committing on independent grounds, and I think the whole series is worth posting. So here goes. 0001 moves wchar.c from src/backend/utils/mb to src/common. Unless I'm missing something, this seems like an overdue cleanup. It's long been the case that wchar.c is actually compiled and linked into both frontend and backend code. Commit 60f11b87a2349985230c08616fa8a34ffde934c8 added code into src/common that depends on wchar.c being available, but didn't actually make wchar.c part of src/common, which seems like an odd decision: the functions in the library are dependent on code that is not part of any library but whose source files get copied around where needed. Eh? 0002 does some basic header cleanup to make it possible to include the existing header file jsonapi.h in frontend code. The state of the JSON headers today looks generally poor. There seems not to have been much attempt to get the prototypes for a given source file, say foo.c, into a header file with the same name, say foo.h. 
Also, dependencies between various header files seem to have been added somewhat freely. This patch does not come close to fixing all that, but I consider it a modest down payment on a cleanup that probably ought to be taken further.

0003 splits json.c into two files, json.c and jsonapi.c. All the lexing and parsing stuff (whose prototypes are in jsonapi.h) goes into jsonapi.c, while the stuff that pertains to the 'json' data type remains in json.c. This also seems like a good cleanup, because to me, at least, it's not a great idea to mix together code that is used by both the json and jsonb data types as well as other things in the system that want to generate or parse json together with things that are specific to the 'json' data type.

As far as I know all three of the above patches are committable as-is; review and contrary opinions welcome.

On the other hand, 0004, 0005, and 0006 are charitably described as experimental or WIP. 0004 and 0005 hack up jsonapi.c so that it can still be compiled even if #include "postgres.h" is changed to #include "postgres-fe.h" and 0006 moves it into src/common. Note that I say that they make it compile, not work. It's not just untested; it's definitely broken. But it gives a feeling for what the remaining obstacles to making this code available in a frontend environment are. Since I wrote my very first email complaining about the difficulty of making the backend's JSON parser work in a frontend environment, one obstacle has been knocked down: StringInfo is now available in front-end code (commit 26aaf97b683d6258c098859e6b1268e1f5da242f). The remaining problems (that I know about) have to do with error reporting and multibyte character support; a read of the patches is suggested for those wanting further details. Suggestions welcome.
Thanks, -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company 0001-Move-wchar.c-to-src-common.patch Description: Binary data 0002-Adjust-src-include-utils-jsonapi.h-so-it-s-not-backe.patch Description: Binary data 0003-Split-JSON-lexer-parser-from-json-data-type-support.patch Description: Binary data 0004-Introduce-json_error-macro.patch Description: Binary data 0005-Frontendify-jsonapi.c.patch Description: Binary data 0006-Move-jsonapi.c-to-src-common.patch Description: Binary data
Re: Corruption with duplicate primary key
On Thu., December 12, 2019 at 5:25 PM, Tomas Vondra wrote: >On Wed, Dec 11, 2019 at 11:46:40PM +, Alex Adriaanse wrote: >>On Thu., December 5, 2019 at 5:45 PM, Tomas Vondra wrote: >>> At first I thought maybe this might be due to collations changing and >>> breaking the index silently. What collation are you using? >> >>We're using en_US.utf8. We did not make any collation changes to my >>knowledge. > >Well, the idea was more that glibc got updated and the collations >changed because of that (without PostgreSQL having a chance to even >notice that). Closing the loop on this, I've investigated this some more and it turns out this is exactly what happened. As you suspected, the issue had nothing to do with pg_upgrade or PG12, but rather the glibc upgrade that was seen in Debian Buster. The postgres:10 and postgres:11 images are based on Debian Stretch, whereas postgres:12 is based on Buster. When I kept the database on an older version of Postgres (10 or 11) but switched from the older Docker image to the postgres:12 or debian:buster(-slim) image, manually installing older Postgres packages inside those images, I saw index corruption there too. Thanks for the input! Alex
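As a side note for anyone hitting the same failure mode: the amcheck contrib module can detect ordering violations caused by a glibc collation change before they manifest as duplicate keys. bt_index_check() is the real amcheck entry point; the index name below is illustrative:

```sql
CREATE EXTENSION IF NOT EXISTS amcheck;

-- raises an error if the index order disagrees with the
-- collation rules of the currently installed glibc
SELECT bt_index_check('psql_error_pkey'::regclass);

-- rebuilding under the new glibc restores a consistent ordering
REINDEX INDEX psql_error_pkey;
```

Running such a check after any OS upgrade that changes glibc (e.g. the Stretch-to-Buster move described above) is a reasonable precaution.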
Re: aggregate crash
Hi, On 2020-01-14 23:27:02 -0800, Andres Freund wrote: > On 2020-01-14 17:54:16 -0500, Tom Lane wrote: > > Andres Freund writes: > > > I'm still not sure I actually fully understand the bug. It's obvious how > > > returning the input value again could lead to memory not being freed (so > > > that leak seems to go all the way back). And similarly, since the > > > introduction of expanded objects, it can also lead to the expanded > > > object not being deleted. > > > But that's not the problem causing the crash here. What I think must > > > instead be the problem is that pergroupstate->transValueIsNull, but > > > pergroupstate->transValue is set to something looking like a > > > pointer. Which caused us not to datumCopy() a new transition value into > > > a long lived context. and then a later transition causes us to free the > > > short-lived value? > > > > Yeah, I was kind of wondering that too. While formally the Datum value > > for a null is undefined, I'm not aware offhand of any functions that > > wouldn't return zero --- and this would have to be an aggregate transition > > function doing so, which reduces the universe of candidates quite a lot. > > Plus there's the question of how often a transition function would return > > null for non-null input at all. > > > > Could we see a test case that provokes this crash, even if it doesn't > > do so reliably? > > There's a larger reproducer referenced in the first message. I had hoped > that Teodor could narrow it down - I guess I'll try to do that tomorrow... FWIW, I'm working on narrowing it down to something small. I can reliably trigger the bug, and I understand the mechanics, I think. Interestingly enough the reproducer currently only triggers on v12, not on v11 and before. 
As you say, this requires a transition function returning a NULL that has the datum part set - the reproducer here defines a non-strict aggregate transition function that can indirectly do so:

CREATE FUNCTION public.state_max_bytea(st bytea, inp bytea) RETURNS bytea
LANGUAGE plpgsql AS $$
BEGIN
    if st is null then
        return inp;
    elseif st ...

The relevant executor code (the strict-function path) is:

    FunctionCallInfo fcinfo = op->d.func.fcinfo_data;
    NullableDatum *args = fcinfo->args;
    int         argno;
    Datum       d;

    /* strict function, so check for NULL args */
    for (argno = 0; argno < op->d.func.nargs; argno++)
    {
        if (args[argno].isnull)
        {
            *op->resnull = true;
            goto strictfail;
        }
    }
    fcinfo->isnull = false;
    d = op->d.func.fn_addr(fcinfo);
    *op->resvalue = d;
    *op->resnull = fcinfo->isnull;

strictfail:
    EEO_NEXT();

I.e. if the transition's argument is a strict function, and that strict function is not evaluated because of a NULL input, we set op->resnull = true, but do *not* touch op->resvalue. If there was a previous row that actually set resvalue to something meaningful, we get an input to the transition function consisting of the old resvalue (!= 0), but the new resnull = true. If the transition function returns that unchanged, ExecAggTransReparent() doesn't do anything, because the new value is null. Afterwards pergroup->transValue is set != 0, even though transValueIsNull = true.

The somewhat tricky bit is arranging this to happen with pointers that are the same. I think I'm on the way to narrowing that down, but it'll take me a bit longer.

To fix this I think we should set newVal = 0 in ExecAggTransReparent(), as a new else branch for the !newValueIsNull case. That should not add any additional branches, I think. In contrast to always doing so when checking whether ExecAggTransReparent() ought to be called.

Greetings,

Andres Freund
Re: src/test/recovery regression failure on bionic
Re: Amit Kapila 2020-01-09 > The point is that we know what is going wrong on sidewinder on back > branches. However, we still don't know what is going wrong with tern > and mandrill on v10 [1][2] where the log is: Fwiw, the problem on bionic disappeared yesterday with the build triggered by "Revert test added by commit d207038053". Christoph
Re: Append with naive multiplexing of FDWs
On Tue, Jan 14, 2020 at 02:37:48PM +0500, Ahsan Hadi wrote:
> Summary
> The patch is pretty good; it works well when there is little data to send back to the parent node. The patch doesn't provide a parallel FDW scan: it ensures that child nodes can send data to the parent in parallel, but the parent can only sequentially process the data from the data nodes.
>
> Provided there is no performance degradation for non-FDW append queries, I would recommend considering this patch as an interim solution while we are waiting for parallel FDW scan.

Wow, these are very impressive results!

-- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Re: pgsql: Add basic TAP tests for psql's tab-completion logic.
Re: Tom Lane 2020-01-09 <16328.1578606...@sss.pgh.pa.us> > build part.) I pushed a patch using SKIP_READLINE_TESTS. > Christoph should be able to set that for the Ubuntu branches > where the test is failing. That "fixed" the problem on xenial, thanks. Christoph
Re: [HACKERS] Block level parallel vacuum
On Wed, 15 Jan 2020 at 19:31, Mahendra Singh Thalor wrote:
>
> On Wed, 15 Jan 2020 at 19:04, Mahendra Singh Thalor wrote:
> >
> > On Wed, 15 Jan 2020 at 17:27, Amit Kapila wrote:
> > >
> > > On Wed, Jan 15, 2020 at 10:05 AM Masahiko Sawada wrote:
> > > >
> > > > Thank you for updating the patch! I have a few small comments.
> > >
> > > I have adapted all your changes, and fixed the comment by Mahendra related to initializing parallel state only when there are at least two indexes. Additionally, I have changed a few comments (to make the reference to parallel vacuum consistent; at some places we were referring to it as 'parallel lazy vacuum' and at other places it was 'parallel index vacuum').
> > >
> > > > The rest looks good to me.
> > >
> > > Okay, I think the patch is in good shape. I am planning to read it a few more times (at least 2 times) and then probably will commit it early next week (Monday or Tuesday) unless there are any major comments. I have already committed the API patch (4d8a8d0c73).
> >
> > Hi,
> > Thanks Amit for fixing the review comments.
> >
> > I reviewed the v48 patch and below are some comments.
> >
> > 1.
> > +    * based on the number of indexes. -1 indicates a parallel vacuum is
> >
> > I think the above should read "-1 indicates that parallel vacuum is".
> >
> > 2.
> > +    /* Variables for cost-based parallel vacuum */
> >
> > At the end of the comment there are 2 spaces; I think there should be only 1 space.
> >
> > 3.
> > I think we should add a test case for the parallel option (when the degree is not specified).
> > Ex:
> > postgres=# VACUUM (PARALLEL) tmp;
> > ERROR:  parallel option requires a value between 0 and 1024
> > LINE 1: VACUUM (PARALLEL) tmp;
> >                ^
> > postgres=#
> >
> > Because the above error is added by this parallel patch, we should have a test case for it to increase code coverage.
>
> Hi
> Below are some more review comments for the v48 patch.
>
> 1.
> #include "storage/bufpage.h"
> #include "storage/lockdefs.h"
> +#include "storage/shm_toc.h"
> +#include "storage/dsm.h"
>
> Here the header files are not in alphabetical order (storage/dsm.h should come before storage/lockdefs.h).
>
> 2.
> +    /* No index supports parallel vacuum */
> +    if (nindexes_parallel == 0)
> +        return 0;
> +
> +    /* The leader process takes one index */
> +    nindexes_parallel--;
>
> The above code can be rearranged as:
>
> +    /* The leader process takes one index */
> +    nindexes_parallel--;
> +
> +    /* No index supports parallel vacuum */
> +    if (nindexes_parallel <= 0)
> +        return 0;
>
> If we do it like this, then in some cases we can skip some of the parallel-worker calculations.
>
> --
> Thanks and Regards
> Mahendra Singh Thalor
> EnterpriseDB: http://www.enterprisedb.com

Hi,
I checked code coverage and the time taken by the vacuum.sql test with and without the v48 patch. Below are some findings (I ran "make check-world -i" to get coverage).

1. With the v45 patch, compute_parallel_delay is never called, so the function hit count is zero. I think we can add some delay options to the vacuum.sql test to hit the function.

2. I checked the time taken by the vacuum.sql test. Execution time is almost the same with and without the v45 patch.

Without v45 patch:
Run1) vacuum ... ok 701 ms
Run2) vacuum ... ok 549 ms
Run3) vacuum ... ok 559 ms
Run4) vacuum ... ok 480 ms

With v45 patch:
Run1) vacuum ... ok 842 ms
Run2) vacuum ... ok 808 ms
Run3) vacuum ... ok 774 ms
Run4) vacuum ... ok 792 ms

-- Thanks and Regards Mahendra Singh Thalor EnterpriseDB: http://www.enterprisedb.com
Re: our checks for read-only queries are not great
Greetings, * Robert Haas (robertmh...@gmail.com) wrote: > On Wed, Jan 15, 2020 at 10:25 AM Peter Eisentraut > wrote: > > Well, if the transaction was declared read-only, then committing it > > (directly or 2PC) shouldn't change anything. This appears to be a > > circular argument. > > I don't think it's a circular argument. Suppose that someone decrees > that, as of 5pm Eastern time, no more read-write transactions are > permitted. And because the person issuing the decree has a lot of > power, everybody obeys. Now, every pg_dump taken after that time will > be semantically equivalent to every other pg_dump taken after that > time, with one tiny exception. That exception is that someone could > still do COMMIT PREPARED of a read-write transaction that was prepared > before 5pm. If the goal of the powerful person who issued the decree > was to make sure that the database couldn't change - e.g. so they > could COPY each table individually without keeping a snapshot open and > still get a consistent backup - they might fail to achieve it if, as > of the moment of the freeze, there are some prepared write > transactions. > > I'm not saying we have to change the behavior or anything. I'm just > saying that there seems to be one, and only one, way to make the > apparent contents of the database change in a read-only transaction > right now. And that's a COMMIT PREPARED of a read-write transaction. Yeah, allowing a read-only transaction to start and then commit a read-write transaction doesn't seem sensible. I'd be in favor of changing that. Thanks, Stephen signature.asc Description: PGP signature
Re: our checks for read-only queries are not great
On Wed, Jan 15, 2020 at 10:25 AM Peter Eisentraut wrote: > Well, if the transaction was declared read-only, then committing it > (directly or 2PC) shouldn't change anything. This appears to be a > circular argument. I don't think it's a circular argument. Suppose that someone decrees that, as of 5pm Eastern time, no more read-write transactions are permitted. And because the person issuing the decree has a lot of power, everybody obeys. Now, every pg_dump taken after that time will be semantically equivalent to every other pg_dump taken after that time, with one tiny exception. That exception is that someone could still do COMMIT PREPARED of a read-write transaction that was prepared before 5pm. If the goal of the powerful person who issued the decree was to make sure that the database couldn't change - e.g. so they could COPY each table individually without keeping a snapshot open and still get a consistent backup - they might fail to achieve it if, as of the moment of the freeze, there are some prepared write transactions. I'm not saying we have to change the behavior or anything. I'm just saying that there seems to be one, and only one, way to make the apparent contents of the database change in a read-only transaction right now. And that's a COMMIT PREPARED of a read-write transaction. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
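Concretely, the loophole being described looks like this (table, value, and transaction identifier are illustrative):

```sql
-- session 1, before the "freeze": prepare a read-write transaction
BEGIN;
INSERT INTO t VALUES (1);
PREPARE TRANSACTION 'pending_write';

-- session 2, after the freeze, with read-only transactions enforced...
SET default_transaction_read_only = on;
-- ...the prepared write can still be committed, changing visible data:
COMMIT PREPARED 'pending_write';
```

The prepared transaction did all its writing before 5pm, so the read-only check at COMMIT PREPARED time currently has nothing to object to; whether it should is the question under discussion.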
Re: Making psql error out on output failures
Right, the file difference is caused by "-At".

On the other hand, in order to keep the output message more consistent with other tools, I did a little bit more investigation of how pg_dump handles this situation. Here are my findings.

pg_dump uses WRITE_ERROR_EXIT to report the error when "(bytes_written != size * nmemb)", where WRITE_ERROR_EXIT calls fatal("could not write to output file: %m") and then "pg_log_generic(PG_LOG_ERROR, __VA_ARGS__)". After running a quick test in the same situation, I got a message like the one below:

$ pg_dump -h localhost -p 5432 -d postgres -t psql_error -f /mnt/ramdisk/file
pg_dump: error: could not write to output file: No space left on device

If I change the error log message as below, where "%m" is used to pass the value of strerror(errno) and "could not write to output file:" is copied from WRITE_ERROR_EXIT,

- pg_log_error("Error printing tuples");
+ pg_log_error("could not write to output file: %m");

then the output message is something like below, which, I believe, is more consistent with pg_dump:

$ psql -d postgres -t -c "select repeat('111', 100)" -o /mnt/ramdisk/file
could not write to output file: No space left on device

$ psql -d postgres -t -c "select repeat('111', 100)" > /mnt/ramdisk/file
could not write to output file: No space left on device

Hope the information helps.

David

---
Highgo Software Inc. (Canada)
www.highgo.ca
Re: Allow to_date() and to_timestamp() to accept localized names
On Wed, Jan 15, 2020 at 11:20 AM Arthur Zakirov wrote:
>
> I have a couple of picky notes.

Thanks for the review.

> > + if (name_len != norm_len)
> > + pfree(norm_name);
>
> I'm not sure here. Is it safe to assume that name_len and norm_len will always differ whenever a new norm_name was allocated?

Good call, that check can be more robust.

> > static const char *const months_full[] = {
> > "January", "February", "March", "April", "May", "June", "July",
> > - "August", "September", "October", "November", "December", NULL
> > + "August", "September", "October", "November", "December"
> > };
>
> Unfortunately I didn't see from the patch why it was necessary to remove the last null element from months_full, days_short, adbc_strings, adbc_strings_long and the other arrays. I think it is not good, because it unnecessarily increases the patch and adds code like the following:
>
> > + from_char_seq_search(&value, &s, months, localized_names,
> > + ONE_UPPER, MAX_MON_LEN, n, have_error,
> > + lengthof(months_full));
>
> Here it passes "months" from datetime.c to from_char_seq_search() and calculates its length using the "months_full" array from formatting.c.

With the introduction of seq_search_localized() that can be avoided, minimizing code churn.

Please find attached a version addressing the above.

Regards,

Juan José Santamaría Flecha

0001-Allow-localized-month-names-to_date-v6.patch Description: Binary data
Re: Rearranging ALTER TABLE to avoid multi-operations bugs
Alvaro Herrera writes: > I didn't review in detail, but it seems good to me. I especially liked > getting rid of the ProcessedConstraint code, and the additional test > cases. Thanks for looking! Yeah, all those test cases expose situations where we misbehave today :-(. I wish this were small enough to be back-patchable, but it's not feasible. regards, tom lane
relcache sometimes initially ignores has_generated_stored
Hi, I think it's probably not relevant, but it confused me for a moment that RelationBuildTupleDesc() might set constr->has_generated_stored to true, but then throw away the constraint at the end, because nothing matches the /* * Set up constraint/default info */ if (has_not_null || ndef > 0 || attrmiss || relation->rd_rel->relchecks) test, i.e. there are no defaults. A quick assert confirms we do indeed pfree() constr in cases where has_generated_stored == true. I suspect that's just an intermediate catalog, however, e.g. when DefineRelation() does heap_create_with_catalog(); CommandCounterIncrement(); relation_open(); AddRelationNewConstraints(). It does still strike me as not great that we can get a different relcache entry, even if transient, depending on whether there are other reasons to create a TupleConstr. Say a NOT NULL column. I'm inclined to think we should just also check has_generated_stored in the if quoted above? Greetings, Andres Freund
Re: Improve errors when setting incorrect bounds for SSL protocols
> On 15 Jan 2020, at 03:28, Michael Paquier wrote: > > On Tue, Jan 14, 2020 at 11:21:53AM +0100, Daniel Gustafsson wrote: >> On 14 Jan 2020, at 04:54, Michael Paquier wrote: >>> Please note that OpenSSL 1.1.0 has added two routines to be able to >>> get the min/max protocols set in a context, called >>> SSL_CTX_get_min/max_proto_version. Thinking about older versions of >>> OpenSSL I think that it is better to use >>> ssl_protocol_version_to_openssl to do the parsing work. I also found >>> that it is easier to check for compatible versions after setting both >>> bounds in the SSL context, so there is no need to worry about >>> invalid values depending on the build of OpenSSL used. >> >> I'm not convinced that it's a good idea to check for incompatible protocol >> range in the OpenSSL backend. We've spent a lot of energy to make the TLS >> code >> library agnostic and pluggable, and since identifying a basic configuration >> error isn't OpenSSL specific I think it should be in the guc code. That >> would >> keep the layering as well as ensure that we don't mistakenly treat this >> differently should we get a second TLS backend. > > Good points. And the get routines are not that portable in OpenSSL > either even if HEAD supports 1.0.1 and newer versions... Attached is > an updated patch which uses a GUC check for both parameters, and > provides a hint on top of the original error message. The SSL context > does not get reloaded if there is an error, so the errors from OpenSSL > cannot be triggered as far as I checked (after mixing a couple of > correct and incorrect combinations manually). This is pretty much exactly the patch I was intending to write for this, so +1 from me. cheers ./daniel
Re: Duplicate Workers entries in some EXPLAIN plans
Sounds good, I'll try that format. Any idea how to test YAML? With the JSON format, I was able to rely on Postgres' own JSON-manipulating functions to strip or canonicalize fields that can vary across executions--I can't really do that with YAML. Or should I run EXPLAIN with COSTS OFF, TIMING OFF, SUMMARY OFF and assume that for simple queries the BUFFERS output (and other fields I can't turn off like Sort Space Used) *is* going to be stable?
Re: Rearranging ALTER TABLE to avoid multi-operations bugs
On 2020-Jan-14, Tom Lane wrote: > I wrote: > > [ fix-alter-table-order-of-operations-3.patch ] > > Rebased again, fixing a minor conflict with f595117e2. > > > I'd kind of like to get this cleared out of my queue soon. > > Does anyone intend to review it further? > > If I don't hear objections pretty darn quick, I'm going to > go ahead and push this. I didn't review in detail, but it seems good to me. I especially liked getting rid of the ProcessedConstraint code, and the additional test cases. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: progress report for ANALYZE
I just pushed this after some small extra tweaks. Thanks, Yamada-san, for seeing this to completion! -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Complete data erasure
Greetings, * Tom Lane (t...@sss.pgh.pa.us) wrote: > Tomas Vondra writes: > > But let's assume it makes sense - is this really the right solution? I > > think what I'd prefer is encryption + rotation of the keys. Which should > > work properly even on COW filesystems, the performance impact is kinda > > low and amortized etc. Of course, we're discussing built-in encryption > > for quite a bit of time. > > Yeah, it seems to me that encrypted storage would solve strictly more > cases than this proposal does, and it has fewer foot-guns. I still view that as strictly a different solution and one that certainly won't fit in all cases, not to mention that it's a great deal more complicated and we're certainly nowhere near having it. > One foot-gun that had been vaguely bothering me yesterday, but the point > didn't quite crystallize till just now, is that the only place that we > could implement such file zeroing is post-commit in a transaction that > has done DROP TABLE or TRUNCATE. Of course post-commit is an absolutely > horrid place to be doing anything that could fail, since there's no very > good way to recover from an error. It's an even worse place to be doing > anything that could take a long time --- if the user loses patience and > kills the session, now your problems are multiplied. This is presuming that we make this feature something that can be run in a transaction and rolled back. I don't think there's been any specific expression that there is such a requirement, and you bring up good points that show why providing that functionality would be particularly challenging, but that isn't really an argument against this feature in general. Thanks, Stephen signature.asc Description: PGP signature
Re: Complete data erasure
Tomas Vondra writes: > But let's assume it makes sense - is this really the right solution? I > think what I'd prefer is encryption + rotation of the keys. Which should > work properly even on COW filesystems, the performance impact is kinda > low and amortized etc. Of course, we're discussing built-in encryption > for quite a bit of time. Yeah, it seems to me that encrypted storage would solve strictly more cases than this proposal does, and it has fewer foot-guns. One foot-gun that had been vaguely bothering me yesterday, but the point didn't quite crystallize till just now, is that the only place that we could implement such file zeroing is post-commit in a transaction that has done DROP TABLE or TRUNCATE. Of course post-commit is an absolutely horrid place to be doing anything that could fail, since there's no very good way to recover from an error. It's an even worse place to be doing anything that could take a long time --- if the user loses patience and kills the session, now your problems are multiplied. Right now our risks in that area are confined to leaking files if unlink() fails, which isn't great but it isn't catastrophic either. With this proposal, erroring out post-commit becomes a security failure, if it happens anywhere before we've finished a possibly large amount of zero-writing. I don't want to go there. regards, tom lane
Re: pgindent && weirdness
On Tue, Jan 14, 2020 at 05:30:21PM -0500, Tom Lane wrote: > Alvaro Herrera writes: > > I just ran pgindent over some patch, and noticed that this hunk ended up > > in my working tree: > > > - if (IsA(leftop, Var) && IsA(rightop, Const)) > > + if (IsA(leftop, Var) &&IsA(rightop, Const)) > > Yeah, it's been doing that for decades. I think the triggering > factor is the typedef name (Var, here) preceding the &&. > > It'd be nice to fix properly, but I've tended to take the path > of least resistance by breaking such lines to avoid the ugliness: > > if (IsA(leftop, Var) && > IsA(rightop, Const)) In the past I would use a post-processing step after BSD indent to fix up these problems. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Re: Rearranging ALTER TABLE to avoid multi-operations bugs
Sergei Kornilov writes: > I am clearly not a good reviewer for such changes... But for a note: I read > the v4 patch and have no useful comments. Good new tests, reasonable code > changes to fix multiple bug reports. Thanks for looking! > The patch is proposed only for the master branch, right? Yes, it seems far too risky for the back branches. regards, tom lane
Re: Complete data erasure
Greetings, * Tomas Vondra (tomas.von...@2ndquadrant.com) wrote: > On Wed, Jan 15, 2020 at 10:23:22AM -0500, Stephen Frost wrote: > >I disagree entirely. If the operating system and hardware level provide > >a way for this to work, which is actually rather common when you > >consider that ext4 is an awfully popular filesystem where this would work > >just fine with nearly all traditional hardware underneath it, then we're > >just blocking enabling this capability for no justifiable reason. > > Not sure. I agree the goal (securely discarding data) is certainly > worthwhile, although I suspect it's just one of many things you'd need to > care about. That is, there's probably a million other things you'd need > to worry about (logs, WAL, CSV files, temp files, ...), so with actually > sensitive data I'd expect people to just dump/load the data into a clean > system and rebuild the old one (zero drives, ...). Of course there are other things that one would need to worry about, but saying this isn't useful because PG isn't also scanning your entire infrastructure for CSV files that have this sensitive information isn't a sensible argument. I agree that there are different levels of sensitive data, and for many, many cases what is being proposed here will work just fine, even if that level of sensitive data isn't considered "actually sensitive data" by other people. > But let's assume it makes sense - is this really the right solution? I > think what I'd prefer is encryption + rotation of the keys. Which should > work properly even on COW filesystems, the performance impact is kinda > low and amortized etc. Of course, we're discussing built-in encryption > for quite a bit of time. In some cases that may make sense as an alternative, in other situations it doesn't. In other words, I disagree that what you're proposing is just a different implementation of the same thing (which I'd be happy to discuss), it's an entirely different feature with different trade-offs. 
I'm certainly on-board with the idea of having table level and column level encryption to facilitate an approach like this, but I disagree strongly that we shouldn't have this simple "just overwrite the data a few times" capability because we might, one day, have useful in-core encryption, or that, even if we had it, we should tell everyone to use that instead of providing this capability. I don't uninstall 'shred' when I install 'gpg' on my system. Thanks, Stephen signature.asc Description: PGP signature
Re: Complete data erasure
On Wed, Jan 15, 2020 at 10:23:22AM -0500, Stephen Frost wrote: Greetings, * Tom Lane (t...@sss.pgh.pa.us) wrote: "asaba.takan...@fujitsu.com" writes: > I want to add the feature to erase data so that it cannot be restored > because it prevents attackers from stealing data from released data area. I think this is fairly pointless, unfortunately. I disagree- it's a feature that's been asked for multiple times and does have value in some situations. Dropping or truncating tables is as much as we can do without making unwarranted assumptions about the filesystem's behavior. You say you want to zero out the files first, but what will that accomplish on copy-on-write filesystems? What about filesystems which are not copy-on-write though? Even granting that zeroing our storage files is worth something, it's not worth much if there are more copies of the data elsewhere. Backups are not our responsibility to worry about, or, at least, it'd be an independent feature if we wanted to add something like this to pg_basebackup, and not something the initial feature would need to worry about. And any well-run database is going to have backups, or standby servers, or both. There's no way for the primary node to force standbys to erase themselves (and it'd be a different sort of security hazard if there were). A user can't "force" PG to do anything more than we can "force" a replica to do something, but a user can issue a request to a primary and that primary can then pass that request along to the replica as part of the WAL stream. Also to the point: if your assumption is that an attacker has access to the storage medium at a sufficiently low level that they can examine previously-deleted files, what's stopping them from reading the data *before* it's deleted? This argument certainly doesn't make any sense- who said they had access to the storage medium at a time before the files were deleted? What if they only had access after the files were zero'd? 
When you consider the lifetime of a storage medium, it's certainly a great deal longer than the length of time that a given piece of sensitive data might reside on it. So I think doing anything useful in this area is a bit outside Postgres' remit. You'd need to be thinking at an operating-system or hardware level. I disagree entirely. If the operating system and hardware level provide a way for this to work, which is actually rather common when you consider that ext4 is an awfully popular filesystem where this would work just fine with nearly all traditional hardware underneath it, then we're just blocking enabling this capability for no justifiable reason. Not sure. I agree the goal (securely discarding data) is certainly worthwhile, although I suspect it's just one of many things you'd need to care about. That is, there's probably a million other things you'd need to worry about (logs, WAL, CSV files, temp files, ...), so with actually sensitive data I'd expect people to just dump/load the data into a clean system and rebuild the old one (zero drives, ...). But let's assume it makes sense - is this really the right solution? I think what I'd prefer is encryption + rotation of the keys. Which should work properly even on COW filesystems, the performance impact is kinda low and amortized etc. Of course, we're discussing built-in encryption for quite a bit of time. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Rearranging ALTER TABLE to avoid multi-operations bugs
Hello Thank you! I am clearly not a good reviewer for such changes... But for a note: I read the v4 patch and have no useful comments. Good new tests, reasonable code changes to fix multiple bug reports. The patch is proposed only for the master branch, right? regards, Sergei
Re: Extracting only the columns needed for a query
> On Thu, Jan 02, 2020 at 05:21:55PM -0800, Melanie Plageman wrote: > > That makes me think that maybe the function name, > extract_used_columns() is bad, though. Maybe extract_scan_columns()? > I tried this in the attached, updated patch. Thanks, I'll take a look at the new version. Following your explanation, extract_scan_columns sounds better, but at the same time "scan" is a pretty broad term, and one could misunderstand what kind of scan the name refers to. > For UPDATE, we need all of the columns in the table because of the > table_lock() API's current expectation that the slot has all of the > columns populated. If we want UPDATE to only need to insert the column > values which have changed, table_tuple_lock() will have to change. Can you please elaborate on this part? I'm probably missing something, since I don't immediately see where those expectations are expressed. > > AM callback relation_estimate_size exists currently which the planner leverages. > > Via this callback it fetches tuples, pages, etc. So, our thought is to > > extend > > this API if possible to pass down the needed columns and help perform better > > costing > > for the query. Though we think that if we wish to leverage this function, we need to > > know > > the list of columns before planning, hence we might need to use the query tree. > > I believe it would be beneficial to add this potential API extension patch > into > the thread (as an example of an interface defining how scanCols could be used) > and review them together. I still find this question from my previous email interesting to explore.
Re: our checks for read-only queries are not great
On 2020-01-13 20:14, Robert Haas wrote: On Mon, Jan 13, 2020 at 5:57 AM Peter Eisentraut wrote: On 2020-01-10 14:41, Robert Haas wrote: This rule very nearly matches the current behavior: it explains why temp table operations are allowed, and why ALTER SYSTEM is allowed, and why REINDEX etc. are allowed. However, there's a notable exception: PREPARE, COMMIT PREPARED, and ROLLBACK PREPARED are allowed in a read-only transaction. Under the "doesn't change pg_dump output" criteria, the first and third ones should be permitted but COMMIT PREPARED should be denied, except maybe if the prepared transaction didn't do any writes (and in that case, why did we bother preparing it?). Despite that, this rule does a way better job explaining the current behavior than anything else suggested so far. I don't follow. Does pg_dump dump prepared transactions? No, but committing one changes the database contents as seen by a subsequent pg_dump. Well, if the transaction was declared read-only, then committing it (directly or 2PC) shouldn't change anything. This appears to be a circular argument. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Complete data erasure
Greetings, * Tom Lane (t...@sss.pgh.pa.us) wrote: > "asaba.takan...@fujitsu.com" writes: > > I want to add the feature to erase data so that it cannot be restored > > because it prevents attackers from stealing data from released data area. > > I think this is fairly pointless, unfortunately. I disagree- it's a feature that's been asked for multiple times and does have value in some situations. > Dropping or truncating tables is as much as we can do without making > unwarranted assumptions about the filesystem's behavior. You say > you want to zero out the files first, but what will that accomplish > on copy-on-write filesystems? What about filesystems which are not copy-on-write though? > Even granting that zeroing our storage files is worth something, > it's not worth much if there are more copies of the data elsewhere. Backups are not our responsibility to worry about, or, at least, it'd be an independent feature if we wanted to add something like this to pg_basebackup, and not something the initial feature would need to worry about. > And any well-run database is going to have backups, or standby servers, > or both. There's no way for the primary node to force standbys to erase > themselves (and it'd be a different sort of security hazard if there > were). A user can't "force" PG to do anything more than we can "force" a replica to do something, but a user can issue a request to a primary and that primary can then pass that request along to the replica as part of the WAL stream. > Also to the point: if your assumption is that an attacker has access > to the storage medium at a sufficiently low level that they can examine > previously-deleted files, what's stopping them from reading the data > *before* it's deleted? This argument certainly doesn't make any sense- who said they had access to the storage medium at a time before the files were deleted? What if they only had access after the files were zero'd? 
When you consider the lifetime of a storage medium, it's certainly a great deal longer than the length of time that a given piece of sensitive data might reside on it. > So I think doing anything useful in this area is a bit outside > Postgres' remit. You'd need to be thinking at an operating-system > or hardware level. I disagree entirely. If the operating system and hardware level provide a way for this to work, which is actually rather common when you consider that ext4 is an awfully popular filesystem where this would work just fine with nearly all traditional hardware underneath it, then we're just blocking enabling this capability for no justifiable reason. Thanks, Stephen signature.asc Description: PGP signature
Re: base backup client as auxiliary backend process
On 2020-01-09 11:57, Alexandra Wang wrote: Back to the base backup stuff, I don't quite understand all the benefits you mentioned above. It seems to me the greatest benefit with this patch is that postmaster takes care of pg_basebackup itself, which reduces the human wait in between running the pg_basebackup and pg_ctl/postgres commands. Is that right? I personally don't mind the --write-recovery-conf option because it helps me write the primary_conninfo and primary_slot_name gucs into postgresql.auto.conf, which to me as a developer is easier than editing postgres.conf without automation. Sorry about the dumb question but what's so bad about --write-recovery-conf? Making it easier to automate is one major appeal of my proposal. The current way of setting up a standby is very difficult to automate correctly. Are you planning to completely replace pg_basebackup with this? Is there any use case that a user just need a basebackup but not immediately start the backend process? I'm not planning to replace or change pg_basebackup. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: base backup client as auxiliary backend process
On 2020-01-15 01:40, Masahiko Sawada wrote: Could you rebase the main patch that adds base backup client as auxiliary backend process since the previous version patch (v3) conflicts with the current HEAD? attached -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services From 892ba431956c7d936555f758efc874f58b3679e8 Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Wed, 15 Jan 2020 16:15:06 +0100 Subject: [PATCH v4] Base backup client as auxiliary backend process Discussion: https://www.postgresql.org/message-id/flat/61b8d18d-c922-ac99-b990-a31ba63cd...@2ndquadrant.com --- doc/src/sgml/protocol.sgml| 12 +- doc/src/sgml/ref/initdb.sgml | 17 + src/backend/access/transam/xlog.c | 102 +++-- src/backend/bootstrap/bootstrap.c | 9 + src/backend/postmaster/pgstat.c | 6 + src/backend/postmaster/postmaster.c | 114 - src/backend/replication/basebackup.c | 70 +++ .../libpqwalreceiver/libpqwalreceiver.c | 400 ++ src/backend/replication/repl_gram.y | 9 +- src/backend/replication/repl_scanner.l| 1 + src/backend/storage/file/fd.c | 36 +- src/bin/initdb/initdb.c | 39 +- src/bin/pg_resetwal/pg_resetwal.c | 6 +- src/include/access/xlog.h | 8 +- src/include/miscadmin.h | 2 + src/include/pgstat.h | 1 + src/include/replication/basebackup.h | 2 + src/include/replication/walreceiver.h | 4 + src/include/storage/fd.h | 2 +- src/include/utils/guc.h | 2 +- src/test/recovery/t/018_basebackup.pl | 29 ++ 21 files changed, 783 insertions(+), 88 deletions(-) create mode 100644 src/test/recovery/t/018_basebackup.pl diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml index 80275215e0..f54b820edf 100644 --- a/doc/src/sgml/protocol.sgml +++ b/doc/src/sgml/protocol.sgml @@ -2466,7 +2466,7 @@ Streaming Replication Protocol -BASE_BACKUP [ LABEL 'label' ] [ PROGRESS ] [ FAST ] [ WAL ] [ NOWAIT ] [ MAX_RATE rate ] [ TABLESPACE_MAP ] [ NOVERIFY_CHECKSUMS ] +BASE_BACKUP [ LABEL 'label' ] [ PROGRESS ] [ FAST ] [ WAL ] [ NOWAIT ] 
[ MAX_RATE rate ] [ TABLESPACE_MAP ] [ NOVERIFY_CHECKSUMS ] [ EXCLUDE_CONF ] BASE_BACKUP @@ -2576,6 +2576,16 @@ Streaming Replication Protocol + + +EXCLUDE_CONF + + + Do not copy configuration files, that is, files that end in + .conf. + + + diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml index da5c8f5307..1261e02d59 100644 --- a/doc/src/sgml/ref/initdb.sgml +++ b/doc/src/sgml/ref/initdb.sgml @@ -286,6 +286,23 @@ Options + + -r + --replica + + +Initialize a data directory for a physical replication replica. The +data directory will not be initialized with a full database system, +but will instead only contain a minimal set of files. A server that +is started on this data directory will first fetch a base backup and +then switch to standby mode. The connection information for the base +backup has to be configured by setting , and other parameters as desired, +before the server is started. + + + + -S --sync-only diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 7f4f784c0e..36c6cdef82 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -903,8 +903,6 @@ static void CheckRecoveryConsistency(void); static XLogRecord *ReadCheckpointRecord(XLogReaderState *xlogreader, XLogRecPtr RecPtr, int whichChkpt, bool report); static bool rescanLatestTimeLine(void); -static void WriteControlFile(void); -static void ReadControlFile(void); static char *str_time(pg_time_t tnow); static bool CheckForStandbyTrigger(void); @@ -4494,7 +4492,7 @@ rescanLatestTimeLine(void) * ReadControlFile() verifies they are correct. We could split out the * I/O and compatibility-check functions, but there seems no need currently. 
*/ -static void +void WriteControlFile(void) { int fd; @@ -4585,7 +4583,7 @@ WriteControlFile(void) XLOG_CONTROL_FILE))); } -static void +void ReadControlFile(void) { pg_crc32c crc; @@ -5075,6 +5073,41 @@ XLOGShmemInit(void) InitSharedLatch(&XLogCtl->recoveryWakeupLatch); } +void +InitControlFile(uint64 sysidentifier) +{ + charmock_auth_nonce[MO
Re: Remove libpq.rc, use win32ver.rc for libpq
On 2020-01-15 07:44, Michael Paquier wrote: I have been testing and checking the patch a bit more seriously, and the information gets generated correctly for dlls and exe files. The rest of the changes look fine to me. For src/makefiles/Makefile.win32, I don't have a MinGW environment at hand so I have not directly tested but the logic looks fine. I have tested MinGW. Patch committed. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: TRUNCATE on foreign tables
On Wed, Jan 15, 2020 at 17:11, Michael Paquier wrote: > > On Tue, Jan 14, 2020 at 06:16:17PM +0900, Kohei KaiGai wrote: > > The "frels_list" is a list of foreign tables that are connected to a > > particular > > foreign server, thus the server-id pulled from each foreign table should > > be > > identical for all the relations in the list. > > Due to the API design, this callback shall be invoked for each foreign > > server > > involved in the TRUNCATE command, not on a per-table basis. > > > > The 2nd and 3rd arguments also inform the FDW driver of other options of the > > command. If the FDW has a concept of "cascaded truncate" or "restart sequence", > > it can adjust its remote query. In postgres_fdw, it follows the manner of the > > usual TRUNCATE command. > > I have done a quick read through the patch. You have modified the > patch to pass down to the callback a list of relation OIDs to execute > one command for all, and there are tests for FKs so that coverage > looks fine. > > Regression tests are failing with this patch: > -- TRUNCATE doesn't work on foreign tables, either directly or > recursively > TRUNCATE ft2; -- ERROR > -ERROR: "ft2" is not a table > +ERROR: foreign-data wrapper "dummy" has no handler > You visibly just need to update the output because no handlers are > available for truncate in this case. > What error message would be better in this case? It does not print "ft2" anywhere, so the user may not notice that "ft2" is the source of the error. How about 'foreign table "ft2" does not support truncate' ? > +void > +deparseTruncateSql(StringInfo buf, Relation rel) > +{ > + deparseRelation(buf, rel); > +} > Don't see much point in having this routine. > deparseRelation() is a static function in postgres_fdw/deparse.c. On the other hand, it may be better to move the entire logic for constructing the remote TRUNCATE command into deparse.c, like other commands. > + If FDW does not provide this callback, PostgreSQL considers > + TRUNCATE is not supported on the foreign table. 
> + > This sentence is weird. Perhaps you meant "as not supported"? > Yes. If the FDW does not provide this callback, PostgreSQL behaves as if TRUNCATE is not supported on the foreign table. > + frels_list is a list of foreign tables that are > + connected to a particular foreign server; thus, these foreign tables > + should have identical foreign server ID > The list is built by the backend code, so that has to be true. > > + foreach (lc, frels_list) > + { > + Relationfrel = lfirst(lc); > + Oid frel_oid = RelationGetRelid(frel); > + > + if (server_id == GetForeignServerIdByRelId(frel_oid)) > + { > + frels_list = foreach_delete_current(frels_list, lc); > + curr_frels = lappend(curr_frels, frel); > + } > + } > Wouldn't it be better to fill in a hash table for each server with a > list of relations? > It's just a matter of preference. A temporary hash table mapping server-id to a list of foreign tables is one idea. Let me try to revise. > +typedef void (*ExecForeignTruncate_function) (List *frels_list, > + bool is_cascade, > + bool restart_seqs); > I would recommend to pass down directly DropBehavior instead of a > boolean to the callback. That's more extensible. > Ok, Best regards, -- HeteroDB, Inc / The PG-Strom Project KaiGai Kohei
Re: remove support for old Python versions
Rémi, please update your build farm member "locust" to a new Python version (>=2.6) for the master branch, or disable the Python option. Thanks. On 2020-01-08 23:04, Peter Eisentraut wrote: On 2019-12-31 10:46, Peter Eisentraut wrote: On 2019-12-09 11:37, Peter Eisentraut wrote: Per discussion in [0], here is a patch set to remove support for Python versions older than 2.6. [0]: https://www.postgresql.org/message-id/6d3b7b69-0970-4d40-671a-268c46e93...@2ndquadrant.com It appears that the removal of old OpenSSL support is stalled or abandoned for now, so this patch set is on hold for now as well, as far as I'm concerned. I have committed the change of the Python exception syntax in the documentation, though. Since the OpenSSL patch went ahead, I have now committed this one as well. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Block level parallel vacuum
On Wed, 15 Jan 2020 at 19:04, Mahendra Singh Thalor wrote: > > On Wed, 15 Jan 2020 at 17:27, Amit Kapila wrote: > > > > On Wed, Jan 15, 2020 at 10:05 AM Masahiko Sawada > > wrote: > > > > > > Thank you for updating the patch! I have a few small comments. > > > > > > > I have adapted all your changes, fixed the comment by Mahendra related > > to initializing parallel state only when there are at least two > > indexes. Additionally, I have changed a few comments (to make the > > references to parallel vacuum consistent; at some places we were > > referring to it as 'parallel lazy vacuum' and at other places it was > > 'parallel index vacuum'). > > > > > The > > > rest looks good to me. > > > > > > > Okay, I think the patch is in good shape. I am planning to read it a > > few more times (at least 2 times) and then probably will commit it > > early next week (Monday or Tuesday) unless there are any major > > comments. I have already committed the API patch (4d8a8d0c73). > > > > Hi, > Thanks, Amit, for addressing the review comments. > > I reviewed the v48 patch and below are some comments. > > 1. > +* based on the number of indexes. -1 indicates a parallel vacuum is > > I think the above should be "-1 indicates that parallel vacuum is" > > 2. > +/* Variables for cost-based parallel vacuum */ > > At the end of the comment there are 2 spaces; I think it should be only 1 space. > > 3. > I think we should add a test case for the parallel option (when the degree is not > specified). > Ex: > postgres=# VACUUM (PARALLEL) tmp; > ERROR: parallel option requires a value between 0 and 1024 > LINE 1: VACUUM (PARALLEL) tmp; > ^ > postgres=# > > Because the above error is added by this parallel patch, we should have a test > case for it to increase code coverage. > Hi Below are some more review comments for the v48 patch. 1. #include "storage/bufpage.h" #include "storage/lockdefs.h" +#include "storage/shm_toc.h" +#include "storage/dsm.h" Here, the header files are not in alphabetical order. 
(storage/dsm.h should come before storage/lockdefs.h) 2. +/* No index supports parallel vacuum */ +if (nindexes_parallel == 0) +return 0; + +/* The leader process takes one index */ +nindexes_parallel--; Above code can be rearranged as: +/* The leader process takes one index */ +nindexes_parallel--; + +/* No index supports parallel vacuum */ +if (nindexes_parallel <= 0) +return 0; If we do like this, then in some cases, we can skip some calculations of parallel workers. -- Thanks and Regards Mahendra Singh Thalor EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Block level parallel vacuum
On Wed, 15 Jan 2020 at 17:27, Amit Kapila wrote:
> On Wed, Jan 15, 2020 at 10:05 AM Masahiko Sawada wrote:
> > Thank you for updating the patch! I have a few small comments.
>
> I have adapted all your changes and fixed the comment by Mahendra about
> initializing the parallel state only when there are at least two indexes.
> Additionally, I have changed a few comments to make the references to
> parallel vacuum consistent; in some places we were referring to it as
> 'parallel lazy vacuum' and in others as 'parallel index vacuum'.
>
> > The rest looks good to me.
>
> Okay, I think the patch is in good shape. I am planning to read it a few
> more times (at least twice) and then will probably commit it early next
> week (Monday or Tuesday) unless there are any major comments. I have
> already committed the API patch (4d8a8d0c73).

Hi,
Thanks Amit for fixing the review comments.

I reviewed the v48 patch; below are some comments.

1.
+* based on the number of indexes. -1 indicates a parallel vacuum is

I think the above should read "-1 indicates that parallel vacuum is".

2.
+/* Variables for cost-based parallel vacuum */

At the end of the comment there are 2 spaces; I think it should be only 1
space.

3.
I think we should add a test case for the parallel option when the degree
is not specified. Ex:
postgres=# VACUUM (PARALLEL) tmp;
ERROR: parallel option requires a value between 0 and 1024
LINE 1: VACUUM (PARALLEL) tmp;
^
postgres=#

Since the above error is added by this parallel patch, we should have a
test case for it to increase code coverage.

-- 
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com
RE: Index Skip Scan
Hi all,

I reviewed the latest version of the patch. Overall some good improvements,
I think. Please find my feedback below.

- I think I mentioned this before - it's not that big of a deal, but it
just looks weird and inconsistent to me:

create table t2 as (select a, b, c, 10 d from generate_series(1,5) a,
generate_series(1,100) b, generate_series(1,1) c);
create index on t2 (a,b,c desc);

postgres=# explain select distinct on (a,b) a,b,c from t2 where a=2 and
b>=5 and b<=5 order by a,b,c desc;
QUERY PLAN
-
Index Only Scan using t2_a_b_c_idx on t2 (cost=0.43..216.25 rows=500 width=12)
  Skip scan: true
  Index Cond: ((a = 2) AND (b >= 5) AND (b <= 5))
(3 rows)

postgres=# explain select distinct on (a,b) a,b,c from t2 where a=2 and b=5
order by a,b,c desc;
QUERY PLAN
-
Unique (cost=0.43..8361.56 rows=500 width=12)
  -> Index Only Scan using t2_a_b_c_idx on t2 (cost=0.43..8361.56 rows=9807 width=12)
     Index Cond: ((a = 2) AND (b = 5))
(3 rows)

When doing a DISTINCT ON (params) with equality conditions for all params,
it falls back to the regular index scan even though there's no reason not
to use the skip scan here. It's much faster to write "b between 5 and 5"
now than to write "b=5". I understand this is a limitation of the
unique-keys patch at the moment, which could be addressed in the future. I
think for the sake of consistency it would make sense if this eventually
gets addressed.

- nodeIndexScan.c, line 126
This sets xs_want_itup to true in all cases (even for non-skip scans). I
don't think this is acceptable given the side effects it has (the page will
never be unpinned between returned tuples in _bt_drop_lock_and_maybe_pin).

- nbtsearch.c, _bt_skip, line 1440
_bt_update_skip_scankeys(scan, indexRel);
This function is called twice now - once in the else {} and immediately
after that outside of the else. The second call can be removed, I think.
- nbtsearch.c, _bt_skip, line 1597
LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
scan->xs_itup = (IndexTuple) PageGetItem(page, itemid);
This is an UNLOCK followed by a read of the unlocked page. That looks
incorrect?

- nbtsearch.c, _bt_skip, line 1440
if (BTScanPosIsValid(so->currPos) &&
    _bt_scankey_within_page(scan, so->skipScanKey, so->currPos.buf, dir))
Is it allowed to look at the high key / low key of the page without having
a read lock on it?

- nbtsearch.c, line 1634
if (_bt_readpage(scan, indexdir, offnum)) ... else error()
Is it really guaranteed that a match can be found on the page itself? Isn't
it possible that an extra index condition, not part of the scan key, makes
none of the keys on the page match?

- nbtsearch.c in general
Most of the code seems to rely quite heavily on the fact that xs_want_itup
forces _bt_drop_lock_and_maybe_pin to never release the buffer pin. Have
you considered that compaction of a page may still happen even if you hold
the pin? [1] I've been trying to come up with cases in which this may break
the patch, but I haven't been able to produce such a scenario - so it may
be fine. But it would be good to consider again. One thing I was thinking
of was a scenario where page splits and/or compaction happen in between
returning tuples. Could this break the _bt_scankey_within_page check such
that it thinks the scan key is within the current page, while it actually
isn't? Mainly for backward and/or cursor scans. Forward scans shouldn't be
a problem, I think. Apologies for being a bit vague, as I don't have a
clear example ready of when it would go wrong. It may well be fine, but it
was one of the things on my mind.

[1] https://postgrespro.com/list/id/1566683972147.11...@optiver.com

-Floris
Re: [Proposal] Add accumulated statistics for wait event
On 2020/01/13 4:11, Pavel Stehule wrote:
> The following review has been posted through the commitfest application:
> make installcheck-world: tested, passed
> Implements feature: tested, passed
> Spec compliant: not tested
> Documentation: tested, passed
>
> I like this patch, because I used similar functionality some years ago
> very successfully. The implementation is fairly simple, and the results
> should be valid given the method used.

Thanks for your review!

> The potential problem is the performance impact. Very early tests show an
> impact of circa 3% in the worst case, but I'll try to repeat these tests.

Yes, the performance impact is the main concern. I want to know how it
affects performance in various test cases and on various environments.

> There are some trailing whitespaces and useless tabs.
>
> The new status of this patch is: Waiting on Author

I attach v4 patches removing that trailing whitespace and the useless tabs.

--
Yoshikazu Imai

From b009b1f8f6be47ae61b5e4538e2730d721ee60db Mon Sep 17 00:00:00 2001
From: "imai.yoshikazu"
Date: Wed, 15 Jan 2020 09:13:19 +0000
Subject: [PATCH v4 1/2] Add pg_stat_waitaccum view.

pg_stat_waitaccum shows the count and duration of each wait event.

Each backend/background process counts and measures the time of a wait
event in every pgstat_report_wait_start and pgstat_report_wait_end. They
store that info in local variables and send it to the Statistics
Collector. We can get that info via the Statistics Collector.

To reduce overhead, I implemented a static hash instead of a dynamic hash.

I also implemented track_wait_timing, which determines whether wait event
durations are collected or not.

On Windows, this feature might not work correctly, because it currently
initializes local variables in pg_stat_init, which is not passed to forked
processes on Windows.
--- src/backend/catalog/system_views.sql | 8 + src/backend/postmaster/pgstat.c | 344 ++ src/backend/storage/lmgr/lwlock.c | 19 ++ src/backend/utils/adt/pgstatfuncs.c | 80 ++ src/backend/utils/misc/guc.c | 9 + src/backend/utils/misc/postgresql.conf.sample | 1 + src/include/catalog/pg_proc.dat | 9 + src/include/pgstat.h | 123 - src/include/storage/lwlock.h | 1 + src/include/storage/proc.h| 1 + src/test/regress/expected/rules.out | 5 + 11 files changed, 598 insertions(+), 2 deletions(-) diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql index 773edf8..80f2caa 100644 --- a/src/backend/catalog/system_views.sql +++ b/src/backend/catalog/system_views.sql @@ -957,6 +957,14 @@ CREATE VIEW pg_stat_bgwriter AS pg_stat_get_buf_alloc() AS buffers_alloc, pg_stat_get_bgwriter_stat_reset_time() AS stats_reset; +CREATE VIEW pg_stat_waitaccum AS +SELECT + S.wait_event_type AS wait_event_type, + S.wait_event AS wait_event, + S.calls AS calls, + S.times AS times + FROM pg_stat_get_waitaccum(NULL) AS S; + CREATE VIEW pg_stat_progress_vacuum AS SELECT S.pid AS pid, S.datid AS datid, D.datname AS datname, diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c index 51c486b..08e10ad 100644 --- a/src/backend/postmaster/pgstat.c +++ b/src/backend/postmaster/pgstat.c @@ -123,6 +123,7 @@ */ bool pgstat_track_activities = false; bool pgstat_track_counts = false; +bool pgstat_track_wait_timing = false; intpgstat_track_functions = TRACK_FUNC_OFF; intpgstat_track_activity_query_size = 1024; @@ -153,6 +154,10 @@ static time_t last_pgstat_start_time; static bool pgStatRunningInCollector = false; +WAHash *wa_hash; + +instr_time waitStart; + /* * Structures in which backends store per-table info that's waiting to be * sent to the collector. 
@@ -255,6 +260,7 @@ static int localNumBackends = 0; */ static PgStat_ArchiverStats archiverStats; static PgStat_GlobalStats globalStats; +static PgStat_WaitAccumStats waitAccumStats; /* * List of OIDs of databases we need to write out. If an entry is InvalidOid, @@ -280,6 +286,8 @@ static pid_t pgstat_forkexec(void); #endif NON_EXEC_STATIC void PgstatCollectorMain(int argc, char *argv[]) pg_attribute_noreturn(); +static void pgstat_init_waitaccum_hash(WAHash **hash); +static PgStat_WaitAccumEntry *pgstat_add_wa_entry(WAHash *hash, uint32 key); static void pgstat_beshutdown_hook(int code, Datum arg); static PgStat_StatDBEntry *pgstat_get_db_entry(Oid databaseid, bool create); @@ -287,8 +295,11 @@ static PgStat_StatTabEntry *pgstat_get_tab_entry(PgStat_StatDBEntry *dbentry, Oid tableoid, bool create); static void pgstat_write_s
Re: [Proposal] Global temporary tables
> On 13 Jan 2020, at 16:08, Konstantin Knizhnik wrote:
>
> On 12.01.2020 4:51, Tomas Vondra wrote:
>> On Fri, Jan 10, 2020 at 11:47:42AM +0300, Konstantin Knizhnik wrote:
>>>
>>> On 09.01.2020 19:48, Tomas Vondra wrote:
>>>>> The most complex and challenging task is to support GTT for all kinds
>>>>> of indexes. Unfortunately I cannot propose a good universal solution
>>>>> for it. Just patching all existing index implementations seems to be
>>>>> the only choice.
>>>>
>>>> I haven't looked at the indexing issue closely, but IMO we need to
>>>> ensure that every session sees/uses only indexes on a GTT that were
>>>> defined before the session started using the table.
>>>
>>> Why? That contradicts the behavior of normal tables.
>>> Assume that you have active clients and at some point in time the DBA
>>> recognizes that they are spending too much time scanning some GTT.
>>> It can create an index for this GTT, but if existing clients are not
>>> able to use this index, then we need to somehow make these clients
>>> restart their sessions?
>>> In my patch I have implemented building indexes for GTTs on demand: if
>>> an accessed index on a GTT is not yet initialized, it is filled with
>>> local data.
>>
>> Yes, I know the behavior would be different from the behavior for
>> regular tables. And yes, it would not allow fixing slow queries in
>> sessions without interrupting those sessions.
>>
>> I proposed just ignoring those new indexes because it seems much simpler
>> than the alternative solutions I can think of, and it's not like those
>> other solutions don't have issues of their own.
>
> Quite the opposite: prohibiting sessions from seeing indexes created
> before the session started using the GTT requires more effort. We need
> to somehow maintain and check the GTT's first-access time.
>
>> For example, I've looked at the "on demand" building as implemented in
>> global_private_temp-8.patch, and adding a bunch of index build calls
>> into various places in the index code seems somewhat suspicious.
> We in any case have to initialize GTT indexes on demand, even if we
> prohibit use of indexes created after the session's first access to the
> GTT. So the difference is only in one thing: should we just initialize an
> empty index or populate it with local data (if the rules for index
> usability are the same for GTTs as for normal tables)?
> From an implementation point of view there is no big difference. Actually,
> building the index in the standard way is even simpler than constructing
> an empty index. Originally I implemented the first approach (I just
> forgot to consider the case where the GTT was already used by a session).
> Then I rewrote it using the second approach and the patch even became
> simpler.
>
>> * brinbuild is added to brinRevmapInitialize, which is meant to
>> initialize state for scanning. It seems wrong to build the index we're
>> scanning from this function (layering and all that).
>>
>> * btbuild is called from _bt_getbuf. That seems a bit ... suspicious?
>
> As I already mentioned, support of indexes for GTTs is one of the most
> challenging things in my patch. I didn't find a good and universal
> solution. So I agree that the call of btbuild from _bt_getbuf may be
> considered suspicious. I will be pleased if you or somebody else can
> propose a better alternative, and not only for B-tree but for all other
> indexes.
>
> But as I already wrote above, prohibiting a session from using indexes
> created after its first access to the GTT doesn't solve the problem.
> For normal tables (and for local temp tables) indexes are initialized at
> the time of their creation. With GTTs that doesn't work, because each
> session has its own local data for the GTT. We should either
> initialize/build the index on demand (when it is first accessed), or at
> session start initialize indexes for all existing GTTs. The last option
> seems much worse from my point of view: there may be a huge number of
> GTTs, and the session may not need to access GTTs at all.
>
>> ... and so on for other index types. Also, what about custom indexes
>> implemented in extensions? It seems a bit strange each of them has to
>> support this separately.
>
> I have already complained about this: my patch supports GTTs for all
> built-in indexes, but custom indexes have to handle it themselves.
> It looks like to provide a generic solution we need to extend the index
> API, providing two different operations: creation and initialization.
> But extending the index API is a very invasive change... And it also
> doesn't solve the problem for all existing extensions: they in any case
> have to be rewritten to implement the new API version in order to support
> GTTs.
>
>> IMHO if this really is the right solution, we need to make it work for
>> existing indexes without having to tweak them individually. Why don't we
>> track a flag whether an index on a GTT was initialized in a given
>> session, and if it was not then call the build function before calling an
Re: Add pg_file_sync() to adminpack
On Wed, Jan 15, 2020 at 1:49, Julien Rouhaud wrote:

> Actually, pg_write_server_files has enough privileges to execute those
> functions anywhere on the FS as far as the C code is concerned, provided
> that the user running the postgres daemon is allowed to (see
> convert_and_check_filename), but it won't be allowed to do so by default
> as it won't have EXECUTE privilege on the functions.

I see, thanks for your explanation.

--
Regards,
Atsushi Torikoshi
Re: [HACKERS] Block level parallel vacuum
On Wed, Jan 15, 2020 at 10:05 AM Masahiko Sawada wrote:
>
> Thank you for updating the patch! I have a few small comments.
>

I have adapted all your changes and fixed the comment by Mahendra about
initializing the parallel state only when there are at least two indexes.
Additionally, I have changed a few comments to make the references to
parallel vacuum consistent; in some places we were referring to it as
'parallel lazy vacuum' and in others as 'parallel index vacuum'.

> The rest looks good to me.
>

Okay, I think the patch is in good shape. I am planning to read it a few
more times (at least twice) and then will probably commit it early next
week (Monday or Tuesday) unless there are any major comments. I have
already committed the API patch (4d8a8d0c73).

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

v48-0001-Allow-vacuum-command-to-process-indexes-in-parallel.patch
Description: Binary data
Re: Disallow cancellation of waiting for synchronous replication
On 15.01.2020 01:53, Andres Freund wrote:
> On 2020-01-12 16:18:38 +0500, Andrey Borodin wrote:
>>> On 11 Jan 2020, at 7:34, Bruce Momjian wrote:
>>>
>>> Actually, it might be worse than that. In my reading of
>>> RecordTransactionCommit(), we do this:
>>>
>>> write to WAL
>>> flush WAL (durable)
>>> make visible to other backends
>>> replicate
>>> communicate to the client
>>>
>>> I think this means we make the transaction commit visible to all
>>> backends _before_ we replicate it, and potentially wait until we get a
>>> replication reply to return SUCCESS to the client.
>>
>> No. Data is not visible to other backends while we await sync rep.
>
> Yea, as the relevant comment in RecordTransactionCommit() says:
>
> * Note that at this stage we have marked clog, but still show as running
> * in the procarray and continue to hold locks.
> */
> if (wrote_xlog && markXidCommitted)
> SyncRepWaitForLSN(XactLastRecEnd, true);
>
> But it's worthwhile to emphasize that data at that stage actually *can*
> be visible on standbys. The fact that the transaction still shows as
> running via the procarray, on the primary, does not influence visibility
> determinations on the standby.

In the common case, consistent reads across the cluster (even with
remote_apply on) are available (and have to be) only on the master node.
For example, if a random load balancer for read-only queries is placed
above the master and the sync replicas (with remote_apply on), it's
possible to hit the case where earlier reads return data that is absent
from subsequent ones.

Moreover, such visible commits on a sync standby are not durable from the
cluster's point of view. For example, if we have two sync standbys, then
under failover we can switch the master to a sync standby on which the
waiting commit was not replicated, while it was applied (and visible) on
the other standby. This switch is completely safe because the client has
not received an acknowledgment of its commit request, so the transaction
is in an indeterminate state: it can end up either committed or aborted,
depending on which standby is promoted.
-- Best regards, Maksim Milyutin
Re: [PATCH v1] pg_ls_tmpdir to show directories
Hello Justin,

About this v4: it applies cleanly. I'm trying to think about how to get rid
of the strange structure and hacks, and the arbitrary-looking size-2 array.
Also the recursion is one step, but I'm not sure why; ISTM it could/should
always go on? Maybe the recursive implementation was not such a good idea;
if I suggested it, it is because I did not notice the "return next"
re-entrant stuff, shame on me.

Looking at the code, ISTM that relying on a stack/list would be much
cleaner and easier to understand. The code could look like:

ls_func()
  if (first_time)
    initialize description
    set next directory to visit
  while (1)
    if next directory
      init/push next directory to visit as current
    read current directory
    if empty
      pop/close current directory
    elif is_a_dir and recursion allowed
      set next directory to visit
    else
      ... return next tuple
  cleanup

The point is to avoid a hack around the directory_fctx array, to have only
one place to push/init a new directory (i.e. call AllocateDir and play
around with the memory context) instead of two, and to allow deeper
recursion if useful.

Details:
- the snprintf return is not checked. Maybe it should say why an overflow
cannot occur.
- "bool nulls[3] = { 0,};" -> "bool nulls[3] = { false, false, false };"?

--
Fabien.
Re: Allow to_date() and to_timestamp() to accept localized names
Hello!

On 2020/01/13 21:04, Juan José Santamaría Flecha wrote:
> Please, find attached a version addressing the above mentioned.

I have a couple of picky notes.

+ if (name_len != norm_len)
+ pfree(norm_name);

I'm not sure here. Is it safe to assume that, whenever a new norm_name was
allocated, name_len and norm_len will always differ?

static const char *const months_full[] = {
"January", "February", "March", "April", "May", "June", "July",
- "August", "September", "October", "November", "December", NULL
+ "August", "September", "October", "November", "December"
};

Unfortunately I didn't see from the patch why it was necessary to remove
the trailing NULL element from months_full, days_short, adbc_strings,
adbc_strings_long and the other arrays. I think it is not good, because it
unnecessarily enlarges the patch and adds code like the following:

+ from_char_seq_search(&value, &s, months, localized_names,
+ ONE_UPPER, MAX_MON_LEN, n, have_error,
+ lengthof(months_full));

Here it passes "months" from datetime.c to from_char_seq_search() and
calculates its length using the "months_full" array from formatting.c.

--
Arthur
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
I further fixed the last test failure (due to a small bug in the test, not in code). Attached are the new patch series. Let's see the CI pipeline result. v9-0001-Support-node-initialization-from-backup-with-tabl.patch Description: Binary data v9-0003-Fix-replay-of-create-database-records-on-standby.patch Description: Binary data v9-0002-Tests-to-replay-create-database-operation-on-stan.patch Description: Binary data
Re: Duplicate Workers entries in some EXPLAIN plans
The following review has been posted through the commitfest application:
make installcheck-world: tested, failed
Implements feature: not tested
Spec compliant: not tested
Documentation: not tested

The current version of the patch (v2) applies cleanly and does what it says
on the box.

> Any thoughts on what we should do with text mode output (which is
> untouched right now)? The output Andres proposed above makes sense to me,
> but I'd like to get more input.

Nothing to add beyond what is stated in the thread.

> Sort Method: external merge Disk: 4920kB
> Buffers: shared hit=682 read=10188, temp read=1415 written=2101
> Worker 0: actual time=130.058..130.324 rows=1324 loops=1
>   Sort Method: external merge Disk: 5880kB
>   Buffers: shared hit=337 read=3489, temp read=505 written=739
> Worker 1: actual time=130.273..130.512 rows=1297 loops=1
>   Buffers: shared hit=345 read=3507, temp read=505 written=744
>   Sort Method: external merge Disk: 5920kB

This proposal seems like a fitting approach. Awaiting v3, which will
include the text version. May I suggest a YAML-format test case? Just to
make certain that no regressions occur in the future.

Thanks,
Re: proposal: type info support functions for functions that use "any" type
On Tue, 14 Jan 2020 at 22:09, Tom Lane wrote:

> Pavel Stehule writes:
> > [ parser-support-function-with-demo-20191128.patch ]
>
> TBH, I'm still not convinced that this is a good idea. Restricting
> the support function to only change the function's return type is
> safer than the original proposal, but it's still not terribly safe.
> If you change the support function's algorithm in any way, how do
> you know whether you've broken existing stored queries? If the
> support function consults external resources to make its choice
> (perhaps checking the existence of a cast), where could we record
> that the query depends on the existence of that cast? There'd be
> no visible trace of that in the query parsetree.

This risk is real and cannot be simply solved without more complications.

Could a solution be to limit and enforce this functionality to extensions
that are initialized from shared_preload_libraries or
local_preload_libraries?

> I'm also still not convinced that this idea allows doing anything
> that can't be done just as well with polymorphism. It would be a
> really bad idea for the support function to be examining the values
> of the arguments (else what happens when they're not constants?).
> So all you can do is look at their types, and then it seems like
> the things you can usefully do are pretty much like polymorphism,
> i.e. select some one of the input types, or a related type such
> as an array type or element type. If there are gaps in what you
> can express with polymorphism, I'd much rather spend effort on
> improving that facility than in adding something that is only
> accessible to advanced C coders. (Yes, I know I've been slacking
> on reviewing [1].)

For my purposes the critical information is the type. I don't need to work
with the constant, but I can imagine that some API to work with constant
values could be nice.

Yes, I can solve a lot of things with patch [1], but not all, and this
patch is shorter and almost trivial.
> Lastly, I still think that this patch doesn't begin to address
> all the places that would have to know about the feature. There's
> a lot of places that know about polymorphism --- if this is
> polymorphism on steroids, which it is, then why don't all of those
> places need to be touched?

I am sorry, I don't understand the last sentence.

> On the whole I think we should reject this idea.
>
> regards, tom lane
>
> [1] https://commitfest.postgresql.org/26/1911/
Re: pause recovery if pitr target not reached
> "Peter Eisentraut" wrote, 14 January 2020 at 21:13:
>
> On 2019-12-11 12:40, Leif Gunnar Erlandsen wrote:
>> Adding patch written for 13dev from git
>>
>>> "Michael Paquier" wrote, 1 December 2019 at 03:08:
>>> On Fri, Nov 22, 2019 at 11:26:59AM +0000, Leif Gunnar Erlandsen wrote:
>>
>> No it does not. It works well to demonstrate its purpose though.
>> And it might be that stopping with FATAL would be more correct.
>
>> This is still under active discussion. Please note that the latest
>> patch does not apply, so a rebase would be nice to have. I have moved
>> the patch to the next CF, waiting on author.
>
> I reworked your patch a bit. I changed the outcome to be an error, as was
> discussed. I also added tests and documentation. Please take a look.

Thank you; it was not unexpected for the patch to become a little bit
smaller. Although it would have been nice to log where recovery ended
before reporting the fatal error.

And since you use RECOVERY_TARGET_UNSET, RECOVERY_TARGET_IMMEDIATE also
gets included; is this correct?

> --
> Peter Eisentraut http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] WAL logging problem in 9.4.3?
Hello. I added a fix for defect 2.

At Wed, 25 Dec 2019 16:15:21 -0800, Noah Misch wrote in
> === Defect 2: Forgets to skip WAL due to oversimplification in heap_create()
>
> In ALTER TABLE cases where TryReuseIndex() avoided an index rebuild, we
> need to transfer WAL-skipped state to the new index relation. Before
> v24nm, the new index relation skipped WAL unconditionally. Since v24nm,
> the new index relation never skips WAL. I've added a test to
> alter_table.sql that reveals this problem under wal_level=minimal.

The fix for this defect utilizes the mechanism that preserves the relcache
entry for a dropped relation. If ATExecAddIndex can obtain such a relcache
entry for the old relation, that entry should hold the newness flags and we
can copy them to the new relcache entry. I added one member named oldRelId
to the struct IndexStmt to let the function access the relcache entry for
the old index relation.

I forgot to assign version 31 to the last patch, so I reused the number for
this version.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

>From 591872bdd7b18566fe2529d20e4073900dec32fd Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi
Date: Thu, 21 Nov 2019 15:28:06 +0900
Subject: [PATCH v31 1/3] Rework WAL-skipping optimization

While wal_level=minimal, we omit WAL-logging for certain operations on
relfilenodes that are created in the current transaction. The files are
fsynced at commit. The machinery accelerates bulk-insertion operations,
but it fails for certain sequences of operations, and a crash just after
commit may leave broken table files.

This patch overhauls the machinery so that WAL-logging is omitted for all
operations on such relfilenodes.

This patch also introduces a new feature that emits small files as WAL
records instead of syncing them. The new GUC variable wal_skip_threshold
controls the threshold.
--- doc/src/sgml/config.sgml| 43 ++-- doc/src/sgml/perform.sgml | 47 +--- src/backend/access/common/toast_internals.c | 4 +- src/backend/access/gist/gistutil.c | 31 ++- src/backend/access/gist/gistxlog.c | 21 ++ src/backend/access/heap/heapam.c| 45 +--- src/backend/access/heap/heapam_handler.c| 22 +- src/backend/access/heap/rewriteheap.c | 21 +- src/backend/access/nbtree/nbtsort.c | 41 +--- src/backend/access/rmgrdesc/gistdesc.c | 5 + src/backend/access/transam/README | 45 +++- src/backend/access/transam/xact.c | 15 ++ src/backend/access/transam/xloginsert.c | 10 +- src/backend/access/transam/xlogutils.c | 18 +- src/backend/catalog/heap.c | 4 + src/backend/catalog/storage.c | 248 ++-- src/backend/commands/cluster.c | 12 +- src/backend/commands/copy.c | 58 + src/backend/commands/createas.c | 11 +- src/backend/commands/matview.c | 12 +- src/backend/commands/tablecmds.c| 11 +- src/backend/storage/buffer/bufmgr.c | 125 +- src/backend/storage/lmgr/lock.c | 12 + src/backend/storage/smgr/md.c | 36 ++- src/backend/storage/smgr/smgr.c | 35 +++ src/backend/utils/cache/relcache.c | 159 ++--- src/backend/utils/misc/guc.c| 13 + src/include/access/gist_private.h | 2 + src/include/access/gistxlog.h | 1 + src/include/access/heapam.h | 3 - src/include/access/rewriteheap.h| 2 +- src/include/access/tableam.h| 15 +- src/include/catalog/storage.h | 6 + src/include/storage/bufmgr.h| 4 + src/include/storage/lock.h | 3 + src/include/storage/smgr.h | 1 + src/include/utils/rel.h | 57 +++-- src/include/utils/relcache.h| 8 +- src/test/regress/expected/alter_table.out | 6 + src/test/regress/sql/alter_table.sql| 7 + 40 files changed, 868 insertions(+), 351 deletions(-) diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 5d45b6f7cb..0e7a0bc0ee 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -2481,21 +2481,14 @@ include_dir 'conf.d' levels. This parameter can only be set at server start. 
-In minimal level, WAL-logging of some bulk -operations can be safely skipped, which can make those -operations much faster (see ). -Operations in which this optimization can be applied include: - - CREATE TABLE AS - CREATE INDEX - CLUSTER - COPY into tables that were created or truncated in the same - transaction - -But minimal WAL does not contain enough information to reconstruct the -data from a base backup and the WAL logs, so replica or -higher mus
Re: TRUNCATE on foreign tables
On Tue, Jan 14, 2020 at 06:16:17PM +0900, Kohei KaiGai wrote:
> The "frels_list" is a list of foreign tables that are connected to a
> particular foreign server; thus, the server ID pulled from the foreign
> table IDs should be identical for all the relations in the list.
> Due to the API design, this callback shall be invoked once for each
> foreign server involved in the TRUNCATE command, not once per table.
>
> The 2nd and 3rd arguments also inform the FDW driver of other options of
> the command. If the FDW has a concept of "cascaded truncate" or "restart
> sequence", it can adjust its remote query. postgres_fdw follows the
> manner of the usual TRUNCATE command.

I have done a quick read through the patch. You have modified the patch to
pass down to the callback a list of relation OIDs so as to execute one
command for all, and there are tests for FKs, so that coverage looks fine.

Regression tests are failing with this patch:

-- TRUNCATE doesn't work on foreign tables, either directly or recursively
TRUNCATE ft2; -- ERROR
-ERROR: "ft2" is not a table
+ERROR: foreign-data wrapper "dummy" has no handler

You visibly just need to update the output, because no handler is
available for truncate in this case.

+void
+deparseTruncateSql(StringInfo buf, Relation rel)
+{
+ deparseRelation(buf, rel);
+}

I don't see much point in having this routine.

+ If FDW does not provide this callback, PostgreSQL considers
+ TRUNCATE is not supported on the foreign table.
+

This sentence is weird. Perhaps you meant "as not supported"?

+ frels_list is a list of foreign tables that are
+ connected to a particular foreign server; thus, these foreign tables
+ should have identical foreign server ID

The list is built by the backend code, so that has to be true.
+ foreach (lc, frels_list)
+ {
+ Relation frel = lfirst(lc);
+ Oid frel_oid = RelationGetRelid(frel);
+
+ if (server_id == GetForeignServerIdByRelId(frel_oid))
+ {
+ frels_list = foreach_delete_current(frels_list, lc);
+ curr_frels = lappend(curr_frels, frel);
+ }
+ }

Wouldn't it be better to fill in a hash table for each server with a list
of relations?

+typedef void (*ExecForeignTruncate_function) (List *frels_list,
+ bool is_cascade,
+ bool restart_seqs);

I would recommend passing down DropBehavior directly to the callback
instead of a boolean. That's more extensible.

--
Michael