Re: Truncate in synchronous logical replication failed

2021-04-11 Thread Amit Kapila
On Mon, Apr 12, 2021 at 10:03 AM osumi.takami...@fujitsu.com
 wrote:
>
> > I checked the PG-DOC, found it says that “Replication of TRUNCATE
> > commands is supported”[1], so maybe TRUNCATE is not supported in
> > synchronous logical replication?
> >
> > If my understanding is right, maybe PG-DOC can be modified like this. Any
> > thought?
> > Replication of TRUNCATE commands is supported
> > ->
> > Replication of TRUNCATE commands is supported in asynchronous mode
> I'm not sure if this becomes the final solution,
>

I think that unless a proper fix is impossible or extremely
complicated, going via this documentation route doesn't seem advisable.

> but if we take the approach of fixing the doc, we have to be careful with
> the description, because when we remove the primary keys of the 'test'
> tables in the scenario in [1], we don't have this issue.
> It means TRUNCATE in synchronous logical replication is not always blocked.
>

The problem happens only when we try to fetch IDENTITY_KEY attributes,
because pgoutput uses RelationGetIndexAttrBitmap() to get that
information, which locks the required indexes. Now, because TRUNCATE
has already acquired an exclusive lock on the index, this creates a
sort of deadlock: the actual Truncate operation waits for the logical
replication of the operation to complete, and logical replication
waits for the actual Truncate operation to finish.

Do we really need to use RelationGetIndexAttrBitmap() to build
IDENTITY_KEY attributes? During decoding, we don't even lock the main
relation, we just scan the system table and build that information
using a historic snapshot. Can't we do something similar here?

Adding Petr J. and Peter E. for their views, as this seems to be an
old problem (present since decoding of the Truncate operation was
introduced).

-- 
With Regards,
Amit Kapila.




Re: Problems around compute_query_id

2021-04-11 Thread Julien Rouhaud
On Mon, Apr 12, 2021 at 03:12:40PM +0900, Michael Paquier wrote:
> Hi all,
> 
> Starting a new thread as the one that has introduced compute_query_id
> is already long enough.
> 
> Fujii-san has reported on Twitter that enabling the computation of
> query IDs does not work properly with log_statement as the query ID is
> calculated at parse analyze time and the query is logged before that.
> As far as I can see, that's really a problem as any queries logged are
> combined with a query ID of 0, and log parsers would not really be
> able to use that, even if the information provided by for example
> log_duration gives the computed query ID present in pg_stat_activity.

I don't see any way around that.  The goal of log_statement is to log all
syntactically valid queries, including otherwise invalid queries.  For
instance, it would log

SELECT notacolumn;

and there's no way to compute a queryid in that case.  I think that
log_statement already causes some issues with log parsers.  At least pgbadger
recommends avoiding it:

"Do not enable log_statement as its log format will not be parsed by pgBadger."

I think we should simply document that %Q is not compatible with
log_statement.
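To illustrate the interaction, here is a hypothetical postgresql.conf
fragment (the parameter values are examples only; it assumes PostgreSQL
14's %Q log_line_prefix escape and the compute_query_id GUC):

```
compute_query_id = on
log_line_prefix = '%m [%p] queryid=%Q '
log_statement = 'all'             # statements are logged at parse time,
                                  # before the query ID is computed,
                                  # so %Q shows 0 for these entries
log_min_duration_statement = 0    # duration lines are logged after
                                  # execution, when the query ID is set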

> While making the feature run on some test server, I have noticed that
> %Q would log some garbage query ID for autovacuum workers that's kept
> around.  That looks wrong.

I've not been able to reproduce it; do you have a hint on how to do it?

Maybe setting a zero queryid at the beginning of AutoVacWorkerMain() could fix
the problem?




Re: [PATCH] Identify LWLocks in tracepoints

2021-04-11 Thread Craig Ringer
On Mon, 22 Mar 2021 at 16:38, Peter Eisentraut <
peter.eisentr...@enterprisedb.com> wrote:

>
> First, a problem:  0002 doesn't build on macOS, because uint64 has been
> used in the probe definitions.  That needs to be handled like the other
> nonnative types in that file.
>

Will fix.

All the probe changes and additions should be accompanied by
> documentation changes.
>

Agreed, will fix.

The probes used to have an argument to identify the lock, which was
> removed by 3761fe3c20bb040b15f0e8da58d824631da00caa.


Huh. That's exactly the functionality I was looking for. Damn. I understand
why Robert removed it, but its removal makes it much harder to identify an
LWLock, since the lock might fall in a DSM segment that can be mapped at
different base addresses in different backends.

Robert's patch didn't replace the offset within the tranche with anything
else to identify the lock. An LWLock* is imperfect due to ASLR and DSM, but
it's better than nothing. In theory we could even remap them in trace tools
if we had tracepoints on DSM attach and detach that reported each segment's
size and base address too.

CC'ing Andres, as he expressed interest in being able to globally identify
LWLocks too.


> The 0001 patch is
> essentially trying to reinstate that, which seems sensible.  Perhaps we
> should also use the argument order that used to be there.  It used to be
>
> probe lwlock__acquire(const char *, int, LWLockMode);
>
> and now it would be
>
> probe lwlock__acquire(const char *, LWLockMode, LWLock*, int);
>
> Also, do we need both the tranche name and the tranche id?


Reasons to have the name:

* There is no easy way to look up the tranche name by ID from outside the
backend
* A tranche ID by itself is pretty much meaningless, especially for dynamic
tranches
* Any existing scripts will rely on the tranche name

So the tranche name is really required to generate useful output for any
dynamic tranches, or simple and readable output from things like perf.

Reasons to have the tranche ID:

* The tranche name is not guaranteed to have the same address for a given
value across backends in the presence of ASLR, even for built-in tranches.
So tools need to read tranche names as user-space strings, which is much
more expensive than consuming an int argument from the trace args. Storing
and reporting maps of events by tranche name (string) in tools is also more
expensive than having a tranche id.
* When the trace tool or script wants to filter for only one particular
tranche, particularly when it's a built-in tranche where the tranche ID is
known, having the ID is much more useful and efficient.
* If we can avoid computing the tranche name, emitting just the tranche ID
would be much faster.

It's annoying that we have to pay the cost of computing the tranche name,
though. It never used to matter, but now that T_NAME() expands to
GetLWTrancheName() calls as of 29c3e2dd5a6, it's going to cost a little more
on such a hot path. I might do a little comparison to see how much.

I could add TRACE_POSTGRESQL_<>_ENABLED() guards since we
do in fact build with SDT semaphore support. That adds a branch for each
tracepoint, but they're already marshalling arguments and making a function
call that does lots more than a single branch, so that seems pretty
sensible. The main downside of using _ENABLED() USDT semaphore guards is
that not all tools are guaranteed to understand or support them. So an
older perf, for example, might simply fail to fire events on guarded
probes. That seems OK to me, the onus should be on the probe tool to pay
any costs, not on PostgreSQL. Despite that I don't want to mark the
_ENABLED() guards unlikely(), since that'd increase the observer effect
where probing LWLocks changes their timing and behaviour. Branch prediction
should do a very good job in such cases without being forced.
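To make the guard pattern concrete, here is a minimal, self-contained
sketch. The semaphore variable and the TRACE_* macro names are stand-ins
for what systemtap's dtrace script would generate, and
get_lw_tranche_name() simulates GetLWTrancheName(); the point is only
that the (comparatively expensive) name lookup is skipped when no tracer
is attached:

```c
#include <assert.h>
#include <stdio.h>

/* Stand-in for the SDT semaphore: a tracing tool bumps this counter
 * when it attaches to the probe, and decrements it on detach. */
static volatile unsigned short lwlock_acquire_semaphore = 0;
static int tranche_name_lookups = 0;

/* Simulated GetLWTrancheName(): counts how often the name is computed. */
static const char *get_lw_tranche_name(int tranche_id)
{
    (void) tranche_id;
    tranche_name_lookups++;
    return "WALInsert";
}

/* Stand-ins for the generated TRACE_POSTGRESQL_* macros. */
#define TRACE_LWLOCK_ACQUIRE_ENABLED() (lwlock_acquire_semaphore > 0)
#define TRACE_LWLOCK_ACQUIRE(name, id) \
    printf("lwlock__acquire tranche=%s id=%d\n", (name), (id))

/* The probe site: argument marshalling only happens under the guard. */
static void lwlock_acquire_probe_site(int tranche_id)
{
    if (TRACE_LWLOCK_ACQUIRE_ENABLED())
        TRACE_LWLOCK_ACQUIRE(get_lw_tranche_name(tranche_id), tranche_id);
}
```

The branch itself is cheap; the win is that the name lookup and argument
marshalling are avoided entirely in the untraced (common) case.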

I wonder a little about the possible cache costs of the _ENABLED() macros
though. Their data is in a separate ELF segment and separate .o, with no
locality to the traced code. It might be worth checking that before
proceeding; I guess it's even possible that the GetLWTrancheName() calls
could be cheaper. Will see if I can run some checks and report back.

BTW, if you want some of the details on how userspace SDTs work,
https://leezhenghui.github.io/linux/2019/03/05/exploring-usdt-on-linux.html
is interesting and useful. It helps explain uprobes, ftrace, bcc, etc.

Or maybe we
> don't need the name, or can record it differently, which might also
> address your other concern that it's too expensive to compute.  In any
> case, I think an argument order like
>
> probe lwlock__acquire(const char *, int, LWLock*, LWLockMode);
>
> would make more sense.
>

OK.

In 0004, you add a probe to record the application_name setting?  Would
> there be any value in making that a generic probe that can record any
> GUC change?
>

Yes, there would, but I didn't want to go and do that in the same patch,
and a named probe on application_name is useful separately.

Re: missing documentation for streaming in-progress transactions

2021-04-11 Thread Amit Kapila
On Fri, Apr 9, 2021 at 9:39 AM Amit Kapila  wrote:
>
>
> I don't like repeating the same thing for all new messages. So added
> separate para for the same and few other changes. See what do you
> think of the attached?
>

Pushed.

-- 
With Regards,
Amit Kapila.




Re: Performance degradation of REFRESH MATERIALIZED VIEW

2021-04-11 Thread Masahiko Sawada

On Thu, Mar 11, 2021 at 5:44 PM Masahiko Sawada  wrote:
>
> Hi,
>
> While discussing freezing tuples during CTAS[1], we found that
> heap_insert() with HEAP_INSERT_FROZEN brings performance degradation.
> For instance, with Paul's patch that sets HEAP_INSERT_FROZEN to CTAS,
> it took 12 sec whereas the code without the patch took 10 sec with the
> following query:
>
> create table t1 (a, b, c, d) as select i,i,i,i from
> generate_series(1,2000) i;
>
> I've done a simple benchmark of REFRESH MATERIALIZED VIEW with the
> following queries:
>
> create table source as select generate_series(1, 5000);
> create materialized view mv as select * from source;
> refresh materialized view mv;
>
> The execution time of REFRESH MATERIALIZED VIEW are:
>
> w/ HEAP_INSERT_FROZEN flag : 42 sec
> w/o HEAP_INSERT_FROZEN flag : 33 sec
>
> After investigation, I found that such performance degradation happens
> on only HEAD code. It seems to me that commit 39b66a91b (and
> 7db0cd2145) is relevant that has heap_insert() set VM bits and
> PD_ALL_VISIBLE if HEAP_INSERT_FROZEN is specified (so CCing Tomas
> Vondra and authors). Since heap_insert() sets PD_ALL_VISIBLE to the
> page when inserting a tuple for the first time on the page (around
> L2133 in heapam.c), every subsequent heap_insert() on the page reads
> and pins a VM buffer (see RelationGetBufferForTuple()).

IIUC RelationGetBufferForTuple() pins the vm buffer if the page is
all-visible, since the caller might clear the vm bit during the
operation. But that's not necessarily true in the HEAP_INSERT_FROZEN
case: when inserting with HEAP_INSERT_FROZEN, we might set the
PD_ALL_VISIBLE flag and the all-visible bit, but we never clear that
flag and bit during insertion. Therefore, to fix this issue, I think
we can have RelationGetBufferForTuple() not pin the vm buffer if we're
inserting a frozen tuple (i.e., the HEAP_INSERT_FROZEN case) and the
target page is already all-visible. With HEAP_INSERT_FROZEN, the case
where we still need to pin the vm buffer is when the table is empty.
That way, we pin the vm buffer only for the first insertion of a
frozen tuple into an empty page, then set PD_ALL_VISIBLE on the page
and the all-frozen bit in the vm, and also set
XLH_INSERT_ALL_FROZEN_SET in the WAL record. For further insertions,
we would not pin the vm buffer as long as we're inserting frozen
tuples into the same page.

If the target page is neither empty nor all-visible, we will not pin
the vm buffer, which is fine because if the page has a non-frozen
tuple we cannot set the bit in the vm during heap_insert(). If all
tuples on the page are already frozen but PD_ALL_VISIBLE is not set
for some reason, we could in principle set the all-frozen bit in the
vm, but that doesn't seem like a good idea since it would require
checking during insertion whether all existing tuples are frozen.
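The rule described above could be distilled roughly as follows. This is
only a toy sketch, not the actual patch: the flag value and the helper
function are illustrative, and the real decision lives inside
RelationGetBufferForTuple():

```c
#include <assert.h>
#include <stdbool.h>

#define HEAP_INSERT_FROZEN 0x0004   /* illustrative value only */

/* Hypothetical distillation of the proposed vm-pin decision. */
static bool
need_vm_buffer_pin(bool page_all_visible, bool page_empty, int options)
{
    bool frozen = (options & HEAP_INSERT_FROZEN) != 0;

    if (!frozen)
        return page_all_visible;    /* caller may need to clear the vm bit */

    /* Frozen insert: the bit is only newly set on the first insertion
     * into an empty page; on an already all-visible page nothing in the
     * vm changes, so the pin is unnecessary. */
    return page_empty;
}
```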

The attached patch does the above idea. With this patch, the same
performance tests took 33 sec.

Also, I've measured the number of pages read during REFRESH
MATERIALIZED VIEW using pg_stat_statements. There was a big
difference in shared_blks_hit:

1. w/ HEAP_INSERT_FROZEN flag (HEAD) : 50221781
2. w/o HEAP_INSERT_FROZEN flag (HEAD) : 221782
3. Patched: 443014

Since the 'source' table has 5000 and each heap_insert() reads the vm
buffer, test 1 reads as many pages as the number of inserted tuples.
The value of test 3 is about twice that of test 2, because
heap_insert() reads the vm buffer for each first insertion into a
page. The table has 221239 blocks.

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/


skip_vmbuffer_for_frozen_tuple_insertion.patch
Description: Binary data


Problems around compute_query_id

2021-04-11 Thread Michael Paquier
Hi all,

Starting a new thread as the one that has introduced compute_query_id
is already long enough.

Fujii-san has reported on Twitter that enabling the computation of
query IDs does not work properly with log_statement as the query ID is
calculated at parse analyze time and the query is logged before that.
As far as I can see, that's really a problem as any queries logged are
combined with a query ID of 0, and log parsers would not really be
able to use that, even if the information provided by for example
log_duration gives the computed query ID present in pg_stat_activity.

While making the feature run on some test server, I have noticed that
%Q would log some garbage query ID for autovacuum workers that's kept
around.  That looks wrong.

Thoughts?
--
Michael


signature.asc
Description: PGP signature


Re: Uninitialized scalar variable (UNINIT) (src/backend/statistics/extended_stats.c)

2021-04-11 Thread Tom Lane
Michael Paquier  writes:
> On Sun, Apr 11, 2021 at 07:42:20PM -0300, Ranier Vilela wrote:
>> Em dom., 11 de abr. de 2021 às 16:25, Justin Pryzby 
>>> I think it's cleanest to write:
>>> |HeapTupleData tmptup = {0};

> I agree that this would be cleaner.

It would be wrong, though, or at least not have the same effect.
ItemPointerSetInvalid does not set the target to all-zeroes.

(Regardless of that detail, it's generally best to accomplish
objective X in the same way that existing code does.  Deciding
that you have a better way is often wrong, and even if you
are right, you should then submit a patch to change all the
existing cases.)
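A small self-contained illustration of the point above, using simplified
stand-in types (the real definitions live in itemptr.h/htup.h): a `{0}`
initializer zeroes the item pointer, whereas ItemPointerSetInvalid() sets
the block number to InvalidBlockNumber (all ones), so the two are not
interchangeable:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the real BlockIdData and
 * ItemPointerData layouts. */
typedef struct { uint16_t bi_hi; uint16_t bi_lo; } BlockIdData;
typedef struct { BlockIdData ip_blkid; uint16_t ip_posid; } ItemPointerData;

/* Mimics ItemPointerSetInvalid: block number becomes InvalidBlockNumber
 * (all ones), offset becomes 0 -- NOT all-zeroes.  A zeroed ItemPointer
 * instead looks like a valid pointer into block 0. */
static void
item_pointer_set_invalid(ItemPointerData *pointer)
{
    pointer->ip_blkid.bi_hi = 0xFFFF;
    pointer->ip_blkid.bi_lo = 0xFFFF;
    pointer->ip_posid = 0;
}
```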

regards, tom lane




Re: Simplify backend terminate and wait logic in postgres_fdw test

2021-04-11 Thread Bharath Rupireddy
On Mon, Apr 12, 2021 at 11:18 AM Michael Paquier  wrote:
>
> On Fri, Apr 09, 2021 at 04:53:01PM +0530, Bharath Rupireddy wrote:
> > I feel that we can provide a high timeout value (It can be 1hr on the
> > similar lines of using pg_sleep(3600) for crash tests in
> > 013_crash_restart.pl with the assumption that the backend running that
> > command will get killed with SIGQUIT) and make pg_terminate_backend
> > wait. This unreasonably high timeout looks okay because of the
> > assumption that the servers in the build farm will not take that much
> > time to remove the backend from the system processes, so the function
> > will return much earlier than that. If at all there's a server (which
> > is impractical IMO) that doesn't remove the backend process even
> > within 1hr, then it will anyway fail with the warning.
>
> You may not need a value as large as 1h for that :)
>
> Looking at the TAP tests, some areas have been living with timeouts of
> up to 180s.  It is a matter of balance here, a timeout too long would
> be annoying as it would make the detection of a problem longer for
> machines that are stuck, and a too short value generates false
> positives.  5 minutes gives some balance, but there is really no
> perfect value.

I changed it to 5min. If there's any server that takes more than 5min
to remove a process from the system process list, it will see a
warning on timeout.
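For reference, the simplified call looks something like this. This is a
sketch: it assumes PG 14's pg_terminate_backend(), which takes an
optional timeout in milliseconds, and $backend_pid is a placeholder for
the PID being terminated:

```sql
-- Wait up to 5 minutes for the terminated backend to exit; on timeout
-- a WARNING is emitted and false is returned.
SELECT pg_terminate_backend($backend_pid, 300000);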

> > With the attached patch, we could remove the procedure
> > terminate_backend_and_wait altogether. Thoughts?
>
> That's clearly better, and logically it would work.  As those tests
> are new in 14, it may be a good idea to cleanup all that so as all the
> branches have the same set of tests.  Would people object to that?

Yes, these tests were introduced in v14; +1 to cleaning them up with
this patch on v14 as well as on master.

Attaching v4, please review further.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com


v4-0001-Simplify-backend-terminate-and-wait-logic-in-post.patch
Description: Binary data


Re: Simplify backend terminate and wait logic in postgres_fdw test

2021-04-11 Thread Michael Paquier
On Fri, Apr 09, 2021 at 04:53:01PM +0530, Bharath Rupireddy wrote:
> I feel that we can provide a high timeout value (It can be 1hr on the
> similar lines of using pg_sleep(3600) for crash tests in
> 013_crash_restart.pl with the assumption that the backend running that
> command will get killed with SIGQUIT) and make pg_terminate_backend
> wait. This unreasonably high timeout looks okay because of the
> assumption that the servers in the build farm will not take that much
> time to remove the backend from the system processes, so the function
> will return much earlier than that. If at all there's a server (which
> is impractical IMO) that doesn't remove the backend process even
> within 1hr, then it will anyway fail with the warning.

You may not need a value as large as 1h for that :)
 
Looking at the TAP tests, some areas have been living with timeouts of
up to 180s.  It is a matter of balance here, a timeout too long would
be annoying as it would make the detection of a problem longer for
machines that are stuck, and a too short value generates false
positives.  5 minutes gives some balance, but there is really no
perfect value.

> With the attached patch, we could remove the procedure
> terminate_backend_and_wait altogether. Thoughts?

That's clearly better, and logically it would work.  As those tests
are new in 14, it may be a good idea to cleanup all that so as all the
branches have the same set of tests.  Would people object to that?
--
Michael




Re: [PATCH] Identify LWLocks in tracepoints

2021-04-11 Thread Craig Ringer
On Mon, 22 Mar 2021 at 17:00, Peter Eisentraut <
peter.eisentr...@enterprisedb.com> wrote:

> On 20.03.21 01:29, Craig Ringer wrote:
> > There is already support for that.  See the documentation at the end
> of
> > this page:
> >
> https://www.postgresql.org/docs/devel/dynamic-trace.html#DEFINING-TRACE-POINTS
> > <
> https://www.postgresql.org/docs/devel/dynamic-trace.html#DEFINING-TRACE-POINTS
> >
> >
> >
> > Pretty sure it won't work right now.
> >
> > To use systemtap semaphores (the _ENABLED macros) you need to run dtrace
> > -g to generate a probes.o then link that into postgres.
> >
> > I don't think we do that. I'll double check soon.
>
> We do that.  (It's -G.)
>

Huh. I could've sworn we didn't. My mistake, it's there in
src/backend/Makefile .

In that case I'll amend the patch to use semaphore guards.

(On a side note, systemtap's semaphore support is actually a massive pain.
The way it's implemented in <sys/sdt.h> means that a single compilation
unit may not use both probes.d-style markers produced by the dtrace script
and regular DTRACE_PROBE(providername,probename) preprocessor macros.
If it attempts to do so, the DTRACE_PROBE macros will emit inline asm that
tries to reference probename_semaphore symbols that will not exist,
resulting in linker errors or runtime link errors. But that's really a
systemtap problem. Core PostgreSQL doesn't use any explicit
DTRACE_PROBE(...), STAP_PROBE(...) etc.)


Re: [PATCH] force_parallel_mode and GUC categories

2021-04-11 Thread Tom Lane
Michael Paquier  writes:
> However, I'd like to think that we can do better than what's proposed
> in the patch.  There are a couple of things to consider here:
> - Should the parameter be renamed to reflect that it should only be
> used for testing purposes?

-1 to that part, because it would break a bunch of buildfarm animals'
configurations.  I doubt that any gain in clarity would be worth it.

> - Should we make more general the description of the developer options
> in the docs?

Perhaps ... what did you have in mind?

regards, tom lane




Re: [PATCH] force_parallel_mode and GUC categories

2021-04-11 Thread Bharath Rupireddy
On Mon, Apr 12, 2021 at 10:31 AM Michael Paquier  wrote:
>
> On Fri, Apr 09, 2021 at 07:39:28AM -0400, Bruce Momjian wrote:
> > On Thu, Apr  8, 2021 at 10:17:18PM -0500, Justin Pryzby wrote:
> >> This is the main motive behind the patch.
> >>
> >> Developer options aren't shown in postgresql.conf.sample, which it seems
> >> like sometimes people read through quickly, setting a whole bunch of
> >> options that sound good, sometimes including this one.  And in the best
> >> case they then ask on -performance why their queries are slow and we tell
> >> them to turn it back off to fix their issues.  This changes to no longer
> >> put it in .sample, and calling it a "dev" option seems to be the
> >> classification and mechanism by which to do that.
> >
> > +1
>
> Hm.  I can see the point you are making based on the bug report that
> has led to this thread:
> https://www.postgresql.org/message-id/CAN0SRDFV=Fv0zXHCGbh7gh=mtfw05xd1x7gjjrzs5qn-tep...@mail.gmail.com
>
> However, I'd like to think that we can do better than what's proposed
> in the patch.  There are a couple of things to consider here:
> - Should the parameter be renamed to reflect that it should only be
> used for testing purposes?
> - Should we make more general the description of the developer options
> in the docs?

IMO, categorizing force_parallel_mode under DEVELOPER_OPTIONS and moving
it to the "Developer Options" section in config.sgml looks
appropriate. So the v2-0004 patch proposed by Justin at [1] looks
good to me. If there are any other GUCs that are not meant to be used
in production, IMO we could treat them the same way.

[1] https://www.postgresql.org/message-id/20210408213812.GA18734%40telsasoft.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: pgsql: Move tablespace path re-creation from the makefiles to pg_regres

2021-04-11 Thread Michael Paquier
On Fri, Apr 09, 2021 at 08:07:10PM -0700, Noah Misch wrote:
> "pg_regress --outputdir" is not a great location for a file or directory
> created by a user other than the user running pg_regress.  If one does "make
> check" and then "make installcheck" against a cluster running as a different
> user, the rmtree() will fail, assuming typical umask values.  An rmtree() at
> the end of the tablespace test would mostly prevent that, but that can't help
> in the event of a mid-test crash.

Yeah, I really don't think that we need to worry about multi-user
scenarios with pg_regress like that though.

> I'm not sure we should support installcheck against a server running as a
> different user.  If we should support it, then I'd probably look at letting
> the caller pass in a server-writable directory.  That directory would house
> the tablespace instead of outputdir doing so.

But then we would be back to the pre-13 position where we'd need
something external to pg_regress in charge of cleaning up and
creating the tablespace path, no?  That's basically what we want to
avoid with the Makefile rules.  I can see that it could be interesting
to be able to pass down a custom path for the test tablespace, but do
we really need that?

It took some time for the CF bot to run the patch of this thread, but
from what I can see the tests are passing on Windows under Cirrus CI:
http://commitfest.cputube.org/michael-paquier.html

So it looks like this could be a different answer.
--
Michael




Re: Uninitialized scalar variable (UNINIT) (src/backend/statistics/extended_stats.c)

2021-04-11 Thread Michael Paquier
On Sun, Apr 11, 2021 at 07:42:20PM -0300, Ranier Vilela wrote:
> Em dom., 11 de abr. de 2021 às 16:25, Justin Pryzby 
> escreveu:
>> I think you're right.  You can look in the commit history to find the
>> relevant
>> commit and copy the committer.

In this case that's a4d75c8, for Tomas.

>> I think it's cleanest to write:
>> |HeapTupleData tmptup = {0};

I agree that this would be cleaner.

While on it, if you could not top-post..
--
Michael




Re: [PATCH] force_parallel_mode and GUC categories

2021-04-11 Thread Michael Paquier
On Fri, Apr 09, 2021 at 07:39:28AM -0400, Bruce Momjian wrote:
> On Thu, Apr  8, 2021 at 10:17:18PM -0500, Justin Pryzby wrote:
>> This is the main motive behind the patch.
>> 
>> Developer options aren't shown in postgresql.conf.sample, which it seems like
>> sometimes people read through quickly, setting a whole bunch of options that
>> sound good, sometimes including this one.  And in the best case they then ask
>> on -performance why their queries are slow and we tell them to turn it back 
>> off
>> to fix their issues.  This changes to no longer put it in .sample, and 
>> calling
>> it a "dev" option seems to be the classification and mechanism by which to do
>> that.
> 
> +1

Hm.  I can see the point you are making based on the bug report that
has led to this thread:
https://www.postgresql.org/message-id/CAN0SRDFV=Fv0zXHCGbh7gh=mtfw05xd1x7gjjrzs5qn-tep...@mail.gmail.com

However, I'd like to think that we can do better than what's proposed
in the patch.  There are a couple of things to consider here:
- Should the parameter be renamed to reflect that it should only be
used for testing purposes?
- Should we make more general the description of the developer options
in the docs?

I have applied the patch for log_autovacuum_min_duration for now, as
this one is clearly wrong.
--
Michael




Re: Replication slot stats misgivings

2021-04-11 Thread Masahiko Sawada
On Sat, Apr 10, 2021 at 9:53 PM Amit Kapila  wrote:
>
> On Sat, Apr 10, 2021 at 7:24 AM Masahiko Sawada  wrote:
> >
> > IIUC there are two problems in the case where the drop message is lost:
> >
> > 1. Writing beyond the end of replSlotStats.
> > This can happen if after restarting the number of slots whose stats
> > are stored in the stats file exceeds max_replication_slots. Vignesh's
> > patch addresses this problem.
> >
> > 2. The stats for the new slot are not recorded.
> > If the stats for already-dropped slots remain in replSlotStats, the
> > stats for the new slot cannot be registered because replSlotStats is
> > full. This can happen even when, after restarting, the number
> > of slots whose stats are stored in the stat file does NOT exceed
> > max_replication_slots as well as even during the server running. The
> > patch doesn’t address this problem. (If this happens, we will have to
> > reset all slot stats since pg_stat_reset_replication_slot() cannot
> > remove the slot stats with the non-existing name).
> >
> > I think we can use HTAB to store slot stats and have
> > pg_stat_get_replication_slot() inquire about stats by the slot name,
> > resolving both problems. By using HTAB we're no longer concerned about
> > the problem of writing stats beyond the end of the replSlotStats
> > array. Instead, we have to consider how and when to clean up the stats
> > for already-dropped slots. We can have the startup process send slot
> > names at startup time, which borrows the idea proposed by Amit. But
> > maybe we need to consider the case again where the message from the
> > startup process is lost? Another idea would be to have
> > pgstat_vacuum_stat() check the existing slots and call
> > pgstat_report_replslot_drop() if the slot in the stats file doesn't
> > exist. That way, we can continuously check the stats for
> > already-dropped slots.
> >

Thanks for your comments.

>
> Agreed, I think checking periodically via pgstat_vacuum_stat is a
> better idea than sending once at startup time. I also think using
> slot_name is better than using 'idx' (index in
> ReplicationSlotCtl->replication_slots) in this scheme because even
> after startup 'idx' changes we will be able to drop the dead slot.
>
> > I've written a PoC patch for the above idea; using HTAB and cleaning
> > up slot stats at pgstat_vacuum_stat(). The patch can be applied on top
> > of 0001 patch Vignesh proposed before[1].
> >
>
> It seems Vignesh has changed patches based on the latest set of
> comments so you might want to rebase.

I've merged my patch into the v6 patch set Vignesh submitted.

I've attached the updated version of the patches. I didn't change
anything in the patch that changes char[NAMEDATALEN] to NameData (0001
patch) and patches that add tests. In 0003 patch I reordered the
output parameters of pg_stat_replication_slots; showing total number
of transactions and total bytes followed by statistics for spilled and
streamed transactions seems appropriate to me. Since my patch resolved
the issue of writing stats beyond the end of the array, I've removed
the patch that writes the number of stats into the stats file
(v6-0004-Handle-overwriting-of-replication-slot-statistic-.patch).

Apart from the above updates,
contrib/test_decoding/001_repl_stats.pl adds a wait_for_decode_stats()
function for testing, but I think we can use poll_query_until()
instead. Also, I think we can merge the 0004 and 0005 patches.
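As a toy illustration of why keying the stats by slot name helps (the
real patch would use PostgreSQL's dynahash HTAB; this fixed-size,
linear-scan table is only a sketch with made-up names), lookup and drop
by name keep working even if the slot's index in
ReplicationSlotCtl->replication_slots changes across restarts:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_SLOTS 8
#define NAMELEN   64

/* Miniature stand-in for a per-slot stats entry keyed by slot name. */
typedef struct
{
    char name[NAMELEN];    /* hash key; empty string marks an unused entry */
    long total_txns;
} SlotStatsEntry;

static SlotStatsEntry slot_stats[MAX_SLOTS];

/* Find an entry by name; optionally create it in a free slot. */
static SlotStatsEntry *
slot_stats_lookup(const char *name, bool create)
{
    for (int i = 0; i < MAX_SLOTS; i++)
        if (slot_stats[i].name[0] != '\0' &&
            strcmp(slot_stats[i].name, name) == 0)
            return &slot_stats[i];
    if (!create)
        return NULL;
    for (int i = 0; i < MAX_SLOTS; i++)
        if (slot_stats[i].name[0] == '\0')
        {
            snprintf(slot_stats[i].name, NAMELEN, "%s", name);
            return &slot_stats[i];
        }
    return NULL;                /* table full */
}

/* Dropping by name frees the entry for reuse, regardless of position. */
static void
slot_stats_drop(const char *name)
{
    SlotStatsEntry *entry = slot_stats_lookup(name, false);

    if (entry)
        memset(entry, 0, sizeof(*entry));
}
```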

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/


v7-0005-Test-where-there-are-more-replication-slot-statis.patch
Description: Binary data


v7-0004-Added-tests-for-verification-of-logical-replicati.patch
Description: Binary data


v7-0001-Changed-char-datatype-to-NameData-datatype-for-sl.patch
Description: Binary data


v7-0003-Added-total-txns-and-total-txn-bytes-to-replicati.patch
Description: Binary data


v7-0002-Use-HTAB-for-replication-slot-statistics.patch
Description: Binary data


RE: Truncate in synchronous logical replication failed

2021-04-11 Thread osumi.takami...@fujitsu.com
Hi


On Saturday, April 10, 2021 11:52 PM Japin Li  wrote:
> On Thu, 08 Apr 2021 at 19:20, Japin Li  wrote:
> > On Wed, 07 Apr 2021 at 16:34, tanghy.f...@fujitsu.com
>  wrote:
> >> On Wednesday, April 7, 2021 5:28 PM Amit Kapila
> >>  wrote
> >>
> >>>Can you please check if the behavior is the same for PG-13? This is
> >>>just to ensure that we have not introduced any bug in PG-14.
> >>
> >> Yes, same failure happens at PG-13, too.
> >>
> >
> > I found that when we truncate a table in synchronous logical
> > replication,
> > LockAcquireExtended() [1] will try to take a lock via fast path and it
> > failed (FastPathStrongRelationLocks->count[fasthashcode] = 1).
> > However, it can acquire the lock when in asynchronous logical replication.
> > I'm not familiar with the locks, any suggestions? What the difference
> > between sync and async logical replication for locks?
> >
> 
> After some analyze, I find that when the TRUNCATE finish, it will call
> SyncRepWaitForLSN(), for asynchronous logical replication, it will exit early,
> and then it calls ResourceOwnerRelease(RESOURCE_RELEASE_LOCKS) to
> release the locks, so the walsender can acquire the lock.
> 
> But for synchronous logical replication, SyncRepWaitForLSN() will wait for
> specified LSN to be confirmed, so it cannot release the lock, and the
> walsender try to acquire the lock.  Obviously, it cannot acquire the lock,
> because the lock hold by the process which performs TRUNCATE command.
> This is why the TRUNCATE in synchronous logical replication is blocked.
Yeah, the TRUNCATE waits in SyncRepWaitForLSN() while the walsender is
blocked by the AccessExclusiveLock the TRUNCATE has taken, which means
the subscriber cannot receive the change; this leads to a sort of
deadlock.


On Wednesday, April 7, 2021 3:56 PM tanghy.f...@fujitsu.com 
 wrote:
> I checked the PG-DOC, found it says that “Replication of TRUNCATE
> commands is supported”[1], so maybe TRUNCATE is not supported in
> synchronous logical replication?
> 
> If my understanding is right, maybe PG-DOC can be modified like this. Any
> thought?
> Replication of TRUNCATE commands is supported
> ->
> Replication of TRUNCATE commands is supported in asynchronous mode
I'm not sure if this becomes the final solution,
but if we take the approach of fixing the docs, we have to be careful about
the description, because when we remove the primary keys of the 'test'
tables in the scenario in [1], we don't have this issue.
That means TRUNCATE in synchronous logical replication is not always blocked.

The hang occurs when the primary key exists only on the publisher.
Also, I can observe the same hang using REPLICA IDENTITY USING INDEX without
a primary key on the publisher, while I cannot reproduce the problem with
REPLICA IDENTITY FULL and no primary key.
This difference comes from logicalrep_write_attrs(), which has a branch that
calls RelationGetIndexAttrBitmap().
Therefore, strictly speaking, I think the description above would not be correct.

I'll share my analysis when I get a better idea to address this.

[1] - 
https://www.postgresql.org/message-id/OS0PR01MB6113C2499C7DC70EE55ADB82FB759%40OS0PR01MB6113.jpnprd01.prod.outlook.com

Best Regards,
Takamichi Osumi





Re: psql - add SHOW_ALL_RESULTS option

2021-04-11 Thread Michael Paquier
On Sun, Apr 11, 2021 at 11:14:07AM -0400, Tom Lane wrote:
> It's right: this is dead code because all paths through the if-nest
> starting at line 1373 now leave results = NULL.  Hence, this patch
> has broken the autocommit logic; it's no longer possible to tell
> whether we should do anything with our savepoint.

Ugh, that's a good catch from Coverity here.

> Between this and the known breakage of control-C, it seems clear
> to me that this patch was nowhere near ready for prime time.
> I think shoving it in on the last day before feature freeze was
> ill-advised, and it ought to be reverted.  We can try again later.

Yes, I agree that a revert would be more adapted at this stage.
Peter?
--
Michael




Re: Set access strategy for parallel vacuum workers

2021-04-11 Thread Amit Kapila
On Thu, Apr 8, 2021 at 12:37 PM Bharath Rupireddy
 wrote:
>
> On Thu, Apr 8, 2021 at 11:22 AM Amit Kapila  wrote:
> > Yeah, I will change that before commit unless there are more suggestions.
>
> I have no further comments on the patch
> fix_access_strategy_workers_11.patch, it LGTM.
>

Thanks, seeing no further comments, I have pushed
fix_access_strategy_workers_11.patch.

-- 
With Regards,
Amit Kapila.




Re: psql - add SHOW_ALL_RESULTS option

2021-04-11 Thread Tom Lane
Michael Paquier  writes:
> On Fri, Apr 09, 2021 at 08:52:20AM +0200, Fabien COELHO wrote:
>> There is not a single test of "ctrl-c" which would have caught this trivial
>> and irritating regression. ISTM that a TAP test is doable. Should one be
>> added?

> If you can design something reliable, I would welcome that.

+1, there's a lot of moving parts there.

I think avoiding any timing issues wouldn't be hard; the
query-to-be-interrupted could be "select pg_sleep(1000)" or so.
What's less clear is whether we can trigger the control-C
response reliably across platforms.

regards, tom lane




Re: Replication slot stats misgivings

2021-04-11 Thread vignesh C
Thanks for the comments.

On Fri, Apr 9, 2021 at 4:13 PM Amit Kapila  wrote:
>
> On Wed, Apr 7, 2021 at 2:51 PM vignesh C  wrote:
> >
> > That is not required, I have modified it.
> > Attached v4 patch has the fixes for the same.
> >
>
> Few comments:
>
> 0001
> --
> 1. The first patch includes changing char datatype to NameData
> datatype for slotname. I feel this can be a separate patch from adding
> new stats in the view. I think we can also move the change related to
> moving stats to a structure rather than sending them individually in
> the same patch.

I have split the patch as suggested.

> 2.
> @@ -2051,6 +2054,17 @@ ReorderBufferProcessTXN(ReorderBuffer *rb,
> ReorderBufferTXN *txn,
>   rb->begin(rb, txn);
>   }
>
> + /*
> + * Update total transaction count and total transaction bytes, if
> + * transaction is streamed or spilled it will be updated while the
> + * transaction gets spilled or streamed.
> + */
> + if (!rb->streamBytes && !rb->spillBytes)
> + {
> + rb->totalTxns++;
> + rb->totalBytes += rb->size;
> + }
>
> I think this will skip a transaction if it is interleaved between a
> streaming transaction. Assume, two transactions t1 and t2. t1 sends
> changes in multiple streams and t2 sends all changes in one go at
> commit time. So, now, if t2 is interleaved between multiple streams
> then I think the above won't count t2.
>

Modified it.

> 3.
> @@ -3524,9 +3538,11 @@ ReorderBufferSerializeTXN(ReorderBuffer *rb,
> ReorderBufferTXN *txn)
>   {
>   rb->spillCount += 1;
>   rb->spillBytes += size;
> + rb->totalBytes += size;
>
>   /* don't consider already serialized transactions */
>   rb->spillTxns += (rbtxn_is_serialized(txn) ||
> rbtxn_is_serialized_clear(txn)) ? 0 : 1;
> + rb->totalTxns += (rbtxn_is_serialized(txn) ||
> rbtxn_is_serialized_clear(txn)) ? 0 : 1;
>   }
>
> We do serialize each subtransaction separately. So totalTxns will
> include subtransaction count as well when serialized, otherwise not.
> The description of totalTxns also says that it doesn't include
> subtransactions. So, I think updating rb->totalTxns here is wrong.
>

Modified it.

> 0002
> -
> 1.
> +$node->safe_psql('postgres',
> + "SELECT data FROM pg_logical_slot_get_changes('regression_slot2',
> NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1')");
> +$node->safe_psql('postgres',
> +"SELECT data FROM
> pg_logical_slot_get_changes('regression_slot3', NULL, NULL,
> 'include-xids', '0', 'skip-empty-xacts', '1')");
>
> The indentation of the second SELECT seems to bit off.

Modified it.
These comments are fixed in the patch available at [1].

[1] -
https://www.postgresql.org/message-id/CALDaNm1A%3DbjSrQjBNwNsOtTig%2B6pZpunmAj_P7Au0H0XjtvCyA%40mail.gmail.com

Regards,
Vignesh


Re: psql - add SHOW_ALL_RESULTS option

2021-04-11 Thread Michael Paquier
On Fri, Apr 09, 2021 at 08:52:20AM +0200, Fabien COELHO wrote:
> There is not a single test of "ctrl-c" which would have caught this trivial
> and irritating regression. ISTM that a TAP test is doable. Should one be
> added?

If you can design something reliable, I would welcome that.
--
Michael




Re: Have I found an interval arithmetic bug?

2021-04-11 Thread Zhihong Yu
On Sun, Apr 11, 2021 at 4:33 PM Zhihong Yu  wrote:

>
>
> On Sun, Apr 11, 2021 at 12:57 PM Zhihong Yu  wrote:
>
>>
>>
>> On Thu, Apr 8, 2021 at 10:24 AM Bruce Momjian  wrote:
>>
>>> On Mon, Apr  5, 2021 at 02:01:58PM -0400, Bruce Momjian wrote:
>>> > On Mon, Apr  5, 2021 at 11:33:10AM -0500, Justin Pryzby wrote:
>>> > Well, bug or not, we are not going to change back branches for this,
>>> and
>>> > if you want a larger discussion, it will have to wait for PG 15.
>>> >
>>> > > >
>>> https://www.postgresql.org/docs/current/datatype-datetime.html#DATATYPE-INTERVAL-INPUT
>>> > > > « …field values can have fractional parts; for example '1.5 week'
>>> or '01:02:03.45'. Such input is converted to the appropriate number of
>>> months, days, and seconds for storage. When this would result in a
>>> fractional number of months or days, the fraction is added to the
>>> lower-order fields using the conversion factors 1 month = 30 days and 1 day
>>> = 24 hours. For example, '1.5 month' becomes 1 month and 15 days. Only
>>> seconds will ever be shown as fractional on output. »
>>> >
>>> > I see that.  What is not clear here is how far we flow down.  I was
>>> > looking at adding documentation or regression tests for that, but was
>>> > unsure.  I adjusted the docs slightly in the attached patch.
>>>
>>> Here is an updated patch, which will be for PG 15.  It updates the
>>> documentation to state:
>>>
>>> The fractional parts are used to compute appropriate values for
>>> the next
>>> lower-order internal fields (months, days, seconds).
>>>
>>> It removes the flow from fractional months/weeks to
>>> hours-minutes-seconds, and adds missing rounding for fractional
>>> computations.
>>>
>>> --
>>>   Bruce Momjian  https://momjian.us
>>>   EDB  https://enterprisedb.com
>>>
>>>   If only the physical world exists, free will is an illusion.
>>>
>>>
>> +1 to this patch.
>>
>
> Bryn reminded me, off list, about the flowing down from fractional
> day after the patch.
>
> Before Bruce confirms the removal of the flowing down from fractional day,
> I withhold my previous +1.
>
> Bryn would respond with more details.
>
> Cheers
>

Among the previous examples given by Bryn, the following produces the correct
result with Bruce's patch.

# select interval '-1.7 years 29.4 months';
interval

 9 mons 12 days
(1 row)

Cheers


Re: Table refer leak in logical replication

2021-04-11 Thread Amit Langote
On Sat, Apr 10, 2021 at 10:39 AM Justin Pryzby  wrote:
> I added this as an Open Item.

Thanks, Justin.

-- 
Amit Langote
EDB: http://www.enterprisedb.com




Parallel INSERT SELECT take 2

2021-04-11 Thread tsunakawa.ta...@fujitsu.com
This is another try of [1].


BACKGROUND


We want to realize parallel INSERT SELECT in the following steps:
1) INSERT + parallel SELECT
2) Parallel INSERT + parallel SELECT

Below are example use cases.  We don't expect high concurrency or an empty data 
source.
* Data loading (ETL or ELT) into an analytics database, typically a data ware 
house.
* Batch processing in an OLTP database.


PROBLEMS


(1) The overhead of checking parallel-safety could be large
We have to check the target table and its child partitions for parallel safety. 
 That is, we make sure those relations don't have parallel-unsafe domains, 
constraints, indexes, or triggers.

What we should check is the relations into which the statement actually inserts 
data.  However, the planner does not know which relations will be actually 
inserted into.  So, the planner has to check all descendant partitions of a 
target table.  When the target table has many partitions, this overhead could 
be unacceptable when compared to the benefit gained from parallelism.


(2) There's no mechanism for parallel workers to assign an XID
Parallel workers need an XID of the current (sub)transaction when actually 
inserting a tuple (i.e., calling heap_insert()).  When the leader has not got 
the XID yet, the worker may have to assign a new XID and communicate it to the 
leader and other workers so that all parallel processes use the same XID.


SOLUTION TO (1)


The candidate ideas are:

1) Caching the result of parallel-safety check
The planner stores the result of checking parallel safety for each relation in 
relcache, or some purpose-built hash table in shared memory.

The problems are:

* Even if the target relation turns out to be parallel safe by looking at those 
data structures, we cannot assume that remains true until the SQL statement 
finishes.  For instance, other sessions might add a parallel-unsafe index to 
one of its descendant relations.  Another example: when the user changes 
the parallel safety of indexes or triggers by running ALTER FUNCTION on the 
underlying index AM function or trigger function, the relcache entry of the 
table or index is not invalidated, so the correct parallel safety is not 
maintained in the cache.
In such cases, when the executor encounters a parallel-unsafe object, it can 
mark the cached state as parallel-unsafe and error out.

* Can't ensure fast access.  With relcache, the first access in each session 
has to undergo the overhead of parallel-safety check.  With a hash table in 
shared memory, the number of relations stored there would be limited, so the 
first access after database startup or the hash table entry eviction similarly 
experiences slowness.

* With a new hash table, some lwlock for concurrent access must be added, which 
can have an adverse effect on performance.


2) Enabling users to declare that the table allows parallel data modification
Add a table property that represents parallel safety of the table for DML 
statement execution.  Users specify it as follows:

CREATE TABLE table_name (...) PARALLEL { UNSAFE | RESTRICTED | SAFE };
ALTER TABLE table_name PARALLEL { UNSAFE | RESTRICTED | SAFE };

This property is recorded in pg_class's relparallel column as 'u', 'r', or 's', 
just like pg_proc's proparallel.  The default is UNSAFE.

The planner assumes that the table, all of its descendant partitions, and their 
ancillary objects have the specified parallel safety or a safer one.  The user 
is responsible for its correctness.  If a parallel process finds an object that 
is less safe than the assumed parallel safety during statement execution, it 
throws an ERROR and aborts the statement.

The objects that relate to the parallel safety of a DML target table are as 
follows:

* Column default expression
* DOMAIN type CHECK expression
* CHECK constraints on column
* Partition key
* Partition key support function
* Index expression
* Index predicate
* Index AM function
* Operator function
* Trigger function

When the parallel safety of some of these objects is changed, it's costly to 
reflect it on the parallel safety of tables that depend on them.  So, we don't 
do it.  Instead, we provide a utility function 
pg_get_parallel_safety('table_name') that returns records of (objid, classid, 
parallel_safety) that represent the parallel safety of objects that determine 
the parallel safety of the specified table.  The function only outputs objects 
that are not parallel safe.  Otherwise, it will consume excessive memory while 
accumulating the output.  The user can use this function to identify 
problematic objects when a parallel DML fails or is not parallelized in an 
expected manner.

How does the executor detect parallel unsafe objects?  There are two ways:

1) At loading time
When the executor loads the definiti

Re: Have I found an interval arithmetic bug?

2021-04-11 Thread Zhihong Yu
On Sun, Apr 11, 2021 at 12:57 PM Zhihong Yu  wrote:

>
>
> On Thu, Apr 8, 2021 at 10:24 AM Bruce Momjian  wrote:
>
>> On Mon, Apr  5, 2021 at 02:01:58PM -0400, Bruce Momjian wrote:
>> > On Mon, Apr  5, 2021 at 11:33:10AM -0500, Justin Pryzby wrote:
>> > Well, bug or not, we are not going to change back branches for this, and
>> > if you want a larger discussion, it will have to wait for PG 15.
>> >
>> > > >
>> https://www.postgresql.org/docs/current/datatype-datetime.html#DATATYPE-INTERVAL-INPUT
>> > > > « …field values can have fractional parts; for example '1.5 week'
>> or '01:02:03.45'. Such input is converted to the appropriate number of
>> months, days, and seconds for storage. When this would result in a
>> fractional number of months or days, the fraction is added to the
>> lower-order fields using the conversion factors 1 month = 30 days and 1 day
>> = 24 hours. For example, '1.5 month' becomes 1 month and 15 days. Only
>> seconds will ever be shown as fractional on output. »
>> >
>> > I see that.  What is not clear here is how far we flow down.  I was
>> > looking at adding documentation or regression tests for that, but was
>> > unsure.  I adjusted the docs slightly in the attached patch.
>>
>> Here is an updated patch, which will be for PG 15.  It updates the
>> documentation to state:
>>
>> The fractional parts are used to compute appropriate values for
>> the next
>> lower-order internal fields (months, days, seconds).
>>
>> It removes the flow from fractional months/weeks to
>> hours-minutes-seconds, and adds missing rounding for fractional
>> computations.
>>
>> --
>>   Bruce Momjian  https://momjian.us
>>   EDB  https://enterprisedb.com
>>
>>   If only the physical world exists, free will is an illusion.
>>
>>
> +1 to this patch.
>

Bryn reminded me, off list, about the flowing down from fractional
day after the patch.

Before Bruce confirms the removal of the flowing down from fractional day,
I withhold my previous +1.

Bryn would respond with more details.

Cheers


Re: Uninitialized scalar variable (UNINIT) (src/backend/statistics/extended_stats.c)

2021-04-11 Thread Ranier Vilela
Hi Justin, sorry for the delay.

Nothing against it, but I looked for similar code and this is the usual
way to initialize HeapTupleData.
Perhaps InvalidOid makes a difference.

regards,
Ranier Vilela


Em dom., 11 de abr. de 2021 às 16:25, Justin Pryzby 
escreveu:

> On Sun, Apr 11, 2021 at 03:38:10PM -0300, Ranier Vilela wrote:
> > Per Coverity.
> >
> > It seems to me that some recent commit has failed to properly initialize
> a
> > structure, in extended_stats.c, when is passed to heap_copytuple.
>
> I think you're right.  You can look in the commit history to find the
> relevant
> commit and copy the committer.
>
> I think it's cleanest to write:
> |HeapTupleData tmptup = {0};
>
> --
> Justin
>


Re: Calendar support in localization

2021-04-11 Thread Thomas Munro
On Wed, Mar 17, 2021 at 8:20 AM Thomas Munro  wrote:
> *I mean, we can discuss the different "timelines" like UT, UTC, TAI
> etc, but that's getting into the weeds, the usual timeline for
> computer software outside specialist scientific purposes is UTC
> without leap seconds.

(Erm, rereading this thread, I meant to write "time scales" there.)




Re: Possible SSI bug in heap_update

2021-04-11 Thread Thomas Munro
On Mon, Apr 12, 2021 at 4:54 AM Tom Lane  wrote:
> While re-reading heap_update() in connection with that PANIC we're
> chasing, my attention was drawn to this comment:
>
> /*
>  * Note: beyond this point, use oldtup not otid to refer to old tuple.
>  * otid may very well point at newtup->t_self, which we will overwrite
>  * with the new tuple's location, so there's great risk of confusion if we
>  * use otid anymore.
>  */
>
> This seemingly sage advice is being ignored in one place:
>
> CheckForSerializableConflictIn(relation, otid, 
> BufferGetBlockNumber(buffer));
>
> I wonder whether that's a mistake.  There'd be only a low probability
> of our detecting it through testing, I fear.

Yeah.  Patch attached.

I did a bit of printf debugging, and while it's common that otid ==
&newtup->t_self, neither our regression tests nor our isolation tests
reach a case where ItemPointerEquals(otid, &oldtup.t_self) is false at
the place where that check runs.  Obviously those tests don't exercise
all the branches and concurrency scenarios where we goto l2, so I'm
not at all sure about this, but hmm... at first glance, perhaps there
is no live bug here because the use of *otid comes before
RelationPutHeapTuple() which is where newtup->t_self is really
updated?
From 32d5c87f38d383501d10dbbda1f93ab8eb4d0ab6 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Mon, 12 Apr 2021 09:53:54 +1200
Subject: [PATCH] Fix SSI bug in heap_update().

Commit 6f38d4dac38 failed to heed a warning about the stability of the
value pointed to by "otid".  The caller is allowed to pass in a pointer to
newtup->t_self, which will be updated during the execution of the
function.  Instead, use the value we copy into oldtup.t_self at the top
of the function.

Back-patch to 13.

Reported-by: Tom Lane 
Discussion: https://postgr.es/m/2689164.1618160085%40sss.pgh.pa.us
---
 src/backend/access/heap/heapam.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 03d4abc938..548720021e 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -3900,7 +3900,8 @@ l2:
 	 * will include checking the relation level, there is no benefit to a
 	 * separate check for the new tuple.
 	 */
-	CheckForSerializableConflictIn(relation, otid, BufferGetBlockNumber(buffer));
+	CheckForSerializableConflictIn(relation, &oldtup.t_self,
+   BufferGetBlockNumber(buffer));
 
 	/*
 	 * At this point newbuf and buffer are both pinned and locked, and newbuf
-- 
2.30.2



Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-04-11 Thread Justin Pryzby
On Sun, Apr 11, 2021 at 05:20:35PM -0400, Alvaro Herrera wrote:
> On 2021-Mar-31, Tom Lane wrote:
> 
> > diff -U3 
> > /home/buildfarm/trilobite/buildroot/HEAD/pgsql.build/src/test/isolation/expected/detach-partition-concurrently-4.out
> >  
> > /home/buildfarm/trilobite/buildroot/HEAD/pgsql.build/src/test/isolation/output_iso/results/detach-partition-concurrently-4.out
> > --- 
> > /home/buildfarm/trilobite/buildroot/HEAD/pgsql.build/src/test/isolation/expected/detach-partition-concurrently-4.out
> > 2021-03-29 20:14:21.258199311 +0200
> > +++ 
> > /home/buildfarm/trilobite/buildroot/HEAD/pgsql.build/src/test/isolation/output_iso/results/detach-partition-concurrently-4.out
> >   2021-03-30 18:54:34.272839428 +0200
> > @@ -324,6 +324,7 @@
> >  1  
> >  2  
> >  step s1insert: insert into d4_fk values (1);
> > +ERROR:  insert or update on table "d4_fk" violates foreign key constraint 
> > "d4_fk_a_fkey"
> >  step s1c: commit;
> >  
> >  starting permutation: s2snitch s1b s1s s2detach s1cancel s3vacfreeze s1s 
> > s1insert s1c
> 
> Hmm, actually, looking at this closely, I think the expected output is
> bogus and trilobite is doing the right thing by throwing this error
> here.  The real question is why isn't this case behaving in that way in
> every *other* animal.

I was looking at it, and wondered whether it could be a race condition
involving pg_cancel_backend().

-- 
Justin




Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-04-11 Thread Alvaro Herrera
On 2021-Mar-31, Tom Lane wrote:

> diff -U3 
> /home/buildfarm/trilobite/buildroot/HEAD/pgsql.build/src/test/isolation/expected/detach-partition-concurrently-4.out
>  
> /home/buildfarm/trilobite/buildroot/HEAD/pgsql.build/src/test/isolation/output_iso/results/detach-partition-concurrently-4.out
> --- 
> /home/buildfarm/trilobite/buildroot/HEAD/pgsql.build/src/test/isolation/expected/detach-partition-concurrently-4.out
>   2021-03-29 20:14:21.258199311 +0200
> +++ 
> /home/buildfarm/trilobite/buildroot/HEAD/pgsql.build/src/test/isolation/output_iso/results/detach-partition-concurrently-4.out
> 2021-03-30 18:54:34.272839428 +0200
> @@ -324,6 +324,7 @@
>  1  
>  2  
>  step s1insert: insert into d4_fk values (1);
> +ERROR:  insert or update on table "d4_fk" violates foreign key constraint 
> "d4_fk_a_fkey"
>  step s1c: commit;
>  
>  starting permutation: s2snitch s1b s1s s2detach s1cancel s3vacfreeze s1s 
> s1insert s1c

Hmm, actually, looking at this closely, I think the expected output is
bogus and trilobite is doing the right thing by throwing this error
here.  The real question is why isn't this case behaving in that way in
every *other* animal.

-- 
Álvaro Herrera   Valdivia, Chile
"Puedes vivir sólo una vez, pero si lo haces bien, una vez es suficiente"




Re: test runner (was Re: SQL-standard function body)

2021-04-11 Thread Corey Huinker
>
> > This is nice.  Are there any parallelism capabilities?
>
> Yes. It defaults to number-of-cores processes, but obviously can also be
> specified explicitly. One very nice part about it is that it'd work
> largely the same on windows (which has practically unusable testing
> right now). It probably doesn't yet, because I just tried to get it
> build and run tests at all, but it shouldn't be a lot of additional
> work.
>

The pidgin developers speak very highly of meson, for the same reasons
already mentioned in this thread.


Re: Have I found an interval arithmetic bug?

2021-04-11 Thread Zhihong Yu
On Thu, Apr 8, 2021 at 10:24 AM Bruce Momjian  wrote:

> On Mon, Apr  5, 2021 at 02:01:58PM -0400, Bruce Momjian wrote:
> > On Mon, Apr  5, 2021 at 11:33:10AM -0500, Justin Pryzby wrote:
> > Well, bug or not, we are not going to change back branches for this, and
> > if you want a larger discussion, it will have to wait for PG 15.
> >
> > > >
> https://www.postgresql.org/docs/current/datatype-datetime.html#DATATYPE-INTERVAL-INPUT
> > > > « …field values can have fractional parts; for example '1.5 week' or
> '01:02:03.45'. Such input is converted to the appropriate number of months,
> days, and seconds for storage. When this would result in a fractional
> number of months or days, the fraction is added to the lower-order fields
> using the conversion factors 1 month = 30 days and 1 day = 24 hours. For
> example, '1.5 month' becomes 1 month and 15 days. Only seconds will ever be
> shown as fractional on output. »
> >
> > I see that.  What is not clear here is how far we flow down.  I was
> > looking at adding documentation or regression tests for that, but was
> > unsure.  I adjusted the docs slightly in the attached patch.
>
> Here is an updated patch, which will be for PG 15.  It updates the
> documentation to state:
>
> The fractional parts are used to compute appropriate values for
> the next
> lower-order internal fields (months, days, seconds).
>
> It removes the flow from fractional months/weeks to
> hours-minutes-seconds, and adds missing rounding for fractional
> computations.
>
> --
>   Bruce Momjian  https://momjian.us
>   EDB  https://enterprisedb.com
>
>   If only the physical world exists, free will is an illusion.
>
>
+1 to this patch.


Re: Uninitialized scalar variable (UNINIT) (src/backend/statistics/extended_stats.c)

2021-04-11 Thread Justin Pryzby
On Sun, Apr 11, 2021 at 03:38:10PM -0300, Ranier Vilela wrote:
> Per Coverity.
> 
> It seems to me that some recent commit has failed to properly initialize a
> structure, in extended_stats.c, when is passed to heap_copytuple.

I think you're right.  You can look in the commit history to find the relevant
commit and copy the committer.

I think it's cleanest to write:
|HeapTupleData tmptup = {0};

-- 
Justin




Uninitialized scalar variable (UNINIT) (src/backend/statistics/extended_stats.c)

2021-04-11 Thread Ranier Vilela
Hi,

Per Coverity.

It seems to me that some recent commit has failed to properly initialize a
structure in extended_stats.c when it is passed to heap_copytuple().

regards,
Ranier Vilela


fix_uninitialized_var_extend_stats.patch
Description: Binary data


Re: MultiXact\SLRU buffers configuration

2021-04-11 Thread Andrey Borodin


> On 8 Apr 2021, at 15:22, Thomas Munro  wrote:
> 
> On Thu, Apr 8, 2021 at 7:24 PM Andrey Borodin  wrote:
>> I agree that this version of eviction seems much more effective and less 
>> intrusive than RR. And it's still LRU, which is important for subsystem that 
>> is called SLRU.
>> shared->search_slotno is initialized implicitly with memset(). But this 
>> seems like a common practice.
>> Also comment above "max_search = Min(shared->num_slots, 
>> MAX_REPLACEMENT_SEARCH);" does not reflect changes.
>> 
>> Besides this patch looks good to me.
> 
> Thanks!  I chickened out of committing a buffer replacement algorithm
> patch written 11 hours before the feature freeze, but I also didn't
> really want to commit the GUC patch without that.  Ahh, if only we'd
> latched onto the real problems here just a little sooner, but there is
> always PostgreSQL 15, I heard it's going to be amazing.  Moved to next
> CF.

I have one more idea inspired by CPU caches.
Let's make SLRU n-associative, where n ~ 8.
We can divide the buffers into "banks"; the number of banks must be a power of 2.
All banks are of equal size. We choose the bank size to approximately satisfy 
the user's configured buffer size.
Each page can live only within one bank. We use the same search and eviction 
algorithms as before, but we only need to search/evict over 8 elements.
All SLRU data of a single bank will be colocated within at most 2 cache lines.

I have not come up with an idea of how to avoid the multiplication 
bank_number * bank_size in the case where the user configured 31337 buffers 
(any number that is not remotely a power of 2).

PFA patch implementing this idea.

Best regards, Andrey Borodin.




v17-0002-Divide-SLRU-buffers-into-n-associative-banks.patch
Description: Binary data


v17-0001-Make-all-SLRU-buffer-sizes-configurable.patch
Description: Binary data


Re: PANIC: wrong buffer passed to visibilitymap_clear

2021-04-11 Thread Peter Geoghegan
On Sun, Apr 11, 2021 at 11:16 AM Tom Lane  wrote:
> It wasn't very clear, because I hadn't thought it through very much;
> but what I'm imagining is that we discard most of the thrashing around
> all-visible rechecks and have just one such test somewhere very late
> in heap_update, after we've successfully acquired a target buffer for
> the update and are no longer going to possibly need to release any
> buffer lock.  If at that one point we see the page is all-visible
> and we don't have the vmbuffer, then we have to release all our locks
> and go back to "l2".  Which is less efficient than some of the existing
> code paths, but given how hard this problem is to reproduce, it seems
> clear that optimizing for the occurrence is just not worth it.

Oh! That sounds way better.

This reminds me of the tupgone case that I exorcised from vacuumlazy.c
(in the same commit that stopped using a superexclusive lock). It was
also described as an optimization that wasn't quite worth it. But I
don't quite buy that. ISTM that there is a better explanation: it
evolved the appearance of being an optimization that might make sense.
Which was just camouflage.

-- 
Peter Geoghegan




Re: PANIC: wrong buffer passed to visibilitymap_clear

2021-04-11 Thread Tom Lane
Peter Geoghegan  writes:
> On Sun, Apr 11, 2021 at 10:55 AM Tom Lane  wrote:
>> Either way, it's hard to argue that
>> heap_update hasn't crossed the complexity threshold where it's
>> impossible to maintain safely.  We need to simplify it.

> It is way too complicated. I don't think that I quite understand your
> first proposal right now, so I'll need to go think about it.

It wasn't very clear, because I hadn't thought it through very much;
but what I'm imagining is that we discard most of the thrashing around
all-visible rechecks and have just one such test somewhere very late
in heap_update, after we've successfully acquired a target buffer for
the update and are no longer going to possibly need to release any
buffer lock.  If at that one point we see the page is all-visible
and we don't have the vmbuffer, then we have to release all our locks
and go back to "l2".  Which is less efficient than some of the existing
code paths, but given how hard this problem is to reproduce, it seems
clear that optimizing for the occurrence is just not worth it.

regards, tom lane




Re: PANIC: wrong buffer passed to visibilitymap_clear

2021-04-11 Thread Peter Geoghegan
On Sun, Apr 11, 2021 at 10:55 AM Tom Lane  wrote:
> Alternatively, we could do what you suggested and redefine things
> so that one is only allowed to set the all-visible bit while holding
> superexclusive lock; which again would allow an enormous simplification
> in heap_update and cohorts.

Great detective work.

I would rather not go back to requiring a superexclusive lock in
vacuumlazy.c (outside of pruning), actually -- I was only pointing out
that that had changed, and was likely to be relevant. It wasn't a real
proposal.

I think that it would be hard to justify requiring a super-exclusive
lock just to call PageSetAllVisible(). PD_ALL_VISIBLE is fundamentally
redundant information, so somehow it feels like the wrong design.

> Either way, it's hard to argue that
> heap_update hasn't crossed the complexity threshold where it's
> impossible to maintain safely.  We need to simplify it.

It is way too complicated. I don't think that I quite understand your
first proposal right now, so I'll need to go think about it.

-- 
Peter Geoghegan




Re: PANIC: wrong buffer passed to visibilitymap_clear

2021-04-11 Thread Tom Lane
I wrote:
> I'm now inclined to think that we should toss every single line of that
> code, take RelationGetBufferForTuple out of the equation, and have just
> *one* place that rechecks for PageAllVisible having just become set.
> It's a rare enough case that optimizing it is completely not worth the
> code complexity and risk (er, reality) of hard-to-locate bugs.

Alternatively, we could do what you suggested and redefine things
so that one is only allowed to set the all-visible bit while holding
superexclusive lock; which again would allow an enormous simplification
in heap_update and cohorts.  Either way, it's hard to argue that
heap_update hasn't crossed the complexity threshold where it's
impossible to maintain safely.  We need to simplify it.

regards, tom lane




Re: PANIC: wrong buffer passed to visibilitymap_clear

2021-04-11 Thread Tom Lane
I wrote:
> (It does look like RelationGetBufferForTuple
> knows about updating vmbuffer, but there's one code path through the
> if-nest at 3850ff that doesn't call that.)

Although ... isn't RelationGetBufferForTuple dropping the ball on this
point too, in the code path at the end where it has to extend the relation?

I'm now inclined to think that we should toss every single line of that
code, take RelationGetBufferForTuple out of the equation, and have just
*one* place that rechecks for PageAllVisible having just become set.
It's a rare enough case that optimizing it is completely not worth the
code complexity and risk (er, reality) of hard-to-locate bugs.

regards, tom lane




Re: PL/R regression on windows, but not linux with master.

2021-04-11 Thread Tom Lane
Joe Conway  writes:
> On 4/11/21 12:51 PM, Dave Cramer wrote:
>> On Sun, 11 Apr 2021 at 12:43, Tom Lane  wrote:
>>> Concretely, maybe like the attached?

>> +1 from me.
>> I especially like the changes to the comments as it's more apparent what
>> they should be used for.

> +1
> Looks great to me.

OK, pushed to HEAD only.

regards, tom lane




Re: PANIC: wrong buffer passed to visibilitymap_clear

2021-04-11 Thread Tom Lane
Peter Geoghegan  writes:
> This isn't just any super-exclusive lock, either -- we were calling
> ConditionalLockBufferForCleanup() at this point.

> I now think that there is a good chance that we are seeing these
> symptoms because the "conditional-ness" went away -- we accidentally
> relied on that.

Ah, I see it.  In the fragment of heap_update where we have to do some
TOAST work (starting at line 3815) we transiently *release our lock*
on the old tuple's page.  Unlike the earlier fragments that did that,
this code path has no provision for rechecking whether the page has
become all-visible, so if that does happen while we're without the
lock then we lose.  (It does look like RelationGetBufferForTuple
knows about updating vmbuffer, but there's one code path through the
if-nest at 3850ff that doesn't call that.)

So the previous coding in vacuumlazy didn't tickle this because it would
only set the all-visible bit on a page it had superexclusive lock on;
that is, continuing to hold the pin was sufficient.  Nonetheless, if
four out of five paths through heap_update take care of this matter,
I'd say it's heap_update's bug not vacuumlazy's bug that the fifth path
doesn't.

regards, tom lane




Re: multi-install PostgresNode fails with older postgres versions

2021-04-11 Thread Andrew Dunstan

On 4/7/21 5:06 PM, Alvaro Herrera wrote:
> On 2021-Apr-07, Andrew Dunstan wrote:
>
>> Oh, you want to roll them all up into one file? That could work. It's a
>> bit frowned on by perl purists, but I've done similar (see PGBuild/SCM.pm).
> Ah!  Yeah, pretty much exactly like that, including the "no critic" flag ...
>


OK, here's an attempt at that. There is almost certainly more work to
do, but it does pass my basic test (set up a node, start it, talk to it,
shut it down) on some very old versions down as low as 7.2.


Is this more to your liking?


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com

diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index e26b2b3f30..e48379e3fd 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -127,6 +127,11 @@ INIT
 	$last_port_assigned = int(rand() * 16384) + 49152;
 }
 
+# Current dev version, for which we have no subclass
+# When a new stable branch is made this and the subclass hierarchy below
+# need to be adjusted.
+my $devtip = 14;
+
 =pod
 
 =head1 METHODS
@@ -347,9 +352,12 @@ about this node.
 sub info
 {
 	my ($self) = @_;
+	my $varr = $self->{_pg_version};
+	my $vstr = join('.', @$varr) if ref $varr;
 	my $_info = '';
 	open my $fh, '>', \$_info or die;
 	print $fh "Name: " . $self->name . "\n";
+	print $fh "Version: " . $vstr . "\n" if $vstr;
 	print $fh "Data directory: " . $self->data_dir . "\n";
 	print $fh "Backup directory: " . $self->backup_dir . "\n";
 	print $fh "Archive directory: " . $self->archive_dir . "\n";
@@ -797,7 +805,7 @@ sub start
 
 	# Note: We set the cluster_name here, not in postgresql.conf (in
 	# sub init) so that it does not get copied to standbys.
-	$ret = TestLib::system_log('pg_ctl', '-D', $self->data_dir, '-l',
+	$ret = TestLib::system_log('pg_ctl', '-w', '-D', $self->data_dir, '-l',
 		$self->logfile, '-o', "--cluster-name=$name", 'start');
 
 	if ($ret != 0)
@@ -911,7 +919,7 @@ sub restart
 
 	print "### Restarting node \"$name\"\n";
 
-	TestLib::system_or_bail('pg_ctl', '-D', $pgdata, '-l', $logfile,
+	TestLib::system_or_bail('pg_ctl', '-w', '-D', $pgdata, '-l', $logfile,
 		'restart');
 
 	$self->_update_pid(1);
@@ -1196,9 +1204,188 @@ sub get_new_node
 	# Add node to list of nodes
 	push(@all_nodes, $node);
 
+	# Get information about the node
+	$node->_read_pg_config;
+
+	# bless the object into the appropriate subclass,
+	# according to the found version
+	if (ref $node->{_pg_version} && $node->{_pg_version}->[0] <= $devtip )
+	{
+		my $maj = $node->{_pg_version}->[0];
+		my $subclass = __PACKAGE__ . "V_$maj";
+		if ($maj < 10)
+		{
+			$maj = $node->{_pg_version}->[1];
+			$subclass .= "_$maj";
+		}
+		bless $node, $subclass;
+	}
+
 	return $node;
 }
 
+# Private routine to run the pg_config binary found in our environment (or in
+# our install_path, if we have one), and collect all fields that matter to us.
+#
+sub _read_pg_config
+{
+	my ($self) = @_;
+	my $inst = $self->{_install_path};
+	my $pg_config = "pg_config";
+
+	if (defined $inst)
+	{
+		# If the _install_path is invalid, our PATH variables might find an
+		# unrelated pg_config executable elsewhere.  Sanity check the
+		# directory.
+		BAIL_OUT("directory not found: $inst")
+			unless -d $inst;
+
+		# If the directory exists but is not the root of a postgresql
+		# installation, or if the user configured using
+		# --bindir=$SOMEWHERE_ELSE, we're not going to find pg_config, so
+		# complain about that, too.
+		$pg_config = "$inst/bin/pg_config";
+		BAIL_OUT("pg_config not found: $pg_config")
+			unless -e $pg_config;
+		BAIL_OUT("pg_config not executable: $pg_config")
+			unless -x $pg_config;
+
+		# Leave $pg_config install_path qualified, to be sure we get the right
+		# version information, below, or die trying
+	}
+
+	local %ENV = $self->_get_env();
+
+	# We only want the version field
+	open my $fh, "-|", $pg_config, "--version"
+		or
+		BAIL_OUT("$pg_config failed: $!");
+	my $version_line = <$fh>;
+	close $fh or die;
+
+	$self->{_pg_version} = _pg_version_array($version_line);
+
+	BAIL_OUT("could not parse pg_config --version output: $version_line")
+		unless defined $self->{_pg_version};
+}
+
+# Private routine which returns a reference to an array of integers
+# representing the pg_version of a PostgresNode, or parsed from a postgres
+# version string.  Development versions (such as "14devel") are converted
+# to an array with minus one as the last value (such as [14, -1]).
+#
+# For idempotency, will return the argument back to the caller if handed an
+# array reference.
+sub _pg_version_array
+{
+	my ($arg) = @_;
+
+	# accept node arguments
+	return _pg_version_array($arg->{_pg_version})
+		if (blessed($arg) && $arg->isa("PostgresNode"));
+
+	# idempotency
+	return $arg
+		if (ref($arg) && ref($arg) =~ /ARRAY/);
+
+	# Accept standard formats, in case caller has handed us the output of a
+	# postgres command line tool
+	$arg = $1
+		if ($arg =~ m/\(?PostgreSQL\)? (\d+(?:

Re: PL/R regression on windows, but not linux with master.

2021-04-11 Thread Joe Conway

On 4/11/21 12:51 PM, Dave Cramer wrote:



On Sun, 11 Apr 2021 at 12:43, Tom Lane  wrote:


I wrote:
 > Joe Conway  writes:
 >> Would an equivalent "PGWARNING" be something we are open to adding and
 >> back-patching?

 > It's not real obvious how pl/r could solve this in a reliable way
 > otherwise, so adding that would be OK with me, but I wonder whether
 > back-patching is going to help you any.  You'd still need to compile
 > against older headers I should think.  So I'd suggest
 > (1) add PGWARNING in HEAD only

Concretely, maybe like the attached?


+1 from me.
I especially like the changes to the comments as it's more apparent what they 
should be used for.


+1

Looks great to me.

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development




Possible SSI bug in heap_update

2021-04-11 Thread Tom Lane
While re-reading heap_update() in connection with that PANIC we're
chasing, my attention was drawn to this comment:

/*
 * Note: beyond this point, use oldtup not otid to refer to old tuple.
 * otid may very well point at newtup->t_self, which we will overwrite
 * with the new tuple's location, so there's great risk of confusion if we
 * use otid anymore.
 */

This seemingly sage advice is being ignored in one place:

CheckForSerializableConflictIn(relation, otid, 
BufferGetBlockNumber(buffer));

I wonder whether that's a mistake.  There'd be only a low probability
of our detecting it through testing, I fear.

regards, tom lane




Re: PL/R regression on windows, but not linux with master.

2021-04-11 Thread Dave Cramer
On Sun, 11 Apr 2021 at 12:43, Tom Lane  wrote:

> I wrote:
> > Joe Conway  writes:
> >> Would an equivalent "PGWARNING" be something we are open to adding and
> >> back-patching?
>
> > It's not real obvious how pl/r could solve this in a reliable way
> > otherwise, so adding that would be OK with me, but I wonder whether
> > back-patching is going to help you any.  You'd still need to compile
> > against older headers I should think.  So I'd suggest
> > (1) add PGWARNING in HEAD only
>
> Concretely, maybe like the attached?
>

+1 from me.
I especially like the changes to the comments as it's more apparent what
they should be used for.

Dave Cramer

>
> regards, tom lane
>
>


Re: PANIC: wrong buffer passed to visibilitymap_clear

2021-04-11 Thread Peter Geoghegan
On Sun, Apr 11, 2021 at 9:10 AM Peter Geoghegan  wrote:
> I don't have any reason to believe that using a super-exclusive lock
> during heap page vacuuming is necessary. My guess is that returning to
> doing it that way might make the buildfarm green again. That would at
> least confirm my suspicion that this code is relevant. The
> super-exclusive lock might have been masking the problem for a long
> time.

This isn't just any super-exclusive lock, either -- we were calling
ConditionalLockBufferForCleanup() at this point.

I now think that there is a good chance that we are seeing these
symptoms because the "conditional-ness" went away -- we accidentally
relied on that.

-- 
Peter Geoghegan




Re: PL/R regression on windows, but not linux with master.

2021-04-11 Thread Tom Lane
I wrote:
> Joe Conway  writes:
>> Would an equivalent "PGWARNING" be something we are open to adding and 
>> back-patching?

> It's not real obvious how pl/r could solve this in a reliable way
> otherwise, so adding that would be OK with me, but I wonder whether
> back-patching is going to help you any.  You'd still need to compile
> against older headers I should think.  So I'd suggest
> (1) add PGWARNING in HEAD only

Concretely, maybe like the attached?

regards, tom lane

diff --git a/src/include/utils/elog.h b/src/include/utils/elog.h
index 9acb57a27b..f53607e12e 100644
--- a/src/include/utils/elog.h
+++ b/src/include/utils/elog.h
@@ -40,20 +40,22 @@
 #define WARNING		19			/* Warnings.  NOTICE is for expected messages
  * like implicit sequence creation by SERIAL.
  * WARNING is for unexpected messages. */
+#define PGWARNING	19			/* Must equal WARNING; see NOTE below. */
 #define WARNING_CLIENT_ONLY	20	/* Warnings to be sent to client as usual, but
  * never to the server log. */
 #define ERROR		21			/* user error - abort transaction; return to
  * known state */
-/* Save ERROR value in PGERROR so it can be restored when Win32 includes
- * modify it.  We have to use a constant rather than ERROR because macros
- * are expanded only when referenced outside macros.
- */
-#ifdef WIN32
-#define PGERROR		21
-#endif
+#define PGERROR		21			/* Must equal ERROR; see NOTE below. */
 #define FATAL		22			/* fatal error - abort process */
 #define PANIC		23			/* take down the other backends with me */
 
+/*
+ * NOTE: the alternate names PGWARNING and PGERROR are useful for dealing
+ * with third-party headers that make other definitions of WARNING and/or
+ * ERROR.  One can, for example, re-define ERROR as PGERROR after including
+ * such a header.
+ */
+
 
 /* macros for representing SQLSTATE strings compactly */
 #define PGSIXBIT(ch)	(((ch) - '0') & 0x3F)


Re: psql - add SHOW_ALL_RESULTS option

2021-04-11 Thread Tom Lane
... btw, Coverity also doesn't like this fragment of the patch:

/srv/coverity/git/pgsql-git/postgresql/src/bin/psql/common.c: 1084 in ShowNoticeMessage()
1078 static void
1079 ShowNoticeMessage(t_notice_messages *notes)
1080 {
1081    PQExpBufferData *current = notes->in_flip ? &notes->flip : &notes->flop;
1082if (current->data != NULL && *current->data != '\0')
1083pg_log_info("%s", current->data);
>>> CID 1476041:  Null pointer dereferences  (FORWARD_NULL)
>>> Passing "current" to "resetPQExpBuffer", which dereferences null 
>>> "current->data".
1084resetPQExpBuffer(current);
1085 }
1086 
1087 /*
1088  * SendQueryAndProcessResults: utility function for use by SendQuery()
1089  * and PSQLexecWatch().

Its point here is that either the test of "current->data != NULL" is
useless, or resetPQExpBuffer needs such a test too.  I'm inclined
to guess the former.

(Just as a matter of style, I don't care for the flip/flop terminology
here, not least because it's not clear why exactly two buffers suffice
and will suffice forevermore.  I'd be inclined to use an array of
two buffers with an index variable.)

regards, tom lane




Re: PANIC: wrong buffer passed to visibilitymap_clear

2021-04-11 Thread Peter Geoghegan
On Sun, Apr 11, 2021 at 8:57 AM Tom Lane  wrote:
> > Does this patch seem to fix the problem?
>
> Hmm ... that looks pretty suspicious, I agree, but why wouldn't an
> exclusive buffer lock be enough to prevent concurrency with heap_update?

I don't have any reason to believe that using a super-exclusive lock
during heap page vacuuming is necessary. My guess is that returning to
doing it that way might make the buildfarm green again. That would at
least confirm my suspicion that this code is relevant. The
super-exclusive lock might have been masking the problem for a long
time.

How about temporarily committing this patch, just to review how it
affects the buildfarm?

-- 
Peter Geoghegan




Re: PANIC: wrong buffer passed to visibilitymap_clear

2021-04-11 Thread Tom Lane
Peter Geoghegan  writes:
> On Sat, Apr 10, 2021 at 10:04 PM Tom Lane  wrote:
>> Just eyeing the evidence on hand, I'm wondering if something has decided
>> it can start setting the page-all-visible bit without adequate lock,
>> perhaps only in system catalogs.  heap_update is clearly assuming that
>> that flag won't change underneath it, and if it did, it's clear how this
>> symptom would ensue.

> Does this patch seem to fix the problem?

Hmm ... that looks pretty suspicious, I agree, but why wouldn't an
exclusive buffer lock be enough to prevent concurrency with heap_update?

(I have zero faith in being able to show that this patch fixes the
problem by testing, given how hard it is to reproduce.  We need to
convince ourselves that this is a fix by logic.)

regards, tom lane




Re: PANIC: wrong buffer passed to visibilitymap_clear

2021-04-11 Thread Peter Geoghegan
On Sat, Apr 10, 2021 at 10:04 PM Tom Lane  wrote:
> Just eyeing the evidence on hand, I'm wondering if something has decided
> it can start setting the page-all-visible bit without adequate lock,
> perhaps only in system catalogs.  heap_update is clearly assuming that
> that flag won't change underneath it, and if it did, it's clear how this
> symptom would ensue.

Does this patch seem to fix the problem?

-- 
Peter Geoghegan


0001-Acquire-super-exclusive-lock-in-lazy_vacuum_heap_rel.patch
Description: Binary data


Re: PL/R regression on windows, but not linux with master.

2021-04-11 Thread Tom Lane
Joe Conway  writes:
> On 4/11/21 10:13 AM, Tom Lane wrote:
>> Indeed.  elog.h already provides a "PGERROR" macro to use for restoring
>> the value of ERROR.  We have not heard of a need to do anything special
>> for WARNING though --- maybe that's R-specific?

> R also defines WARNING in its headers.

Ah.

> Would an equivalent "PGWARNING" be something we are open to adding and 
> back-patching?

It's not real obvious how pl/r could solve this in a reliable way
otherwise, so adding that would be OK with me, but I wonder whether
back-patching is going to help you any.  You'd still need to compile
against older headers I should think.  So I'd suggest

(1) add PGWARNING in HEAD only

(2) in pl/r, do something like
#ifndef PGWARNING
#define PGWARNING 19
#endif
which should be safe since it's that in all previous supported
versions.

Also, I notice that elog.h is wrapping PGERROR in #ifdef WIN32,
which might be an overly constricted view of when it's helpful.

regards, tom lane




Re: psql - add SHOW_ALL_RESULTS option

2021-04-11 Thread Tom Lane
Coverity has pointed out another problem with this patch:

/srv/coverity/git/pgsql-git/postgresql/src/bin/psql/common.c: 1425 in SendQuery()
1419                    /*
1420                     * Do nothing if they are messing with savepoints themselves:
1421                     * If the user did COMMIT AND CHAIN, RELEASE or ROLLBACK, our
1422                     * savepoint is gone. If they issued a SAVEPOINT, releasing
1423                     * ours would remove theirs.
1424                     */
>>> CID 1476042:  Control flow issues  (DEADCODE)
>>> Execution cannot reach the expression "strcmp(PQcmdStatus(results), 
>>> "COMMIT") == 0" inside this statement: "if (results && (strcmp(PQcm...".
1425                    if (results &&
1426                        (strcmp(PQcmdStatus(results), "COMMIT") == 0 ||
1427                         strcmp(PQcmdStatus(results), "SAVEPOINT") == 0 ||
1428                         strcmp(PQcmdStatus(results), "RELEASE") == 0 ||
1429                         strcmp(PQcmdStatus(results), "ROLLBACK") == 0))
1430                            svptcmd = NULL;

It's right: this is dead code because all paths through the if-nest
starting at line 1373 now leave results = NULL.  Hence, this patch
has broken the autocommit logic; it's no longer possible to tell
whether we should do anything with our savepoint.

Between this and the known breakage of control-C, it seems clear
to me that this patch was nowhere near ready for prime time.
I think shoving it in on the last day before feature freeze was
ill-advised, and it ought to be reverted.  We can try again later.

regards, tom lane




Re: Avoid unnecessary table open/close for TRUNCATE foo, foo, foo; kind of commands

2021-04-11 Thread Fujii Masao




On 2021/04/10 11:32, Bharath Rupireddy wrote:

On Fri, Apr 9, 2021 at 9:23 PM Fujii Masao  wrote:

On 2021/04/10 0:39, Amul Sul wrote:

On Fri, Apr 9, 2021 at 8:51 PM Bharath Rupireddy
 wrote:


Hi,

While checking the ExecuteTruncate code for the FOREIGN TRUNCATE
feature, I saw that we filter out the duplicate relations specified in
the TRUNCATE command. But before skipping the duplicates, we are just
opening the relation, then if it is present in the already seen
relids, then closing it and continuing further.

I think we can just have the duplicate checking before table_open so
that in cases like TRUNCATE foo, foo, foo, foo; we could save costs of
table_open and table_close. Attaching a small patch. Thoughts?

This is just like what we already do for child tables, see following
in ExecuteTruncate:
  foreach(child, children)
  {
  Oidchildrelid = lfirst_oid(child);

  if (list_member_oid(relids, childrelid))
  continue;



Well yes, the patch looks pretty much reasonable to me.


LGTM, too. I will commit this patch.
Though that code exists even in older versions, I'm not thinking
of back-patching it because it's not a bug.


Thanks. +1 to not back-patch.


Pushed only to the master. Thanks!

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: PL/R regression on windows, but not linux with master.

2021-04-11 Thread Joe Conway

On 4/11/21 10:13 AM, Tom Lane wrote:

Andrew Dunstan  writes:

Well, plr.h does this:

#define WARNING 19
#define ERROR   20



The coding pattern in plr.h looks quite breakable.


Meh -- that code has gone 18+ years before breaking.


Indeed.  elog.h already provides a "PGERROR" macro to use for restoring
the value of ERROR.  We have not heard of a need to do anything special
for WARNING though --- maybe that's R-specific?


R also defines WARNING in its headers. If I remember correctly there are (or at 
least were, it *has* been 18+ years since I looked at this particular thing) 
some odd differences in the R headers under Windows and Linux.


In any case we would be happy to use "PGERROR".

Would an equivalent "PGWARNING" be something we are open to adding and 
back-patching?


Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development




Re: Add header support to text format and matching feature

2021-04-11 Thread Rémi Lapeyre
> 
> >> This now reads "Represents whether the header must be absent, present or 
> >> match.”. 
> 
> Since match shouldn't be preceded with be, I think we can say:
> 
> Represents whether the header must match, be absent or be present.

Thanks, here’s a v10 version of the patch that fixes this.



v10-0001-Add-header-support-to-COPY-TO-text-format.patch
Description: Binary data


v10-0002-Add-header-matching-mode-to-COPY-FROM.patch
Description: Binary data


Re: PL/R regression on windows, but not linux with master.

2021-04-11 Thread Tom Lane
Andrew Dunstan  writes:
>> Well, plr.h does this:
>> 
>> #define WARNING  19
>> #define ERROR20

> The coding pattern in plr.h looks quite breakable. Instead of hard
> coding values like this they should save the value from the postgres
> headers in another variable before undefining it and then restore that
> value after inclusion of the R headers.

Indeed.  elog.h already provides a "PGERROR" macro to use for restoring
the value of ERROR.  We have not heard of a need to do anything special
for WARNING though --- maybe that's R-specific?

regards, tom lane




Re: Add header support to text format and matching feature

2021-04-11 Thread Zhihong Yu
On Sun, Apr 11, 2021 at 4:01 AM Rémi Lapeyre  wrote:

> >
> > Hi,
> > >> sure it matches what is expected and exit immediatly if it does not.
> >
> > Typo: immediately
> >
> > +CREATE FOREIGN TABLE header_dont_match (a int, foo text) SERVER
> file_server
> >
> > nit: since header is singular, you can name the table header_doesnt_match
> >
> > +  from the one expected, or the name or case do not match, the copy
> will
> >
> > For 'the name or case do not match', either use plural for the subjects
> or change 'do' to doesn't
> >
>
> Thanks, I fixed both typos.
>
> > -   opts_out->header_line = defGetBoolean(defel);
> > +   opts_out->header_line = DefGetCopyHeader(defel);
> >
> > Existing method starts with lower case d, I wonder why the new method
> starts with upper case D.
> >
>
> I don’t remember why I used DefGetCopyHeader, should I change it?
>
> > +   if (fldct < list_length(cstate->attnumlist))
> > +   ereport(ERROR,
> > +   (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
> > +errmsg("missing header")));
> >
> > The message seems to be inaccurate: the header may be there - it just
> misses some fields.
>
> I changed the error messages, they now are:
> ERROR:  incomplete header, expected 3 columns but got 2
> ERROR:  extra data after last expected header, expected 3 columns but
> got 4
>
> >
> > + * Represents whether the header must be absent, present or present and
> match.
> >
> > present and match: it seems present is redundant - if header is absent,
> how can it match ?
>
> This now reads "Represents whether the header must be absent, present or
> match.”.
>
> Cheers,
> Rémi
>
> >
> > Cheers
>
>
> >> This now reads "Represents whether the header must be absent, present
or match.”.

Since match shouldn't be preceded with be, I think we can say:

Represents whether the header must match, be absent or be present.

Cheers


Re: PL/R regression on windows, but not linux with master.

2021-04-11 Thread Andrew Dunstan


On 4/10/21 8:56 PM, Tomas Vondra wrote:
> On 4/11/21 2:38 AM, Dave Cramer wrote:
>>
>>
>>
>> On Sat, 10 Apr 2021 at 20:34, Tom Lane  wrote:
>>
>> Dave Cramer  writes:
>> > On Sat, 10 Apr 2021 at 20:24, Tom Lane  wrote:
>> >> That's quite bizarre.  What is the actual error level according to
>> >> the source code, and where is the error being thrown exactly?
>>
>> > Well it really is an ERROR, and is being downgraded on windows to
>> WARNING.
>>
>> That seems quite awful.
>>
>> However, now that I think about it, the elog.h error-level constants
>> were renumbered not so long ago.  Maybe you've failed to recompile
>> everything for v14?
>>
>>
>> We see this on a CI with a fresh pull from master.
>>
>> As I said I will dig into it and figure it out. 
>>
> Well, plr.h does this:
>
> #define WARNING   19
> #define ERROR 20
>
> which seems a bit weird, because elog.h does this (since 1f9158ba481):
>
> #define WARNING   19
> #define WARNING_CLIENT_ONLY   20
> #define ERROR 21
>
> Not sure why this would break Windows but not Linux, though.
>
>


The coding pattern in plr.h looks quite breakable. Instead of hard
coding values like this they should save the value from the postgres
headers in another variable before undefining it and then restore that
value after inclusion of the R headers. That would work across versions
even if we renumber them.


cheers


andrew

-- 

Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Add header support to text format and matching feature

2021-04-11 Thread Rémi Lapeyre
> 
> Hi,
> >> sure it matches what is expected and exit immediatly if it does not. 
> 
> Typo: immediately
> 
> +CREATE FOREIGN TABLE header_dont_match (a int, foo text) SERVER file_server
> 
> nit: since header is singular, you can name the table header_doesnt_match
> 
> +  from the one expected, or the name or case do not match, the copy will
> 
> For 'the name or case do not match', either use plural for the subjects or 
> change 'do' to doesn't
> 

Thanks, I fixed both typos.

> -   opts_out->header_line = defGetBoolean(defel);
> +   opts_out->header_line = DefGetCopyHeader(defel);
> 
> Existing method starts with lower case d, I wonder why the new method starts 
> with upper case D.
> 

I don’t remember why I used DefGetCopyHeader, should I change it?

> +   if (fldct < list_length(cstate->attnumlist))
> +   ereport(ERROR,
> +   (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
> +errmsg("missing header")));
> 
> The message seems to be inaccurate: the header may be there - it just misses 
> some fields.

I changed the error messages, they now are:
ERROR:  incomplete header, expected 3 columns but got 2
ERROR:  extra data after last expected header, expected 3 columns but got 4

> 
> + * Represents whether the header must be absent, present or present and 
> match.
> 
> present and match: it seems present is redundant - if header is absent, how 
> can it match ?

This now reads "Represents whether the header must be absent, present or 
match.”.

Cheers,
Rémi

> 
> Cheers




v9-0001-Add-header-support-to-COPY-TO-text-format.patch
Description: Binary data


v9-0002-Add-header-matching-mode-to-COPY-FROM.patch
Description: Binary data


Re: TRUNCATE on foreign table

2021-04-11 Thread Bharath Rupireddy
On Sun, Apr 11, 2021 at 9:47 AM Justin Pryzby  wrote:
> Find attached language fixes.

Thanks for the patches.

> I'm also proposing to convert an if/else to an switch(), since I don't like
> "if/else if" without an "else", and since the compiler can warn about missing
> enum values in switch cases.

I think we have a good number of if/else-if (without else) constructs
in the code base, and I'm not sure the compilers have warned about
them. Apart from that, whether to use if-else or switch-case is just a
coding choice. And we have only two values for the DropBehavior enum,
i.e. DROP_RESTRICT and DROP_CASCADE (I'm not sure we will extend this
enum to have more values); if we had more, then a switch case would
look cleaner. But IMO, the existing if and else-if is good. Having
said that, it's up to the committer which one they think is better in
this case.

> You could also write:
> | Assert(behavior == DROP_RESTRICT || behavior == DROP_CASCADE)

IMO, we don't need to assert on behavior, as we just carry that
variable from ExecuteTruncateGuts to postgresExecForeignTruncate
without any modification. And ExecuteTruncateGuts gets it from the
parser, so there is no point at which it could have a value other than
DROP_RESTRICT or DROP_CASCADE.

> Also, you currently test:
> > + if (extra & TRUNCATE_REL_CONTEXT_ONLY)
>
> but TRUNCATE_REL_ aren't independent bits, so shouldn't be tested with "&".

Yeah, this is an issue. We could just change the #defines to values
0x0001, 0x0002, 0x0004, 0x0008 ... 0x0020 and so on, and then testing
with & would work. That way, more than one option can be multiplexed
into the same int value. To multiplex, we need to think: will there be
a scenario where a single rel in the truncate can have multiple
options at a time, and do we want to distinguish these options while
deparsing?

#define TRUNCATE_REL_CONTEXT_NORMAL    0x0001  /* specified without ONLY clause */
#define TRUNCATE_REL_CONTEXT_ONLY      0x0002  /* specified with ONLY clause */
#define TRUNCATE_REL_CONTEXT_CASCADING 0x0004  /* not specified but truncated */

And I'm not sure what the agreement is on retaining or removing
#define values; currently I see only TRUNCATE_REL_CONTEXT_ONLY being
used, the others are just set but never read. As I said upthread, it
would be good to remove the unused macros/enums and retain only the
ones that are used. In particular, TRUNCATE_REL_CONTEXT_CASCADING is
not required, I feel, because we can add the child partitions that are
foreign tables to relids as just normal foreign tables with the
TRUNCATE_REL_CONTEXT_ONLY option.

On the patches:
0001-WIP-doc-review-Allow-TRUNCATE-command-to-truncate-fo.patch ---> LGTM.
0002-Convert-an-if-else-if-without-an-else-to-a-switch.patch. --> IMO,
we can ignore this patch.
0003-Test-integer-using-and-not.patch --> if we redefine the marcos to
multiplex them into a single int value, we don't need this patch.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: proposal: possibility to read dumped table's name from file

2021-04-11 Thread Pavel Stehule
Hi

On Tue, Feb 16, 2021 at 20:32, Pavel Stehule wrote:

>
> Hi
>
> rebase
>

fresh rebase

Regards

Pavel


> Regards
>
> Pavel
>
diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index 529b167c96..24bfe07ee9 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -1000,6 +1000,42 @@ PostgreSQL documentation
   
  
 
+ 
+  --options-file=filename
+  
+   
+Read options from a file (one option per line). Short or long options
+are supported. If you use "-" as a filename, the options are read
+from stdin.
+   
+
+   
+With the following options file, the dump would include table
+mytable1 and data from foreign tables of
+some_foreign_server foreign server, but exclude data
+from table mytable2.
+
+-t mytable1
+--include-foreign-data=some_foreign_server
+--exclude-table-data=mytable2
+
+   
+
+   
+The text after symbol # is ignored. This can
+be used for comments, notes. Empty lines are ignored too.
+   
+
+   
+The option --options-file can be used more times,
+and the nesting is allowed. The options from options files are
+processed first, other options from command line later. Any option
+file is processed only one time. In next time the processing is
+ignored.
+   
+  
+ 
+
  
   --quote-all-identifiers
   
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index d0ea489614..665b719c89 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -54,9 +54,11 @@
 #include "catalog/pg_trigger_d.h"
 #include "catalog/pg_type_d.h"
 #include "common/connect.h"
+#include "common/string.h"
 #include "dumputils.h"
 #include "fe_utils/string_utils.h"
 #include "getopt_long.h"
+#include "lib/stringinfo.h"
 #include "libpq/libpq-fs.h"
 #include "parallel.h"
 #include "pg_backup_db.h"
@@ -126,18 +128,35 @@ static SimpleOidList foreign_servers_include_oids = {NULL, NULL};
 static SimpleStringList extension_include_patterns = {NULL, NULL};
 static SimpleOidList extension_include_oids = {NULL, NULL};
 
+static SimpleStringList optsfilenames_processed = {NULL, NULL};
+
 static const CatalogId nilCatalogId = {0, 0};
 
 /* override for standard extra_float_digits setting */
 static bool have_extra_float_digits = false;
 static int	extra_float_digits;
 
+static const char *filename = NULL;
+static const char *format = "p";
+static bool g_verbose = false;
+static const char *dumpencoding = NULL;
+static const char *dumpsnapshot = NULL;
+static char *use_role = NULL;
+static long rowsPerInsert;
+static int numWorkers = 1;
+static int compressLevel = -1;
+
 /*
  * The default number of rows per INSERT when
  * --inserts is specified without --rows-per-insert
  */
 #define DUMP_DEFAULT_ROWS_PER_INSERT 1
 
+/*
+ * Option code for the "options-file" long option
+ */
+#define OPTIONS_FILE_OPTNUM 12
+
 /*
  * Macro for producing quoted, schema-qualified name of a dumpable object.
  */
@@ -304,14 +323,226 @@ static void appendReloptionsArrayAH(PQExpBuffer buffer, const char *reloptions,
 static char *get_synchronized_snapshot(Archive *fout);
 static void setupDumpWorker(Archive *AHX);
 static TableInfo *getRootTableInfo(const TableInfo *tbinfo);
+static bool read_options_from_file(const char *filename,
+   DumpOptions *dopt,
+   const char *optstring,
+   const struct option *longopts,
+   const char *progname);
+
+/*
+ * Assign the value of an option to the related DumpOptions field or to
+ * a global variable.  Loading of options files is not handled here.
+ */
+static bool
+process_option(int opt,
+			   char *optargstr,
+			   DumpOptions *dopt,
+			   const char *progname)
+{
+	char	   *endptr;
+
+	switch (opt)
+	{
+		case 'a':			/* Dump data only */
+			dopt->dataOnly = true;
+			break;
+
+		case 'b':			/* Dump blobs */
+			dopt->outputBlobs = true;
+			break;
+
+		case 'B':			/* Don't dump blobs */
+			dopt->dontOutputBlobs = true;
+			break;
+
+		case 'c':			/* clean (i.e., drop) schema prior to create */
+			dopt->outputClean = 1;
+			break;
+
+		case 'C':			/* Create DB */
+			dopt->outputCreateDB = 1;
+			break;
+
+		case 'd':			/* database name */
+			dopt->cparams.dbname = pg_strdup(optargstr);
+			break;
+
+		case 'e':			/* include extension(s) */
+			simple_string_list_append(&extension_include_patterns, optargstr);
+			dopt->include_everything = false;
+			break;
+
+		case 'E':			/* Dump encoding */
+			dumpencoding = pg_strdup(optargstr);
+			break;
+
+		case 'f':
+			filename = pg_strdup(optargstr);
+			break;
+
+		case 'F':
+			format = pg_strdup(optargstr);
+			break;
+
+		case 'h':			/* server host */
+			dopt->cparams.pghost = pg_strdup(optargstr);
+			break;
+
+		case 'j':			/* number of dump jobs */
+			numWorkers = atoi(optargstr);
+			break;
+
+		case 'n':			/* include schema(s) */
+			simple_string_list_append(&schema_inc