More tests to stress directly checksum_impl.h

2020-03-05 Thread Michael Paquier
Hi all,

Following the thread which led to addd034 (please see
https://www.postgresql.org/message-id/e1j9ioh-0005kn...@gemulon.postgresql.org,
and sorry about that), it turns out that we don't have any tests which
validate the internal data checksum implementation present in core in
checksum_impl.h.  pageinspect includes a SQL-callable function to
calculate the checksum of a page, mentioned by David in CC, but the only
existing test merely checks that a checksum is not NULL, which does not
really help if the formula is touched.

Attached is a patch to close the gap by adding new tests to
pageinspect aimed at detecting any formula change.  The trick is to
make the page data representative enough that it is possible to
detect problems if any part of the formula is changed, like updates
of pg_checksum_block or checksumBaseOffsets.

Any thoughts or other ideas?
Thanks,
--
Michael
diff --git a/contrib/pageinspect/expected/page.out b/contrib/pageinspect/expected/page.out
index b6aea0124b..e475f2dfa0 100644
--- a/contrib/pageinspect/expected/page.out
+++ b/contrib/pageinspect/expected/page.out
@@ -211,3 +211,20 @@ select tuple_data_split('test8'::regclass, t_data, t_infomask, t_infomask2, t_bi
 (1 row)
 
 drop table test8;
+-- generic checksum tests
+SHOW block_size \gset
+SELECT blkno,
+page_checksum(decode(repeat('01', :block_size), 'hex'), blkno) AS checksum_01,
+page_checksum(decode(repeat('04', :block_size), 'hex'), blkno) AS checksum_04,
+page_checksum(decode(repeat('ff', :block_size), 'hex'), blkno) AS checksum_ff,
+page_checksum(decode(repeat('abcd', :block_size / 2), 'hex'), blkno) AS checksum_abcd,
+page_checksum(decode(repeat('e6d6', :block_size / 2), 'hex'), blkno) AS checksum_e6d6,
+page_checksum(decode(repeat('4a5e', :block_size / 2), 'hex'), blkno) AS checksum_4a5e
+  FROM generate_series(0, 100, 50) AS a (blkno);
+ blkno | checksum_01 | checksum_04 | checksum_ff | checksum_abcd | checksum_e6d6 | checksum_4a5e 
+-------+-------------+-------------+-------------+---------------+---------------+---------------
+     0 |        1175 |       28338 |        3612 |        -30781 |        -16269 |        -27377
+    50 |        1225 |       28352 |        3598 |        -30795 |        -16251 |        -27391
+   100 |        1139 |       28438 |        3648 |        -30881 |        -16305 |        -27349
+(3 rows)
+
diff --git a/contrib/pageinspect/sql/page.sql b/contrib/pageinspect/sql/page.sql
index bd049aeb24..e484827053 100644
--- a/contrib/pageinspect/sql/page.sql
+++ b/contrib/pageinspect/sql/page.sql
@@ -86,3 +86,14 @@ select t_bits, t_data from heap_page_items(get_raw_page('test8', 0));
 select tuple_data_split('test8'::regclass, t_data, t_infomask, t_infomask2, t_bits)
 from heap_page_items(get_raw_page('test8', 0));
 drop table test8;
+
+-- generic checksum tests
+SHOW block_size \gset
+SELECT blkno,
+page_checksum(decode(repeat('01', :block_size), 'hex'), blkno) AS checksum_01,
+page_checksum(decode(repeat('04', :block_size), 'hex'), blkno) AS checksum_04,
+page_checksum(decode(repeat('ff', :block_size), 'hex'), blkno) AS checksum_ff,
+page_checksum(decode(repeat('abcd', :block_size / 2), 'hex'), blkno) AS checksum_abcd,
+page_checksum(decode(repeat('e6d6', :block_size / 2), 'hex'), blkno) AS checksum_e6d6,
+page_checksum(decode(repeat('4a5e', :block_size / 2), 'hex'), blkno) AS checksum_4a5e
+  FROM generate_series(0, 100, 50) AS a (blkno);


signature.asc
Description: PGP signature


Re: Exposure related to GUC value of ssl_passphrase_command

2020-03-05 Thread keisuke kuroda
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:not tested

I tested the patch on the master branch (addd034) and it works fine.

I think that a test case checking that a non-superuser can't see this
parameter is unnecessary; there is already a similar test for the
pg_read_all_settings role.

The new status of this patch is: Ready for Committer


Re: [Proposal] Global temporary tables

2020-03-05 Thread 曾文旌(义从)



> On Mar 5, 2020, at 10:38 PM, Robert Haas wrote:
> 
> On Thu, Mar 5, 2020 at 9:19 AM tushar  wrote:
>> WARNING:  relfilenode 13589/1663/19063 not exist in gtt shared hash when 
>> forget
>> ERROR:  out of shared memory
>> HINT:  You might need to increase max_active_gtt.
>> 
>> also, it would be great if we could make this error message user-friendly,
>> like "max connection reached", rather than a memory error
> 
> That would be nice, but the bigger problem is that the WARNING there
> looks totally unacceptable. It looks like it's complaining of some
> internal issue (i.e. a bug or corruption) and the grammar is poor,
> too.

Yes, the WARNING should not exist.
This is a bug in the rollback process, and I have fixed it in
global_temporary_table_v17-pg13.patch.


Wenjing


> 
> -- 
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company





Re: [Proposal] Global temporary tables

2020-03-05 Thread 曾文旌(义从)


> On Mar 5, 2020, at 10:19 PM, tushar wrote:
> 
> On 3/3/20 2:10 PM, 曾文旌(义从) wrote:
>> I fixed in global_temporary_table_v16-pg13.patch.
> Please refer to this scenario -
> 
> --Connect to psql -
> 
> postgres=# alter system set max_active_global_temporary_table =1;
> ALTER SYSTEM
> 
> --restart the server (./pg_ctl -D data restart) 
> 
> --create global temp table 
> 
> postgres=# create global temp  table ccc1  (c int);
> CREATE TABLE
> 
> --Try to Create another global temp table
> 
> postgres=# create global temp  table ccc2  (c int);
> WARNING:  relfilenode 13589/1663/19063 not exist in gtt shared hash when 
> forget
> ERROR:  out of shared memory
> HINT:  You might need to increase max_active_gtt.
> 
> postgres=# show max_active_gtt;
> ERROR:  unrecognized configuration parameter "max_active_gtt"
> postgres=# 
> postgres=# show max_active_global_temporary_table ;
>  max_active_global_temporary_table 
> -----------------------------------
>                                  1
> (1 row)
> 
> postgres=# 
> 
> I cannot find a "max_active_gtt" GUC.  I think you are referring to
> "max_active_global_temporary_table" here?
> 
You're right.

Fixed in global_temporary_table_v17-pg13.patch


Wenjing


> also, it would be great if we could make this error message user-friendly,
> like "max connection reached", rather than a memory error
> 
> -- 
> regards,tushar
> EnterpriseDB  https://www.enterprisedb.com/ 
> The Enterprise PostgreSQL Company



Re: Atomics in localbuf.c

2020-03-05 Thread Antonin Houska
Andres Freund  wrote:

> On March 5, 2020 12:42:06 PM PST, Antonin Houska wrote:
> >Andres Freund wrote:
> >
> >> On March 5, 2020 9:21:55 AM PST, Antonin Houska wrote:
> >> >What's the reason to use pg_atomic...read_...() and
> >> >pg_atomic...write_...() functions in localbuf.c?
> >> >
> >> >It looks like there was an intention not to use them
> >> >
> >> >https://www.postgresql.org/message-id/CAPpHfdtfr3Aj7xJonXaKR8iY2p8uXOQ%2Be4BMpMDAM_5R4OcaDA%40mail.gmail.com
> >> >
> >> >but the following discussion does not explain the decision to use
> >> >them.
> >>
> >> Read/write don't trigger locked/atomic operations. They just
> >> guarantee that you're not gonna read/write a torn value. Or a cached
> >> one. Since local/shared buffers share the buffer header definition,
> >> we still have to use proper functions to access the atomic variables.
> >
> >Sure, the atomic operations are necessary for shared buffers, but I
> >still don't understand why they are needed for *local* buffers - local
> >buffers keep their headers (BufferDesc) in process-local memory, so
> >there should be no concerns about concurrent access.
> >
> >Another thing that makes me confused is this comment in
> >InitLocalBuffers():
> >
> >	/*
> >	 * Intentionally do not initialize the buffer's atomic variable
> >	 * (besides zeroing the underlying memory above). That way we get
> >	 * errors on platforms without atomics, if somebody (re-)introduces
> >	 * atomic operations for local buffers.
> >	 */
> >
> >That sounds like there was an intention not to use the atomic functions
> >in localbuf.c, but eventually they ended up there. Do I still miss
> >something?
> 
> Again, the read/write functions do not imply atomic instructions.

ok. What I missed is that BufferDesc.state is declared as pg_atomic_uint32
rather than plain int, so the pg_atomic_...() functions should be used
regardless of whether the buffer is shared or local. Sorry for the noise.
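
A minimal sketch of what that looks like in practice, for the archives
(helper name is mine; the macros are the usual buf_internals.h ones):

    #include "postgres.h"
    #include "storage/buf_internals.h"

    /*
     * Bump the usage count of a backend-local buffer header.  No lock or
     * locked instruction is involved: pg_atomic_read_u32() and
     * pg_atomic_unlocked_write_u32() only guarantee non-torn, non-cached
     * access, which is all a process-local BufferDesc needs.
     */
    static void
    local_buffer_bump_usagecount(BufferDesc *bufHdr)
    {
        uint32      buf_state = pg_atomic_read_u32(&bufHdr->state);

        if (BUF_STATE_GET_USAGECOUNT(buf_state) < BM_MAX_USAGE_COUNT)
        {
            buf_state += BUF_USAGECOUNT_ONE;
            pg_atomic_unlocked_write_u32(&bufHdr->state, buf_state);
        }
    }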

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com




Re: Internal key management system

2020-03-05 Thread Masahiko Sawada
On Fri, 6 Mar 2020 at 15:25, Moon, Insung  wrote:
>
> Dear Sawada-san
>
> I don't know if my environment or email system is weird, but the V5
> patch file only includes a change list.
> The previous V4 patch file was 64kB, but the v5 patch file is 2kB.
> Can you check it?
>

Thank you! I had attached the wrong file.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


kms_v5.patch
Description: Binary data


Re: Internal key management system

2020-03-05 Thread Moon, Insung
Dear Sawada-san

I don't know if my environment or email system is weird, but the V5
patch file only includes a change list.
The previous V4 patch file was 64kB, but the v5 patch file is 2kB.
Can you check it?

Best regards.
Moon.

On Tue, Mar 3, 2020 at 5:58 PM Masahiko Sawada
 wrote:
>
> On Tue, 3 Mar 2020 at 08:49, Cary Huang  wrote:
> >
> > Hi Masahiko
> > Please see below my comments regarding kms_v4.patch that you have shared 
> > earlier.
>
> Thank you for reviewing this patch!
>
> >
> > (1)
> > There is a discrepancy between the documentation and the actual function 
> > definition. For example in func.sgml, it states pg_wrap_key takes argument 
> > in text data type but in pg_proc.dat and kmgr.c, the function is defined to 
> > take argument in bytea data type.
> >
> > ===> doc/src/sgml/func.sgml
> > +  pg_wrap_key
> > +  pg_wrap_key(data text)
> > +  bytea
> >
> > ===> src/include/catalog/pg_proc.dat
> > +{ oid => '8201', descr => 'wrap the given secret',
> > +  proname => 'pg_wrap_key',
> > +  provolatile => 'v', prorettype => 'bytea',
> > +  proargtypes => 'bytea', prosrc => 'pg_wrap_key' },
> >
> > ===> src/backend/crypto/kmgr.c
> > +Datum
> > +pg_wrap_key(PG_FUNCTION_ARGS)
> > +{
> > +   bytea  *data = PG_GETARG_BYTEA_PP(0);
> > +   bytea  *res;
> > +   int datalen;
> > +   int reslen;
> > +   int len;
>
> Fixed.
>
> > +
> >
> > (2)
> > I think the documentation needs to make clear the difference between a key 
> > and a user secret. Some parts of it are trying to mix the 2 terms together 
> > when they shouldn't. To my understanding, a key is expressed as binary data 
> > that is actually used in the encryption and decryption operations. A user 
> > secret, on the other hand, is more like a passphrase, expressed as string, 
> > that is used to derive an encryption key for subsequent encryption 
> > operations.
> >
> > If I just look at the function names "pg_wrap_key" and "pg_unwrap_key", I 
> > immediately feel that these functions are used to encapsulate and uncover 
> > cryptographic key materials. The input and output to these 2 functions 
> > should all be key materials expressed in bytea data type. In previous email 
> > discussion, there was only one key material in discussion, called the 
> > master key (generated during initdb and stored in cluster), and this 
> > somehow automatically makes people (including myself) associate pg_wrap_key 
> > and pg_unwrap_key to be used on this master key and raise a bunch of 
> > security concerns around it.
> >
> > Having read the documentation provided by the patch describing pg_wrap_key 
> > and pg_unwrap_key, they seem to serve another purpose. It states that 
> > pg_wrap_key is used to encrypt a user-supplied secret (text) with the 
> > master key and produce a wrapped secret while pg_unwrap_key does the 
> > opposite, so we can prevent user from having to enter the secret in 
> > plaintext when using pgcrypto functions.
> >
> > This use case is OK, but a user secret is not really cryptographic key 
> > material, is it? It is more similar to a secret passphrase expressed in text 
> > and pg_wrap_key is merely used to turn the passphrase into a wrapped 
> > passphrase expressed in bytea.
> >
> > If I give pg_wrap_key with a real key material expressed in bytea, I will 
> > not be able to unwrap it properly:
> >
> > Select pg_unwrap_key 
> > (pg_wrap_key(E'\\x2b073713476f9f0e761e45c64be8175424d2742e7d53df90b6416f1d84168e8a')
> >  );
> >
> > pg_unwrap_key
> > --
> > +\x077\x13Go\x0Ev\x1EEK\x17T$t.}SߐAo\x1D\x16
> > (1 row)
> >
> > Maybe we should rename these SQL functions like this to prevent confusion.
> > => pg_wrap_secret (takes a text, returns a bytea)
> > => pg_unwrap_secret(takes a bytea, returns a text)
>
> Agreed to change the argument types. A user secret will normally be a text
> password, as with pgcrypto, so these functions can probably cover
> most cases. I changed the function names to pg_wrap and pg_unwrap
> because these functions generically wrap and unwrap the given data.
>
> >
> > if there is a use case for users to encapsulate key materials then we can 
> > define 2 more wrap functions for these, if there is no use case, don't 
> > bother:
> > => pg_wrap_key (takes a bytea, returns a bytea)
> > => pg_unwrap_key (takes a bytea, returns a bytea)
>
> +1. Like pgcrypto has both pgp_sym_encrypt_bytea and pgp_sym_encrypt,
> maybe we can have such functions.
>
> >
> > (3)
> > I would rephrase "chapter 32: Encryption Key Management Part III. Server 
> > Administration" documentation like this:
> >
> > =
> > PostgreSQL supports Encryption Key Management System, which is enabled when 
> > PostgreSQL is built with --with-openssl option 

Re: [HACKERS] make async slave to wait for lsn to be replayed

2020-03-05 Thread Kyotaro Horiguchi
Hello.

I looked this briefly but not tested.

At Fri, 06 Mar 2020 00:24:01 +0300, Kartyshov Ivan  
wrote in 
> On 2018-03-06 14:50, Simon Riggs wrote:
> > On 6 March 2018 at 11:24, Dmitry Ivanov 
> > wrote:
> >>> In PG11, I propose the following command, sticking mostly to Ants'
> >>> syntax, and allowing to wait for multiple events before it returns. It
> >>> doesn't hold snapshot and will not get cancelled by Hot Standby.
> >>> WAIT FOR event [, event ...] options
> >>> event is
> >>> LSN value
> >>> TIMESTAMP value
> >>> options
> >>> TIMEOUT delay
> >>> UNTIL TIMESTAMP timestamp
> >>> (we have both, so people don't need to do math, they can use whichever
> >>> they have)
> >> I have a (possibly) dumb question: if we have specified several
> >> events,
> >> should WAIT finish if only one of them triggered? It's not immediately
> >> obvious if we're waiting for ALL of them to trigger, or just one will
> >> suffice (ANY). IMO the syntax could be extended to something like:
> >> WAIT FOR [ANY | ALL] event [, event ...] options,
> >> with ANY being the default variant.
> > +1
> 
> Here I made new patch of feature, discussed above.
> 
> WAIT FOR - a wait statement that pauses the statements beneath it
> ==
> 
> Synopsis
> ==
> WAIT FOR [ANY | SOME | ALL] event [, event ...] options
> and event is:
> LSN value
> TIMESTAMP value
> 
> and options is:
> TIMEOUT delay
> UNTIL TIMESTAMP timestamp

The syntax seems confused. What happens if we type in the
command "WAIT FOR TIMESTAMP '...' UNTIL TIMESTAMP ''"?  It seems
to me the options are useless.  Couldn't the TIMEOUT option be a part of
an event?  I know gram.y doesn't accept that syntax, but that is not
apparent from the description above.

As I read through the previous thread, one of the reasons this
feature is implemented as syntax is that it was intended to be combined
into the BEGIN statement.  If there is no use case for the feature in the
midst of a transaction, why don't you turn it into a part of the BEGIN
command?

> Description
> ==
> WAIT FOR - makes the statements beneath it sleep until an
> event happens (don't process new queries until an event happens).
...
> Notice: WAIT FOR will release on PostmasterDeath or Interruption
> events
> if they come earlier than the LSN or timeout.

I think interrupts ought to result in ERROR.

wait.c adds a fair amount of code and uses a proc-array-based
approach. But Thomas suggested a queue-based approach and I also think
it is better.  We already have a queue-based mechanism that behaves
almost the same as this feature in the commit code on the master side. It
avoids spurious backend wakeups. Couldn't we extend SyncRepWaitForLSN,
or share a part of that code/infrastructure, so that this feature can
reuse the code?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


Re: logical replication empty transactions

2020-03-05 Thread Craig Ringer
On Mon, 2 Mar 2020 at 19:26, Amit Kapila  wrote:

> One thing that is not clear to me is how will we advance restart_lsn
> if we don't send any empty xact in a system where there are many such
> xacts?

Same way we already do it for writes that are not replicated over
logical replication, like vacuum work etc. The upstream sends feedback
with reply-requested. The downstream replies. The upstream advances
confirmed_flush_lsn, and that lazily updates restart_lsn.

The bigger issue here is that if you don't send empty txns on logical
replication you don't get an eager, timely response from the
replica(s), which delays synchronous replication. You need to send
empty txns when synchronous replication is enabled, or instead poke
the walsender to force immediate feedback with reply requested.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise




Re: Proposal: PqSendBuffer removal

2020-03-05 Thread Craig Ringer
On Fri, 6 Mar 2020 at 07:27, Aleksei Ivanov  wrote:
>
> > What do you mean "just one syscall"?  The entire point here is that it'd 
> > take more syscalls to send the same amount of data.
>
> I mean that if messages are large enough (more than 2K) we will need 4
> syscalls to send them without copying to the internal buffer, but currently
> we copy 8K of messages and send them using 1 call. I think that above some
> threshold of packet length it is redundant to copy to the internal buffer,
> and the data can be sent directly.

I think what you're suggesting is more complex than you may expect.
PostgreSQL is single threaded and relies pretty heavily on the ability
to buffer internally. It also expects its network I/O to always
succeed. Just switching to directly doing nonblocking I/O is not very
feasible. Changing the network I/O paths may expose a lot more
opportunities for send vs receive deadlocks.

It also complicates the protocol's handling of message boundaries,
since failures and interruptions can occur at more points.

Have you measured anything that suggests that our admittedly
inefficient multiple handling of send buffers is
performance-significant compared to the vast amount of memory
allocation and copying we do all over the place elsewhere? Do you have
a concrete reason to want to remove this?

If I had to change this model I'd probably be looking at an
iovector-style approach, like we use with shm_mq. Assemble an array of
buffer descriptors pointing to short, usually statically allocated
buffers and populate one with each pqformat step. Then when the
message is assembled, use writev(2) or similar to dispatch it. Maybe
do some automatic early flushing if the buffer space overflows. But
that might need a protocol extension so we had a way to recover after
interrupted sending of a partial message...
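
To pin the idea down, a rough standalone sketch of such a gathering send
for a v3-style message (names and framing here are illustrative, not
proposed code; a short write would still need retry handling):

    #include <arpa/inet.h>
    #include <stdint.h>
    #include <sys/types.h>
    #include <sys/uio.h>

    /*
     * Send a protocol message as three iovec fragments with a single
     * writev() call, instead of memcpy'ing everything into one send
     * buffer first.
     */
    static ssize_t
    send_message_iov(int sock, char msgtype, const void *payload, uint32_t paylen)
    {
        uint32_t    netlen = htonl(paylen + 4);    /* length word counts itself */
        struct iovec iov[3];

        iov[0].iov_base = &msgtype;                /* 1-byte message type */
        iov[0].iov_len = 1;
        iov[1].iov_base = &netlen;                 /* 4-byte length word */
        iov[1].iov_len = sizeof(netlen);
        iov[2].iov_base = (void *) payload;        /* body, never copied */
        iov[2].iov_len = paylen;

        return writev(sock, iov, 3);
    }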

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise




Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2020-03-05 Thread Amit Kapila
On Thu, Mar 5, 2020 at 11:20 PM Tomas Vondra
 wrote:
>
> On Wed, Mar 04, 2020 at 10:28:32AM +0530, Amit Kapila wrote:
> >
>
> Sure, there's a lot to discuss. And it's possible (likely) it's not
> feasible to get this into PG13. But I think it's still worth discussing
> it, instead of just punting it into the next CF right away.
>

That makes sense to me.

> >> There's been a tremendous amount of work done since I last
> >> worked on it, and a lot was discussed on this thread, so it'll take a
> >> while to get familiar with the new code ...
> >>
> >> The first thing I realized that WAL-logging of assignments in v12 does
> >> both the "old" logging (using dedicated message) and "new" with
> >> toplevel-XID embedded in the first message. Yes, the patch was wrong,
> >> because it eliminated all calls to ProcArrayApplyXidAssignment() and so
> >> it was trivial to crash the replica due to KnownAssignedXids overflow.
> >> But I don't think re-introducing XLOG_XACT_ASSIGNMENT message is the
> >> right fix.
> >>
> >> I actually proposed doing this (having both ways to log assignments) so
> >> that there's no regression risk with (wal_level < logical). But IIRC
> >> Andres objected to it, arguing that we should not log the same piece
> >> of information in two very different ways at the same time (IIRC it was
> >> discussed at the FOSDEM dev meeting, so I don't have a link to share).
> >> And I do agree with him ...
> >>
> >
> >So, aren't we worried about the overhead of the amount of WAL and
> >performance impact for the transactions?  We might want to check the
> >pgbench read-write test to see if that will add any significant
> >overhead.
> >
>
> Well, sure. I agree we need to see how this affects performance, and
> I'll do some benchmarks (I think I did that when submitting the patch,
> but I don't recall the numbers / details).
>
> Isn't it a bit strange to log stuff twice, though, if we worry about
> performance? Surely that's more expensive than logging it just once. Of
> course, it might be useful if most systems need just the "old" way.
>
> I know it's going to be a bit hand-wavy, but I think embedding the
> assignments into existing WAL messages is about the cheapest way to log
> this. I would not expect this to be measurably more expensive than what
> we have now, but I might be wrong.
>

I agree that this shouldn't be very expensive, but it is better to be
sure in that regard.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

2020-03-05 Thread Amit Kapila
On Fri, Mar 6, 2020 at 2:19 AM Robert Haas  wrote:
>
> On Thu, Mar 5, 2020 at 2:18 PM Mahendra Singh Thalor  
> wrote:
> > Here, attaching new patch set for review.
>
> I was kind of assuming that the way this would work is that it would
> set a flag or increment a counter or something when we acquire a
> relation extension lock, and then reverse the process when we release
> it. Then the Assert could just check the flag. Walking the whole
> LOCALLOCK table is expensive.
>

I think we can keep such a flag in TopTransactionState.  We free such
locks after the work is done (except during error, where we free them
at transaction abort) rather than at transaction commit, so one might
say it is better not to associate it with transaction state, but I am not
sure there is a better place.  Do you have any suggestions?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

2020-03-05 Thread Amit Kapila
On Thu, Mar 5, 2020 at 1:54 PM Dilip Kumar  wrote:
>
> On Thu, Mar 5, 2020 at 12:15 PM Amit Kapila  wrote:
> >
> >
> > 5. I have also tried to think of another way to check if we already
> > hold lock type LOCKTAG_RELATION_EXTEND, but couldn't come up with a
> > cheaper way than this.  Basically, I think if we traverse the
> > MyProc->myProcLocks queue, we will get this information, but that
> > doesn't seem much cheaper than this.
>
> I think we can maintain a flag (rel_extlock_held).  And, we can set
> that true in LockRelationForExtension,
> ConditionalLockRelationForExtension functions and we can reset it in
> UnlockRelationForExtension or in the error path e.g. LockReleaseAll.
>

I think if we reset it in LockReleaseAll during the error path, then
we need to find a way to reset it during LockReleaseCurrentOwner, as
that is called during subtransaction abort, which can be tricky as we
don't know if the lock belongs to the current owner.  How about resetting
it in Abort(Sub)Transaction and CommitTransaction after we release locks
via ResourceOwnerRelease?
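
To make the shape of that concrete, a sketch under the names discussed
above (the wrapper and reset function names are hypothetical; only the
flag handling is the point):

    #include "postgres.h"
    #include "storage/lmgr.h"

    /* Backend-private flag: true while a relation extension lock is held. */
    static bool rel_extlock_held = false;

    static void
    lock_relation_for_extension_sketch(Relation relation, LOCKMODE lockmode)
    {
        LockRelationForExtension(relation, lockmode);   /* existing routine */
        rel_extlock_held = true;
    }

    static void
    unlock_relation_for_extension_sketch(Relation relation, LOCKMODE lockmode)
    {
        rel_extlock_held = false;
        UnlockRelationForExtension(relation, lockmode);
    }

    /*
     * Called from CommitTransaction and Abort(Sub)Transaction after
     * ResourceOwnerRelease, so error paths cannot leave the flag set,
     * and the Assert becomes O(1) instead of a LOCALLOCK hash walk.
     */
    static void
    reset_rel_extlock_flag(void)
    {
        rel_extlock_held = false;
    }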

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Psql patch to show access methods info

2020-03-05 Thread vignesh C
On Fri, Mar 6, 2020 at 6:28 AM Alexander Korotkov
 wrote:
>
> On Thu, Mar 5, 2020 at 8:34 PM vignesh C  wrote:
> > On Wed, Mar 4, 2020 at 5:02 AM Alexander Korotkov 
> >  wrote:
> > >
> > > Hi!
> > >
> > > Thank you for the review.  Revised patch is attached.
> > >
> >
> > Thanks for working on comments and providing a new patch.
> > One small observation I noticed:
> > postgres=# \dAc brin oid
> >       Index access method operator classes
> >   AM  | Input type | Storage type | Operator class | Default?
> > ------+------------+--------------+----------------+----------
> >  brin | oid        |              | oid_minmax_ops | yes
> > (1 row)
> >
> > postgres=# \dAcx brin oid
> >       Index access method operator classes
> >   AM  | Input type | Storage type | Operator class | Default?
> > ------+------------+--------------+----------------+----------
> >  brin | oid        |              | oid_minmax_ops | yes
> > (1 row)
> >
> > Output of \dAc and \dAcx seems to be the same. Is this expected?
>
> It might seem strange, but the majority of psql commands allow arbitrary
> suffixes and ignore them.  For instance:
>
> postgres=# \dt
> Did not find any relations.
> postgres=# \dtt
> Did not find any relations.
>
> I think if we want to fix this, we should do it in a separate patch,
> which would fix all the psql commands.
>

Your explanation sounds fair to me.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com




Re: BUG #16147: postgresql 12.1 (from homebrew) - pg_restore -h localhost --jobs=2 crashes

2020-03-05 Thread David Zhang

Hi,

I can reproduce this pg_restore crash issue (pg_dump crashes too when 
running with multiple jobs) on macOS 10.14 Mojave and macOS 10.15 
Catalina using the following steps.


1. build pg_restore from 12.2 with "--with-gssapi" enabled, or use the 
release from https://www.postgresql.org/download/macosx/


2. start pg server and generate some load,
    pgbench -i -p 5432 -d postgres -s 10

3. backup database,
    pg_dump -h localhost -Fc --no-acl --no-owner postgres > /tmp/128m

4. drop the tables,
    psql -d postgres -c "drop table pgbench_accounts; drop table pgbench_branches; drop table pgbench_history; drop table pgbench_tellers;"


5. restore database,
    pg_restore -d postgres -h localhost -Fc /tmp/128m --jobs=2
    Password:
    pg_restore: error: a worker process died unexpectedly

6. check tables, all display size 0 bytes.
    postgres=# \d+
                           List of relations
     Schema |       Name       | Type  |  Owner   |  Size   | Description
    --------+------------------+-------+----------+---------+-------------
     public | pgbench_accounts | table | postgres | 0 bytes |
     public | pgbench_branches | table | postgres | 0 bytes |
     public | pgbench_history  | table | postgres | 0 bytes |
     public | pgbench_tellers  | table | postgres | 0 bytes |
    (4 rows)

7. core dump, about 2G,
(lldb) bt all
* thread #1, stop reason = signal SIGSTOP
  * frame #0: 0x7fff6c29c44e libdispatch.dylib`_dispatch_mgr_queue_push + 41
    frame #1: 0x7fff41475a74 Security`___ZN8Security12KeychainCore14StorageManager14tickleKeychainEPNS0_12KeychainImplE_block_invoke_2 + 76
    frame #2: 0x7fff6c29250e libdispatch.dylib`_dispatch_client_callout + 8
    frame #3: 0x7fff6c29e567 libdispatch.dylib`_dispatch_lane_barrier_sync_invoke_and_complete + 60
    frame #4: 0x7fff41475935 Security`Security::KeychainCore::StorageManager::tickleKeychain(Security::KeychainCore::KeychainImpl*) + 485
    frame #5: 0x7fff412400d8 Security`Security::KeychainCore::KCCursorImpl::next(Security::KeychainCore::Item&) + 352
    frame #6: 0x7fff41417975 Security`Security::KeychainCore::IdentityCursor::next(Security::SecPointer&) + 217
    frame #7: 0x7fff4143c4c3 Security`SecIdentitySearchCopyNext + 155
    frame #8: 0x7fff414477d8 Security`SecItemCopyMatching_osx(__CFDictionary const*, void const**) + 261
    frame #9: 0x7fff4144b024 Security`SecItemCopyMatching + 338
    frame #10: 0x7fff56dab303 Heimdal`keychain_query + 531
    frame #11: 0x7fff56da8f4c Heimdal`hx509_certs_find + 92
    frame #12: 0x7fff56d67b52 Heimdal`_krb5_pk_find_cert + 466
    frame #13: 0x7fff376da9bb GSS`_gsspku2u_acquire_cred + 619
    frame #14: 0x7fff376bfc1c GSS`gss_acquire_cred + 940
    frame #15: 0x00010016e6e1 libpq.5.dylib`pg_GSS_have_cred_cache(cred_out=0x000100505688) at fe-gssapi-common.c:67:10
    frame #16: 0x00010014f769 libpq.5.dylib`PQconnectPoll(conn=0x000100505310) at fe-connect.c:2785:22
    frame #17: 0x00010014be9f libpq.5.dylib`connectDBComplete(conn=0x000100505310) at fe-connect.c:2095:10
    frame #18: 0x00010014bb0c libpq.5.dylib`PQconnectdbParams(keywords=0x7ffeefbfeee0, values=0x7ffeefbfeea0, expand_dbname=1) at fe-connect.c:625:10
    frame #19: 0x0001ec20 pg_restore`ConnectDatabase(AHX=0x000100505070, dbname="postgres", pghost="david.highgo.ca", pgport=0x, username="david", prompt_password=TRI_DEFAULT) at pg_backup_db.c:287:20
    frame #20: 0x0001a75a pg_restore`CloneArchive(AH=0x0001002020f0) at pg_backup_archiver.c:4850:3
    frame #21: 0x000100017b4b pg_restore`RunWorker(AH=0x0001002020f0, slot=0x000100221718) at parallel.c:866:7
    frame #22: 0x0001000179f5 pg_restore`ParallelBackupStart(AH=0x0001002020f0) at parallel.c:1028:4
    frame #23: 0x00014473 pg_restore`RestoreArchive(AHX=0x0001002020f0) at pg_backup_archiver.c:662:12
    frame #24: 0x00011be4 pg_restore`main(argc=10, argv=0x7ffeefbff8f0) at pg_restore.c:447:3
    frame #25: 0x7fff6c2eb7fd libdyld.dylib`start + 1
(lldb)

8. however it works with either,
    PGGSSENCMODE=disable pg_restore -d postgres -h localhost -Fc /tmp/128m --jobs=2

or,
    pg_restore -d "dbname=postgres gssencmode=disable" -h localhost -Fc /tmp/128m --jobs=2


9. pg_config output and versions, no SSL configured,

$ pg_config
BINDIR = /Users/david/sandbox/pg122/app/bin
DOCDIR = /Users/david/sandbox/pg122/app/share/doc/postgresql
HTMLDIR = /Users/david/sandbox/pg122/app/share/doc/postgresql
INCLUDEDIR = /Users/david/sandbox/pg122/app/include
PKGINCLUDEDIR = /Users/david/sandbox/pg122/app/include/postgresql
INCLUDEDIR-SERVER = /Users/david/sandbox/pg122/app/include/postgresql/server
LIBDIR = /Users/david/sandbox/pg122/app/lib
PKGLIBDIR = /Users/david/sandbox/pg122/app/lib/postgresql
LOCALEDIR = /Users/david/sandbox/pg122/app/share/locale
MANDIR = 

Re: [Patch] pg_rewind: options to use restore_command from recovery.conf or command line

2020-03-05 Thread Alvaro Herrera
On 2020-Mar-06, Michael Paquier wrote:

> I was also thinking to split the patch into two pieces:
> - Introduction of common/archive.c and common/fe_archive.c (the former
> is used by xlogarchive.c and the latter only by pg_rewind).  The
> latter is dead code without the second patch, but this would validate
> the MSVC build with the buildfarm.
> - The pg_rewind pieces with the new option.

Hmm, doesn't the CF bot already validate the MSVC build?

Splitting in two seems all right, but I think one commit that introduces
dead code is not great.  It may make more sense to have one commit for
common/archive.c, and a second commit that does fe_archive plus the
pg_rewind changes ...  If that doesn't work for whatever reason, then
doing a single commit may be preferable.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Should we remove a fallback promotion? take 2

2020-03-05 Thread Michael Paquier
On Thu, Mar 05, 2020 at 09:40:54AM -0500, Robert Haas wrote:
> Seems reasonable, but it would be better if people proposed these
> kinds of changes closer to the beginning of the release cycle rather
> than in the crush at the end.

+1, to both points.
--
Michael


signature.asc
Description: PGP signature


Re: reindex concurrently and two toast indexes

2020-03-05 Thread Michael Paquier
On Thu, Mar 05, 2020 at 05:57:07PM +0100, Julien Rouhaud wrote:
> I agree that the approach wasn't quite robust.  I'll try to look at adding a
> new command for isolationtester, but that's probably not something we want to
> put in pg13?

Yes, that's too late.

> Note that while looking at it, I noticed another bug in RIC:
>
> [...]
>
> # reindex table concurrently t1;
> WARNING:  0A000: cannot reindex invalid index "public.t1_val_idx_ccold" concurrently, skipping
> LOCATION:  ReindexRelationConcurrently, indexcmds.c:2821
> WARNING:  XX002: cannot reindex invalid index "pg_toast.pg_toast_16395_index_ccold" concurrently, skipping
> LOCATION:  ReindexRelationConcurrently, indexcmds.c:2867
> REINDEX
> # reindex index concurrently t1_val_idx_ccold;
> REINDEX
> 
> That case is also fixed in this patch.

This choice is intentional.  The idea behind bypassing invalid indexes
for table-level REINDEX is that reindexing them would lead to bloat in
the number of relations to handle if multiple runs fail, leading
to more and more invalid indexes to handle each time.  Allowing a
single invalid non-toast index to be reindexed with CONCURRENTLY can
be helpful in some cases, for example a CIC for a unique index
that failed and left it invalid, where the relation already defined can
be reused.
--
Michael


signature.asc
Description: PGP signature


Re: Crash by targetted recovery

2020-03-05 Thread Kyotaro Horiguchi
At Thu, 5 Mar 2020 19:51:11 +0900, Fujii Masao  
wrote in 
> 
> 
> On 2020/03/05 12:08, Kyotaro Horiguchi wrote:
> > I understand that the reconnection for REDO record is useless. Ok I
> > take the !StandbyMode way.
> > The attached is the test script that is changed to count the added
> > test, and the slight revised main patch.
> 
> Thanks for the patch!
> 
> + /* Wal receiver should not active when entring XLOG_FROM_ARCHIVE */
> + Assert(!WalRcvStreaming());
> 
> +1 to add this assertion check.
> 
> Isn't it better to always check this while trying to read WAL from
> archive or pg_wal? So, what about the following change?
> 
> {
> 	case XLOG_FROM_ARCHIVE:
> 	case XLOG_FROM_PG_WAL:
> +		/*
> +		 * WAL receiver should not be running while trying to
> +		 * read WAL from archive or pg_wal.
> +		 */
> +		Assert(!WalRcvStreaming());
> +
> 		/* Close any old file we might have open. */
> 		if (readFile >= 0)

(It seems we are reverting to the first patch from when I started this...)
The second place covers wider cases, so I reverted the first place.

> + lastSourceFailed = false; /* We haven't failed on the new source */
> 
> Is this really necessary? Since ReadRecord() always reset
> lastSourceFailed to false, it seems not necessary.

It's just to make sure.  Actually lastSourceFailed is always false
when we get there.  But when the source is switched, lastSourceFailed
should be changed to false as a matter of design.  I'd like to do that
unless it does harm.

> - else if (currentSource == 0)
> + else if (currentSource == 0 ||
> 
> Though this is a *separate topic*, 0 should be XLOG_FROM_ANY?
> There are some places where 0 is used as the value of currentSource.
> IMO they should be updated so that XLOG_FROM_ANY is used instead of 0.

Yeah, I've thought about that many times but have neglected it, since it
is not critical and is trivial to do as a separate patch.  I'd take the
chance to do that now.  Another minor glitch is "int oldSource =
currentSource;": the int is not debugger-friendly, so I changed it to
XLogSource.  It is added as a new patch file before the main patch.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 839edcba7e47156e77d5dc9ec7dfb9babc9ba846 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Wed, 26 Feb 2020 20:41:11 +0900
Subject: [PATCH v4 1/3] TAP test for a crash bug

---
 src/test/recovery/t/003_recovery_targets.pl | 34 +++++++++++++++++++++++++++++++++-
 1 file changed, 33 insertions(+), 1 deletion(-)

diff --git a/src/test/recovery/t/003_recovery_targets.pl b/src/test/recovery/t/003_recovery_targets.pl
index fd14bab208..5b23ceb3c4 100644
--- a/src/test/recovery/t/003_recovery_targets.pl
+++ b/src/test/recovery/t/003_recovery_targets.pl
@@ -3,7 +3,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 9;
+use Test::More tests => 10;
 use Time::HiRes qw(usleep);
 
 # Create and test a standby from given backup, with a certain recovery target.
@@ -167,3 +167,35 @@ foreach my $i (0..100)
 $logfile = slurp_file($node_standby->logfile());
 ok($logfile =~ qr/FATAL:  recovery ended before configured recovery target was reached/,
 	'recovery end before target reached is a fatal error');
+
+# Corner case where targetted promotion happens on segment boundary
+$node_standby = get_new_node('standby_9');
+$node_standby->init_from_backup($node_master, 'my_backup',
+has_restoring => 1, has_streaming => 1);
+$node_standby->start;
+## read wal_segment_size
+my $result =
+  $node_standby->safe_psql('postgres', "SHOW wal_segment_size");
+die "unknown format of wal_segment_size: $result\n"
+  if ($result !~ /^([0-9]+)MB$/);
+my $segsize = $1 * 1024 * 1024;
+## stop just before the next segment boundary
+$result =
+  $node_standby->safe_psql('postgres', "SELECT pg_last_wal_replay_lsn()");
+my ($seg, $off) = split('/', $result);
+my $target = sprintf("$seg/%08X", (hex($off) / $segsize + 1) * $segsize);
+
+$node_standby->stop;
+$node_standby->append_conf('postgresql.conf', qq(
+recovery_target_inclusive=no
+recovery_target_lsn='$target'
+recovery_target_action='promote'
+));
+$node_standby->start;
+## do targetted promote
+$node_master->safe_psql('postgres', "CREATE TABLE t(); DROP TABLE t;");
+$node_master->safe_psql('postgres', "SELECT pg_switch_wal(); CHECKPOINT;");
+my $caughtup_query = "SELECT NOT pg_is_in_recovery()";
+$node_standby->poll_query_until('postgres', $caughtup_query)
+  or die "Timed out while waiting for standby to promote";
+pass("targetted promotion on segment bounary");
-- 
2.18.2

>From 82274c41cb529d87270d4e2a78dbf06cd139b8f6 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Fri, 6 Mar 2020 10:11:59 +0900
Subject: [PATCH v4 2/3] Tidy up XLogSource usage.

We used integer 0 instead of XLOG_FROM_ANY and 

Re: Do we need to handle orphaned prepared transactions in the server?

2020-03-05 Thread Bruce Momjian
On Mon, Mar  2, 2020 at 05:42:11PM +0500, Hamid Akhtar wrote:
> Here is the v2 of the same patch after rebasing it and running it through
> pgindent. There are no other code changes.

The paragraph about max_age_prepared_xacts doesn't define what the
effect of treating a transaction as orphaned is.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: [Patch] pg_rewind: options to use restore_command from recovery.conf or command line

2020-03-05 Thread Michael Paquier
On Thu, Mar 05, 2020 at 07:52:24PM +0300, Alexey Kondratov wrote:
> OK, I still had in mind pg_rewind as the only user of this
> routine. Now it is a part of common, and I could imagine a hypothetical
> tool that polls the archive, waiting for a specific WAL segment to
> become available. In this case 'command not found' is definitely the end of
> the game, while the absence of a segment is an expected error, so we can
> continue looping.

Depending on the needs, we could also add in the future an option in
this API to not exit if the command is missing.  Who knows if a case
can be made.
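
For illustration, the distinction can hinge on the shell's exit status:
127 means the command itself was not found, while any other nonzero
status can be treated as "segment not available".  A rough
frontend-style sketch (function name and messages invented here):

    #include <stdbool.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/wait.h>

    /*
     * Run a restore_command already expanded with %f/%p.  Returns true if
     * the segment was restored.  A missing segment is an expected,
     * retryable failure; a missing command is a configuration error.
     */
    static bool
    run_restore_command(const char *cmd)
    {
        int         rc = system(cmd);

        if (rc == -1)
        {
            fprintf(stderr, "could not execute restore_command\n");
            exit(1);
        }
        if (WIFEXITED(rc) && WEXITSTATUS(rc) == 127)
        {
            fprintf(stderr, "restore_command was not found by the shell\n");
            exit(1);            /* misconfiguration: end of the game */
        }
        return rc == 0;         /* false: segment probably absent */
    }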

By the way, as a matter of transparency, I have discussed this patch
with Alexander offlist and he has pinged me that I may be in a better
place to commit it per the amount of review I have done, so I'll try
to handle it.  I was also thinking to split the patch into two pieces:
- Introduction of common/archive.c and common/fe_archive.c (the former
is used by xlogarchive.c and the latter only by pg_rewind).  The
latter is dead code without the second patch, but this would validate
the MSVC build with the buildfarm.
- The pg_rewind pieces with the new option.
--
Michael


signature.asc
Description: PGP signature


Re: Psql patch to show access methods info

2020-03-05 Thread Alexander Korotkov
On Thu, Mar 5, 2020 at 8:34 PM vignesh C  wrote:
> On Wed, Mar 4, 2020 at 5:02 AM Alexander Korotkov  
> wrote:
> >
> > Hi!
> >
> > Thank you for the review.  Revised patch is attached.
> >
>
> Thanks for working on comments and providing a new patch.
> One small observation I noticed:
> postgres=# \dAc brin oid
>       Index access method operator classes
>   AM  | Input type | Storage type | Operator class | Default?
> ------+------------+--------------+----------------+----------
>  brin | oid        |              | oid_minmax_ops | yes
> (1 row)
>
> postgres=# \dAcx brin oid
>       Index access method operator classes
>   AM  | Input type | Storage type | Operator class | Default?
> ------+------------+--------------+----------------+----------
>  brin | oid        |              | oid_minmax_ops | yes
> (1 row)
>
> Output of \dAc and \dAcx seems to be the same. Is this expected?

It might seem strange, but the majority of psql commands allow arbitrary
suffixes and ignore them.  For instance:

postgres=# \dt
Did not find any relations.
postgres=# \dtt
Did not find any relations.

I think if we want to fix this, we should do it in a separate patch,
which would fix all the psql commands.

BTW, a new revision of the patch is attached.  It contains cosmetic
changes to the documentation, comments, etc.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


0001-Add-psql-AM-info-commands-v13.patch
Description: Binary data


[PATCH] Make pkg-config files cross-compile friendly

2020-03-05 Thread Sebastian Kemper
Currently the pc files use hard coded paths for "includedir" and
"libdir."

Example:

  Cflags: -I/usr/include
  Libs: -L/usr/lib -lpq

This is not very fortunate when cross compiling inside a buildroot,
where the includes and libs are inside a staging directory, because this
introduces host paths into the build:

  checking for pkg-config... /builder/shared-workdir/build/sdk/staging_dir/host/bin/pkg-config
  checking for PostgreSQL libraries via pkg_config... -L/usr/lib <

This commit addresses this by doing the following two things:

  1. Instead of hard coding the paths in "Cflags" and "Libs",
     "${includedir}" and "${libdir}" are used. Note: these variables can
     be overridden on the pkg-config command line
     ("--define-variable=libdir=/some/path").
  2. Add the variables "prefix" and "exec_prefix". If "includedir"
 and/or "libdir" are using these then construct them accordingly.
 This is done because buildroots (for instance OpenWrt) tend to
 rename the real pkg-config and call it indirectly from a script
 that sets "prefix", "exec_prefix" and "bindir", like so:

 pkg-config.real --define-variable=prefix=${STAGING_PREFIX} \
   --define-variable=exec_prefix=${STAGING_PREFIX} \
   --define-variable=bindir=${STAGING_PREFIX}/bin $@

Example #1: user calls ./configure with "--libdir=/some/lib" and
"--includedir=/some/include":

  prefix=/usr/local/pgsql
  exec_prefix=/usr/local/pgsql
  includedir=/some/include
  libdir=/some/lib

  Name: libpq
  Description: PostgreSQL libpq library
  Url: http://www.postgresql.org/
  Version: 12.1
  Requires:
  Requires.private:
  Cflags: -I${includedir}
  Libs: -L${libdir} -lpq
  Libs.private:  -lcrypt -lm

Example #2: user calls ./configure with no arguments:

  prefix=/usr/local/pgsql
  exec_prefix=/usr/local/pgsql
  includedir=${prefix}/include
  libdir=${exec_prefix}/lib

  Name: libpq
  Description: PostgreSQL libpq library
  Url: http://www.postgresql.org/
  Version: 12.1
  Requires:
  Requires.private:
  Cflags: -I${includedir}
  Libs: -L${libdir} -lpq
  Libs.private:  -lcrypt -lm

Like this, the paths can be forced into the staging directory when using
a buildroot setup:

  checking for pkg-config... /home/sk/tmp/openwrt/staging_dir/host/bin/pkg-config
  checking for PostgreSQL libraries via pkg_config... -L/home/sk/tmp/openwrt/staging_dir/target-mips_24kc_musl/usr/lib

Signed-off-by: Sebastian Kemper 
---
 src/Makefile.shlib | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/src/Makefile.shlib b/src/Makefile.shlib
index 29a7f6d38c..33c23dabdd 100644
--- a/src/Makefile.shlib
+++ b/src/Makefile.shlib
@@ -387,14 +387,27 @@ endif # PORTNAME == cygwin || PORTNAME == win32


 %.pc: $(MAKEFILE_LIST)
-	echo 'Name: lib$(NAME)' >$@
+	echo 'prefix=$(prefix)' >$@
+	echo 'exec_prefix=$(exec_prefix)' >>$@
+ifeq ($(patsubst $(prefix)/%,,$(includedir)),)
+	echo 'includedir=$${prefix}/$(patsubst $(prefix)/%,%,$(includedir))' >>$@
+else
+	echo 'includedir=$(includedir)' >>$@
+endif
+ifeq ($(patsubst $(exec_prefix)/%,,$(libdir)),)
+	echo 'libdir=$${exec_prefix}/$(patsubst $(exec_prefix)/%,%,$(libdir))' >>$@
+else
+	echo 'libdir=$(libdir)' >>$@
+endif
+	echo >>$@
+	echo 'Name: lib$(NAME)' >>$@
 	echo 'Description: PostgreSQL lib$(NAME) library' >>$@
 	echo 'Url: $(PACKAGE_URL)' >>$@
 	echo 'Version: $(VERSION)' >>$@
 	echo 'Requires: ' >>$@
 	echo 'Requires.private: $(PKG_CONFIG_REQUIRES_PRIVATE)' >>$@
-	echo 'Cflags: -I$(includedir)' >>$@
-	echo 'Libs: -L$(libdir) -l$(NAME)' >>$@
+	echo 'Cflags: -I$${includedir}' >>$@
+	echo 'Libs: -L$${libdir} -l$(NAME)' >>$@
 # Record -L flags that the user might have passed in to the PostgreSQL
 # build to locate third-party libraries (e.g., ldap, ssl).  Filter out
 # those that point inside the build or source tree.  Use sort to
--
2.24.1





Re: Additional improvements to extended statistics

2020-03-05 Thread Tomas Vondra

Hi,

Here is a rebased version of this patch series. I've polished the first
two parts a bit - estimation of OR clauses and (Var op Var) clauses, and
added a bunch of regression tests to exercise this code. It's not quite
there yet, but I think it's feasible to get this committed for PG13.

The last part (extended stats on expressions) is far from complete, and
it's not feasible to get it into PG13. There's too much missing stuff.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 
>From d7f639b6150fe9fd179066af2a536465d877842a Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Mon, 2 Dec 2019 23:02:17 +0100
Subject: [PATCH 1/3] Support using extended stats for parts of OR clauses

---
 src/backend/optimizer/path/clausesel.c| 109 +++---
 src/backend/statistics/extended_stats.c   |  45 +++-
 src/backend/statistics/mcv.c  |   5 +-
 .../statistics/extended_stats_internal.h  |   3 +-
 src/include/statistics/statistics.h   |   3 +-
 src/test/regress/expected/stats_ext.out   |   3 +-
 src/test/regress/sql/stats_ext.sql|   1 -
 7 files changed, 138 insertions(+), 31 deletions(-)

diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index a3ebe10592..8c1a404ce2 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -92,7 +92,7 @@ clauselist_selectivity(PlannerInfo *root,
 		 */
 		s1 *= statext_clauselist_selectivity(root, clauses, varRelid,
 											 jointype, sjinfo, rel,
-											 &estimatedclauses);
+											 &estimatedclauses, false);
 	}
 
 	/*
@@ -104,6 +104,89 @@ clauselist_selectivity(PlannerInfo *root,
 										   estimatedclauses);
 }
 
+/*
+ * clauselist_selectivity_or -
+ *	  Compute the selectivity of an implicitly-ORed list of boolean
+ *	  expression clauses.  The list can be empty, in which case 0.0
+ *	  must be returned.  List elements may be either RestrictInfos
+ *	  or bare expression clauses --- the former is preferred since
+ *	  it allows caching of results.
+ *
+ * See clause_selectivity() for the meaning of the additional parameters.
+ *
+ * The basic approach is to apply extended statistics first, on as many
+ * clauses as possible, in order to capture cross-column dependencies etc.
+ * The remaining clauses are then estimated using regular statistics tracked
+ * for individual columns.  This is done by simply passing the clauses to
+ * clauselist_selectivity and then combining the selectivities using the
+ * regular formula (s1+s2 - s1*s2).
+ */
+static Selectivity
+clauselist_selectivity_or(PlannerInfo *root,
+						  List *clauses,
+						  int varRelid,
+						  JoinType jointype,
+						  SpecialJoinInfo *sjinfo)
+{
+	ListCell   *lc;
+	Selectivity s1 = 0.0;
+	RelOptInfo *rel;
+	Bitmapset  *estimatedclauses = NULL;
+	int			listidx;
+
+	/*
+	 * Determine if these clauses reference a single relation.  If so, and if
+	 * it has extended statistics, try to apply those.
+	 */
+	rel = find_single_rel_for_clauses(root, clauses);
+	if (rel && rel->rtekind == RTE_RELATION && rel->statlist != NIL)
+	{
+		/*
+		 * Estimate as many clauses as possible using extended statistics.
+		 *
+		 * 'estimatedclauses' tracks the 0-based list position index of
+		 * clauses that we've estimated using extended statistics, and that
+		 * should be ignored.
+		 *
+		 * XXX We can't multiply with current value, because for OR clauses
+		 * we start with 0.0, so we simply assign to s1 directly.
+		 */
+		s1 = statext_clauselist_selectivity(root, clauses, varRelid,
+											jointype, sjinfo, rel,
+											&estimatedclauses, true);
+	}
+
+	/*
+	 * Selectivities of the remaining clauses for an OR clause are computed
+	 * as s1+s2 - s1*s2 to account for the probable overlap of selected tuple
+	 * sets. The clauses estimated using extended statistics are effectively
+	 * treated as a single clause.
+	 *
+	 * XXX is this too conservative?
+	 */

Re: our checks for read-only queries are not great

2020-03-05 Thread Bruce Momjian
On Mon, Jan 27, 2020 at 02:26:42PM -0500, Stephen Frost wrote:
> Greetings,
> 
> * Bruce Momjian (br...@momjian.us) wrote:
> > On Mon, Jan 13, 2020 at 01:56:30PM -0500, Stephen Frost wrote:
> > > > I think that having ALTER SYSTEM commands in pg_dumpall output
> > > > would be a problem.  It would cause all kinds of problems whenever
> > > > parameters change.  Thinking of the transition "checkpoint_segments"
> > > > -> "max_wal_size", you'd have to build some translation magic into 
> > > > pg_dump.
> > > > Besides, such a feature would make it harder to restore a dump taken
> > > > with version x into version x + n for n > 0.
> > > 
> > > pg_dump already specifically has understanding of how to deal with old
> > > options in other things when constructing a dump for a given version -
> > > and we already have issues that a dump taken with pg_dump X has a good
> > > chance of not being able to be restored into a PG X+1; that's why
> > > it's recommended to use the pg_dump for the version of PG you're
> > > intending to restore into, so I don't particularly agree with any of the
> > > arguments presented above.
> > 
> > One issue is that system table GUC settings (e.g., per-database,
> > per-user) cannot include postgresql.conf-only settings, like
> > max_wal_size, so system-table GUC settings are less likely to be
> > renamed than postgresql.conf.auto settings.  FYI, we are more inclined
> > to allow postgresql.conf-only changes than others because there is less
> > impact on applications.
> 
> I'm a bit unclear about what's being suggested here.  When you are
> talking about 'applications', are you referring specifically to pg_dump
> and pg_restore, or are you talking about regular user applications?

Sorry for the late reply.  I meant all applications.

> If you're referring to pg_dump/restore, then what I'm understanding from
> your comments is that if we made pg_dump/restore aware of ALTER SYSTEM
> and were made to support it that we would then be less inclined to
> change the names of postgresql.conf-only settings because, if we do so,
> we have to update pg_dump/restore.
> 
> I can see some argument in that direction but my initial reaction is
> that I don't feel like the bar would really be moved very far, and, if
> we came up with some mapping from one to the other for those, I actually
> think it'd be really helpful downstream for packagers and such who
> routinely are dealing with updating from an older postgresql.conf file
> to a newer one when an upgrade is done.

I should have given more examples.  Changing GUC variables like
search_path or work_mem, which can be set in per-database, per-user, and
per-session contexts, is more disruptive than changing GUCs that can
only be set in postgresql.conf, like max_wal_size.  My point is that
not all GUC changes have the same level of disruption.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-03-05 Thread Tom Lane
James Coleman  writes:
> That's what I figured, but as I mentioned I'm having trouble figuring out
> how the fact that an analyze is in flight is determined. I assume it's
> something that lives on the EState or similar, but I'm not seeing anything
> obvious.

AFAIR, it's just whether or not the current planstate node has an
instrumentation struct attached.
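
In other words, the usual guard looks roughly like this (node and field
names are assumed for an incremental sort node; tuplesort_get_stats() is
the existing tuplesort API):

    /* Collect sort statistics only when EXPLAIN ANALYZE attached
     * instrumentation to this plan node. */
    if (node->ss.ps.instrument != NULL)
    {
        TuplesortInstrumentation stats;

        tuplesort_get_stats(node->sort_state, &stats);
        /* ... accumulate for later display ... */
    }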

regards, tom lane




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-03-05 Thread James Coleman
On Thu, Mar 5, 2020 at 5:53 PM Tom Lane  wrote:

> James Coleman  writes:
> > I'm looking at this now, and realized that at least for parallel plans
> the
> > current patch tracks the tuplesort instrumentation whether or not an
> > EXPLAIN ANALYZE is in process.
>
> > Is this fairly standard for executor nodes? Or is it expected to
> condition
> > some of this tracking based on whether or not an ANALYZE is running?
>
> No, it's entirely not standard.  Maybe you could make an argument that
> it's too cheap to bother making it conditional, but without a convincing
> argument for that, it needs to be conditional.
>

That's what I figured, but as I mentioned I'm having trouble figuring out
how the fact that an analyze is in flight is determined. I assume it's
something that lives on the EState or similar, but I'm not seeing anything
obvious.

Thanks
James


Re: Proposal: PqSendBuffer removal

2020-03-05 Thread Aleksei Ivanov
> What do you mean "just one syscall"?  The entire point here is that
> it'd take more syscalls to send the same amount of data.

I mean that if messages are large enough (more than 2K) we will need 4
syscalls to send them without copying to the internal buffer, but
currently we copy 8K of messages and send them using 1 call. I think
that above some threshold of packet length it is redundant to copy to
the internal buffer, and the data can be sent directly.

> It does strike me that with the v3 protocol, we do sometimes have cases
> where internal_putbytes is reached with a fairly large "len".  If we've
> flushed out what's in PqSendBuffer to start with, and there's more than
> a bufferload remaining in the source data, we could send the source
> data directly to output without copying it to the buffer first.
> That could actually result in *fewer* kernel calls not more, if "len"
> is large enough.  But I strongly doubt that a code path that nets
> out to more kernel calls will win.

Yes, internal_putbytes can be updated to send data directly if the length
is more than the internal buffer size.

On Thu, Mar 5, 2020 at 15:04 Tom Lane  wrote:

> Aleksei Ivanov  writes:
> > Yes, you are right that there will be a separate call to send the data, but
> > is copying the data each time a more costly operation than just one syscall?
>
> What do you mean "just one syscall"?  The entire point here is that it'd
> take more syscalls to send the same amount of data.
>
> It does strike me that with the v3 protocol, we do sometimes have cases
> where internal_putbytes is reached with a fairly large "len".  If we've
> flushed out what's in PqSendBuffer to start with, and there's more than
> a bufferload remaining in the source data, we could send the source
> data directly to output without copying it to the buffer first.
> That could actually result in *fewer* kernel calls not more, if "len"
> is large enough.  But I strongly doubt that a code path that nets
> out to more kernel calls will win.
>
> regards, tom lane
>


Re: Retiring support for pre-7.3 FK constraint triggers

2020-03-05 Thread Daniel Gustafsson
> On 5 Mar 2020, at 21:52, Tom Lane  wrote:
> 
> Daniel Gustafsson  writes:
>> Having code which is untested and not exercised by developers (or users,
>> if my assumption holds), yet being reachable by SQL, runs the risk of
>> introducing subtle bugs.  Is there a use case for keeping it, or can/should
>> it be removed in 14?  That would still leave a lot of supported versions to
>> upgrade to in case there are users who need this.
> 
> Pushed.  Looking at the original commit, I noticed one now-obsolete
> comment that should also be removed, so I did that.

Thanks, I was looking around but totally missed that comment.

cheers ./daniel



Re: Allowing ALTER TYPE to change storage strategy

2020-03-05 Thread Tom Lane
Tomas Vondra  writes:
> Yeah, I agree #1 seems like the cleanest/best option. Are you worried
> about the overhead due to the extra complexity, or overhead due to
> cache getting invalidated for this particular reason?

The overhead is basically a hash_seq_search traversal over the typcache
each time we get a pg_type inval event, which there could be a lot of.
On the other hand we have a lot of inval overhead already, so this might
not amount to anything noticeable.

regards, tom lane




Re: Proposal: PqSendBuffer removal

2020-03-05 Thread Tom Lane
Aleksei Ivanov  writes:
> Yes, you are right there will be a separate call to send the data, but is
> copying data each time more costly operation than just one syscall?

What do you mean "just one syscall"?  The entire point here is that it'd
take more syscalls to send the same amount of data.

It does strike me that with the v3 protocol, we do sometimes have cases
where internal_putbytes is reached with a fairly large "len".  If we've
flushed out what's in PqSendBuffer to start with, and there's more than
a bufferload remaining in the source data, we could send the source
data directly to output without copying it to the buffer first.
That could actually result in *fewer* kernel calls not more, if "len"
is large enough.  But I strongly doubt that a code path that nets
out to more kernel calls will win.

regards, tom lane




Re: pgbench: option delaying queries till connections establishment?

2020-03-05 Thread Fabien COELHO


Hello Andres,


Slight aside: Have you ever looked at moving pgbench to non-blocking
connection establishment? It seems weird to use non-blocking everywhere
but connection establishment.


Nope. If there is some interest, why not. The reason for not doing it is 
that the typical use case is just to connect once at the beginning so that 
connections do not matter anyway. Now with -C it makes sense.
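
For reference, a minimal sketch of libpq's documented non-blocking connect
sequence (PQconnectStart/PQconnectPoll) that pgbench could adopt; error
handling is abbreviated:

#include <sys/select.h>
#include "libpq-fe.h"

static PGconn *
connect_nonblocking(const char *conninfo)
{
    PGconn     *conn = PQconnectStart(conninfo);

    if (conn == NULL || PQstatus(conn) == CONNECTION_BAD)
        return NULL;

    for (;;)
    {
        PostgresPollingStatusType st = PQconnectPoll(conn);
        int         sock = PQsocket(conn);
        fd_set      fds;

        if (st == PGRES_POLLING_OK)
            return conn;
        if (st == PGRES_POLLING_FAILED)
        {
            PQfinish(conn);
            return NULL;
        }

        /* wait for the socket in the direction PQconnectPoll asked for */
        FD_ZERO(&fds);
        FD_SET(sock, &fds);
        select(sock + 1,
               st == PGRES_POLLING_READING ? &fds : NULL,
               st == PGRES_POLLING_WRITING ? &fds : NULL,
               NULL, NULL);
    }
}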



I've changed the performance calculations depending on -C or not. Ramp-up
effects are smoothed.



I've merged all time-related stuff (time_t, instr_time, int64) to use a
unique type (pg_time_usec_t) and set of functions/macros, which simplifies
the code somehow.


Hm.  I'm not convinced it's a good idea for pgbench to do its own thing
here.


Having 3 time types (in fact 4, since double is used as well for some
calculations) in just one file to deal with time does not help much in
understanding the code, and there are quite a few lines to translate from
one type to the other.



+static int pthread_barrier_init(pthread_barrier_t *barrier, void *unused, int nthreads);


How about using 'struct unknown_type *unused' instead of "unused"?
Because the void *unused will accept everything...


Never encountered this pattern. It does not seem to be used anywhere in pg 
sources. I'd be afraid that some compilers would complain. I can try 
anyway.



+/* Thread synchronization barriers */
+static pthread_barrier_t
+   start_barrier,  /* all threads are started */
+   bench_barrier;  /* benchmarking ready to start */
+


We don't really need two barriers here. The way that pthread barriers 
are defined is that they 'reset' after all the threads have arrived. You 
can argue we still want that, but ...


Yes, one barrier could be reused.
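
For illustration, the standard reusable-barrier pattern: a pthread barrier
resets once all participants have arrived, so a single barrier can gate
both the "all started" and "start benchmarking" points:

#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4

static pthread_barrier_t barrier;

static void *
worker(void *arg)
{
    long        id = (long) arg;

    pthread_barrier_wait(&barrier);     /* all threads are started */
    printf("thread %ld: connected\n", id);

    pthread_barrier_wait(&barrier);     /* same barrier, reset, gates the bench */
    printf("thread %ld: benchmarking\n", id);
    return NULL;
}

int
main(void)
{
    pthread_t   th[NTHREADS];

    pthread_barrier_init(&barrier, NULL, NTHREADS);
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&th[i], NULL, worker, (void *) i);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(th[i], NULL);
    pthread_barrier_destroy(&barrier);
    return 0;
}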


 /* print out results */
 static void
-printResults(StatsData *total, instr_time total_time,
-instr_time conn_total_time, int64 latency_late)
+printResults(StatsData *total,


Given that we're changing the output (for the better) of pgbench again,
I wonder if we should add the pgbench version to the benchmark
output.


Dunno. Maybe.


Otherwise it seems easy to end up e.g. seeing a performance
difference between pg12 and pg14, where all that's actually happening is
a different output because each run used the respective pgbench version.


Yep.


+pg_time_usec_t total_delay,/* benchmarking time */
+pg_time_usec_t conn_total_delay,   /* is_connect */
+pg_time_usec_t conn_elapsed_delay, /* !is_connect */
+int64 latency_late)


I'm not a fan of naming these 'delay'. To me that doesn't sound like
it's about the time the total benchmark has taken.


Hmmm… I'd like to differentiate variable names which contain timestamp 
versus those which contain intervals, given that it is the same underlying 
type. That said, I'm not very happy with "delay" either.


What would you suggest?


+pg_time_get_usec(void)


For me the function name sounds like you're getting the usec out of a
pg_time. Not that it's getting a new timestamp.


Ok, I'll think of something else, possibly "pg_now"? "pg_time_now"?


+#define PG_TIME_SET_CURRENT_LAZY(t)\
+   if ((t) == 0)   \
+   (t) = pg_time_get_usec()
+
+#define PG_TIME_GET_DOUBLE(t) (0.000001 * (t))
 #endif /* INSTR_TIME_H */


I'd make it an inline function instead of this.


I did it that way because it was already done with defines on instr_time, 
but I'm fine with inline.
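
For example, a minimal inline-function version of those macros, assuming
pg_time_usec_t is an int64 microsecond count as discussed above (the names
are illustrative):

#include <stdint.h>
#include <sys/time.h>

typedef int64_t pg_time_usec_t;

static inline pg_time_usec_t
pg_time_now(void)
{
    struct timeval tv;

    gettimeofday(&tv, NULL);
    return (pg_time_usec_t) tv.tv_sec * 1000000 + tv.tv_usec;
}

static inline void
pg_time_set_current_lazy(pg_time_usec_t *t)
{
    if (*t == 0)
        *t = pg_time_now();
}

static inline double
pg_time_get_double(pg_time_usec_t t)
{
    return 0.000001 * t;
}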


I'll try to look at it over the week-end.

--
Fabien.

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-03-05 Thread Tom Lane
James Coleman  writes:
> I'm looking at this now, and realized that at least for parallel plans the
> current patch tracks the tuplesort instrumentation whether or not an
> EXPLAIN ANALYZE is in process.

> Is this fairly standard for executor nodes? Or is it expected to condition
> some of this tracking based on whether or not an ANALYZE is running?

No, it's entirely not standard.  Maybe you could make an argument that
it's too cheap to bother making it conditional, but without a convincing
argument for that, it needs to be conditional.
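
For reference, the conventional guard is the node's instrument pointer,
which is only allocated when instrumentation was requested; a minimal
sketch, assuming the usual PlanState field names:

/*
 * ps->instrument is only set up when the EState's es_instrument options
 * were requested, i.e. under EXPLAIN ANALYZE (or an extension asking for
 * instrumentation), so a NULL test is the usual way to condition this.
 */
static inline bool
analyze_is_running(PlanState *ps)
{
    return ps->instrument != NULL;
}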

regards, tom lane




Re: Allowing ALTER TYPE to change storage strategy

2020-03-05 Thread Tom Lane
I wrote:
> If not, we probably should bite the bullet and go for #1, since
> I have little doubt that we'll need that someday anyway.
> The trick will be to keep down the cache invalidation overhead...

Here's a version that does it like that.  I'm less worried about the
overhead than I was before, because I realized that we already had
a syscache callback for pg_type there.  And it was being pretty
stupid about which entries it reset, too, so this version might
actually net out as less overhead (in some workloads anyway).
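
For reference, a rough sketch of the callback shape involved, following the
existing typcache.c reset pattern; the hash field and flag-mask names here
are illustrative, not necessarily the committed ones:

static void
TypeCacheTypCallback(Datum arg, int cacheid, uint32 hashvalue)
{
    HASH_SEQ_STATUS status;
    TypeCacheEntry *typentry;

    /* on a pg_type inval, clear only the bits derived from pg_type */
    hash_seq_init(&status, TypeCacheHash);
    while ((typentry = (TypeCacheEntry *) hash_seq_search(&status)) != NULL)
    {
        /* hashvalue == 0 means a cache reset: clear every entry */
        if (hashvalue == 0 || typentry->type_id_hash == hashvalue)
            typentry->flags &= ~TCFLAGS_HAVE_PG_TYPE_DATA;
    }
}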

For ease of review I just added the new TCFLAGS value out of
sequence, but I'd be inclined to renumber the bits back into
sequence before committing.

regards, tom lane

diff --git a/doc/src/sgml/ref/alter_type.sgml b/doc/src/sgml/ref/alter_type.sgml
index 67be1dd..c20ed99 100644
--- a/doc/src/sgml/ref/alter_type.sgml
+++ b/doc/src/sgml/ref/alter_type.sgml
@@ -30,6 +30,7 @@ ALTER TYPE name RENAME TO name SET SCHEMA new_schema
 ALTER TYPE name ADD VALUE [ IF NOT EXISTS ] new_enum_value [ { BEFORE | AFTER } neighbor_enum_value ]
 ALTER TYPE name RENAME VALUE existing_enum_value TO new_enum_value
+ALTER TYPE name SET ( property = value [, ... ] )
 
 where action is one of:
 
@@ -70,7 +71,7 @@ ALTER TYPE name RENAME VALUE 
 

-SET DATA TYPE
+ALTER ATTRIBUTE ... SET DATA TYPE
 
  
   This form changes the type of an attribute of a composite type.
@@ -135,6 +136,84 @@ ALTER TYPE name RENAME VALUE 
 

+
+   
+
+ SET ( property = value [, ... ] )
+
+
+ 
+  This form is only applicable to base types.  It allows adjustment of a
+  subset of the base-type properties that can be set in CREATE
+  TYPE.  Specifically, these properties can be changed:
+  
+   
+
+ RECEIVE can be set to the name of a binary input
+ function, or NONE to remove the type's binary
+ input function.  Using this option requires superuser privilege.
+
+   
+   
+
+ SEND can be set to the name of a binary output
+ function, or NONE to remove the type's binary
+ output function.  Using this option requires superuser privilege.
+
+   
+   
+
+ TYPMOD_IN can be set to the name of a type
+ modifier input function, or NONE to remove the
+ type's type modifier input function.  Using this option requires
+ superuser privilege.
+
+   
+   
+
+ TYPMOD_OUT can be set to the name of a type
+ modifier output function, or NONE to remove the
+ type's type modifier output function.  Using this option requires
+ superuser privilege.
+
+   
+   
+
+ ANALYZE can be set to the name of a type-specific
+ statistics collection function, or NONE to remove
+ the type's statistics collection function.  Using this option
+ requires superuser privilege.
+
+   
+   
+
+ STORAGE
+  TOAST
+  per-type storage settings
+ 
+ can be set to plain,
+ extended, external,
+ or main (see  for
+ more information about what these mean).  However, changing
+ from plain to another setting requires superuser
+ privilege (because it requires that the type's C functions all be
+ TOAST-ready), and changing to plain from another
+ setting is not allowed at all (since the type may already have
+ TOASTed values present in the database).  Note that changing this
+ option doesn't by itself change any stored data, it just sets the
+ default TOAST strategy to be used for table columns created in the
+ future.  See  to change the TOAST
+ strategy for existing table columns.
+
+   
+  
+  See  for more details about these
+  type properties.  Note that where appropriate, a change in these
+  properties for a base type will be propagated automatically to domains
+  based on that type.
+ 
+
+   
   
   
 
@@ -156,7 +235,7 @@ ALTER TYPE name RENAME VALUE USAGE privilege on the data type.
+   have USAGE privilege on the attribute's data type.
   
  
 
@@ -353,7 +432,20 @@ ALTER TYPE colors ADD VALUE 'orange' AFTER 'red';
To rename an enum value:
 
 ALTER TYPE colors RENAME VALUE 'purple' TO 'mauve';
-
+
+  
+
+  
+   To create binary I/O functions for an existing base type:
+
+CREATE FUNCTION mytypesend(mytype) RETURNS bytea ...;
+CREATE FUNCTION mytyperecv(internal, oid, integer) RETURNS mytype ...;
+ALTER TYPE mytype SET (
+send = mytypesend,
+receive = mytyperecv
+);
+
+  
  
 
  
diff --git a/src/backend/catalog/pg_type.c b/src/backend/catalog/pg_type.c
index 56e0bcf..cd56714 100644
--- a/src/backend/catalog/pg_type.c
+++ b/src/backend/catalog/pg_type.c
@@ -155,8 +155,8 @@ TypeShellMake(const char *typeName, Oid typeNamespace, Oid 

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-03-05 Thread James Coleman
On Tue, Jan 21, 2020 at 9:37 AM James Coleman  wrote:

> That being said, the patch also needs some more work on improving
> EXPLAIN ANALYZE output (perhaps min/max/mean or median of
> memory usage number of groups in each sort mode), and I think it's far
> more feasible that I can tackle that piecemeal before the next CF.
>

I'm looking at this now, and realized that at least for parallel plans the
current patch tracks the tuplesort instrumentation whether or not an
EXPLAIN ANALYZE is in process.

Is this fairly standard for executor nodes? Or is it expected to condition
some of this tracking based on whether or not an ANALYZE is running?

I've found EXEC_FLAG_EXPLAIN_ONLY but no parallel for ANALYZE. Similarly, the
InstrumentOption bit flags on the executor state seem to indicate whether
specific ANALYZE options should be enabled, but I haven't yet seen anything
conditioned solely on whether an ANALYZE is in flight. Could someone point
me in the right direction if this is expected?

Thanks,
James


Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-05 Thread David Rowley
On Fri, 6 Mar 2020 at 03:27, Laurenz Albe  wrote:
>
> On Thu, 2020-03-05 at 19:40 +1300, David Rowley wrote:
> > 1. I'd go for 2 new GUCs and reloptions.
> > autovacuum_vacuum_insert_threshold (you're currently calling this
> > autovacuum_vacuum_insert_limit. I don't see why the word "limit" is
> > relevant here). The other GUC I think should be named
> > autovacuum_vacuum_insert_scale_factor and these should work exactly
> > the same way as autovacuum_vacuum_threshold and
> > autovacuum_vacuum_scale_factor, but be applied in a similar way to the
> > vacuum settings, but only be applied after we've checked to ensure the
> > table is not otherwise eligible to be vacuumed.
>
> I disagree about the scale_factor (and have not added it to the
> updated version of the patch).  If we have a scale_factor, then the
> time between successive autovacuum runs would increase as the table
> gets bigger, which defeats the purpose of reducing the impact of each
> autovacuum run.

My view here is not really to debate what logically makes the most
sense.  I don't really think for a minute that the current
auto-vacuums scale_factor and thresholds are perfect for the job. It's
true that the larger a table becomes, the less often it'll be
vacuumed, but these are control knobs that people have become
accustomed to and I don't really think that making an exception for
this is warranted.  Perhaps we can zero out the scale factor by
default and set the threshold into the millions of tuples. We can have
people chime in on what they think about that and why once the code is
written and even perhaps committed.

Lack of a scale_factor does leave people who regularly truncate their
"append-only" tables out in the cold a bit.  Perhaps they'd like
index-only scans to kick in soon after they truncate without having to
wait for 10 million tuples, or so.
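
For concreteness, a sketch of the trigger rule under discussion, modeled on
the existing relation_needs_vacanalyze() computation; the insert-specific
names follow the proposal, not committed code:

#include <stdbool.h>

static bool
insert_vacuum_needed(double reltuples, double inserts_since_vacuum,
                     double ins_base_thresh, double ins_scale_factor)
{
    /* mirrors vacthresh = base_thresh + scale_factor * reltuples */
    double      vacinsthresh = ins_base_thresh + ins_scale_factor * reltuples;

    /* consulted only after the normal dead-tuple test did not fire */
    return inserts_since_vacuum > vacinsthresh;
}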

> > 10. I'm slightly worried about the case where we don't quite trigger a
> > normal vacuum but trigger a vacuum due to INSERTs then skip cleaning
> > up the indexes but proceed to leave dead index entries causing indexes
> > to become bloated.  It does not seem impossible that given the right
> > balance of INSERTs and UPDATE/DELETEs that this could happen every
> > time and the indexes would just become larger and larger.
>
> I understand.
>
> This might particularly be a problem with larger tables, where
> a normal autovacuum is rare because of the scale_factor.
>
> Perhaps we can take care of the problem by *not* skipping index
> cleanup if "changes_since_analyze" is substantially greater than 0.
>
> What do you think?

Well, there is code that skips the index scans when there are 0 dead
tuples found in the heap. If the table is truly INSERT-only then it
won't do any harm since we'll skip the index scan anyway.  I think
it's less risky to clean the indexes. If we skip that then there will
be a group of people who will suffer from index bloat due to this, no
matter if they realise it or not.
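
For reference, the skip described above is roughly this shape (a simplified
paraphrase of the vacuumlazy.c flow; the helper names here are invented for
illustration):

static void
vacuum_one_index(int idx)       /* invented stand-in for the per-index pass */
{
    /* bulk-delete dead entries from index idx ... */
}

static void
vacuum_indexes_if_needed(long num_dead_tuples, int nindexes)
{
    /* no dead tuples collected by the heap pass: no index scans at all */
    if (num_dead_tuples == 0)
        return;

    for (int i = 0; i < nindexes; i++)
        vacuum_one_index(i);
}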

> > 11. We probably do also need to debate if we want this on or off by
> > default.   I'd have leaned towards enabling by default if I'd not
> > personally witnessed the fact that people rarely* increase auto-vacuum
> > to run faster than the standard cost settings. I've seen hundreds of
> > servers over the years with all workers busy for days on something
> > they'll never finish quickly enough.  We increased those settings 10x
> > in PG12, so there will be fewer people around suffering from that now,
> > but even after having reduced the vacuum_cost_delay x10 over the PG11
> > settings, it's by no means fast enough for everyone.  I've mixed
> > feelings about giving auto-vacuum more work to do for those people, so
> > perhaps the best option is to keep this off by default so as not to
> > affect the people who don't tune auto-vacuum.  They'll just suffer the
> > pain all at once when they hit max freeze age instead of more
> > gradually with the additional load on the workers.   At least adding
> > this feature gives the people who do tune auto-vacuum some ability to
> > handle read-only tables in some sane way.
> >
> > An alternative way of doing it would be to set the threshold to some
> > number of million tuples and set the scale_factor to 0.2 so that it
> > only has an effect on larger tables, of which generally people only
> > have a smallish number of.
>
> Yes, I think that disabling this by default defeats the purpose.

Perhaps the solution to that is somewhere else then.  I can picture
some sort of load average counters for auto-vacuum and spamming the
logs with WARNINGs if we maintain high enough load for long enough,
but we'd likely be better just completely overhauling the vacuum cost
settings to be a percentage of total effort rather than some fixed
speed.  That would allow more powerful servers to run vacuum more
quickly and it would also run more quickly during low load periods.
We'd just need to sample now and again how long vacuuming a series of
pages takes, then sleep for a time 

Re: [HACKERS] make async slave to wait for lsn to be replayed

2020-03-05 Thread Kartyshov Ivan

On 2018-03-06 14:50, Simon Riggs wrote:
On 6 March 2018 at 11:24, Dmitry Ivanov  
wrote:

In PG11, I propose the following command, sticking mostly to Ants'
syntax, and allowing to wait for multiple events before it returns. 
It

doesn't hold snapshot and will not get cancelled by Hot Standby.

WAIT FOR event [, event ...] options

event is
LSN value
TIMESTAMP value

options
TIMEOUT delay
UNTIL TIMESTAMP timestamp
(we have both, so people don't need to do math, they can use 
whichever

they have)



I have a (possibly) dumb question: if we have specified several 
events,

should WAIT finish if only one of them triggered? It's not immediately
obvious if we're waiting for ALL of them to trigger, or just one will
suffice (ANY). IMO the syntax could be extended to something like:

WAIT FOR [ANY | ALL] event [, event ...] options,

with ANY being the default variant.


+1


Here I made a new patch for the feature discussed above.

WAIT FOR - a wait statement that pauses the statements beneath it
==

Synopsis
==
WAIT FOR [ANY | SOME | ALL] event [, event ...] options
and event is:
LSN value
TIMESTAMP value

and options is:
TIMEOUT delay
UNTIL TIMESTAMP timestamp

Description
==
WAIT FOR - makes the statements beneath it sleep until an event happens
(new queries are not processed until the event happens).

How to use it
==
WAIT FOR LSN ‘LSN’ [, timeout in ms];

#Wait until LSN 0/303EC60 is replayed, or 10 seconds have passed.
WAIT FOR ANY LSN ‘0/303EC60’, TIMEOUT 1;

#Or same without timeout.
WAIT FOR LSN ‘0/303EC60’;

#Or wait for some timestamp.
WAIT FOR TIMESTAMP '2020-01-02 17:20:19.028161+03';

#Wait until ALL events occur: the LSN is replayed and the timestamp has
passed.
WAIT FOR ALL LSN ‘0/303EC60’, TIMESTAMP '2020-01-28 11:10:39.021341+03';

Notice: WAIT FOR will release on PostmasterDeath or Interruption events
if they come earlier than the LSN or timeout.

Testing the implementation
==
The implementation was tested with src/test/recovery/t/018_waitfor.pl

--
Ivan Kartyshov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index 8d91f3529e6..1a2bc7dfa93 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -187,6 +187,7 @@ Complete list of usable sgml source files in this directory.
 
 
 
+
 
 
 
diff --git a/doc/src/sgml/ref/waitfor.sgml b/doc/src/sgml/ref/waitfor.sgml
new file mode 100644
index 000..8fa2ddb4492
--- /dev/null
+++ b/doc/src/sgml/ref/waitfor.sgml
@@ -0,0 +1,136 @@
+
+
+
+ 
+  WAIT FOR
+ 
+
+ 
+  WAIT FOR
+  7
+  SQL - Language Statements
+ 
+
+ 
+  WAIT FOR
+  wait for the target LSN to be replayed
+ 
+
+ 
+
+WAIT FOR LSN 'lsn_number' 
+WAIT FOR LSN 'lsn_number' TIMEOUT wait_timeout
+WAIT FOR LSN 'lsn_number' UNTIL TIMESTAMP wait_time
+WAIT FOR TIMESTAMP wait_time 
+
+ 
+
+ 
+  Description
+
+  
+   WAIT FOR provides a simple
+   interprocess communication mechanism to wait for a timestamp or the
+   target log sequence number (LSN) on a standby in PostgreSQL
+   databases with master-standby asynchronous replication. When run with the
+   LSN option, the WAIT FOR command
+   waits for the specified LSN to be replayed. By default, the wait
+   time is unlimited. Waiting can be interrupted using Ctrl+C, or
+   by shutting down the postgres server. You can also limit the wait
+   time using the TIMEOUT option.
+  
+
+ 
+
+ 
+  Parameters
+
+  
+   
+LSN
+
+ 
+  Specify the target log sequence number to wait for.
+ 
+
+   
+
+   
+TIMEOUT wait_timeout
+
+ 
+  Limit the time interval to wait for the LSN to be replayed.
+  The specified wait_timeout must be an integer
+  and is measured in milliseconds.
+ 
+
+   
+
+   
+UNTIL TIMESTAMP wait_time
+
+ 
+  Limit the time to wait for the LSN to be replayed.
+  The specified wait_time must be a timestamp.
+ 
+
+   
+
+  
+ 
+
+ 
+  Examples
+
+  
+   Run WAIT FOR from psql,
   limiting wait time to 1 millisecond:
+
+
+WAIT FOR LSN '0/3F07A6B1' TIMEOUT 1;
+NOTICE:  LSN is not reached. Try to increase wait time.
+LSN reached
+-
+ f
+(1 row)
+
+  
+
+  
+   Wait until the specified LSN is replayed:
+
+WAIT FOR LSN '0/3F07A611';
+LSN reached
+-
+ t
+(1 row)
+
+  
+
+  
+   Limit LSN wait time to 50 milliseconds, and then cancel the command:
+
+WAIT FOR LSN '0/3F0FF791' TIMEOUT 50;
+^CCancel request sent
+NOTICE:  LSN is not reached. Try to increase wait time.
+ERROR:  canceling statement due to user request
+ LSN reached
+-
+ f
+(1 row)
+
+
+ 
+
+ 
+  Compatibility
+
+  
+   There is no WAIT FOR statement in the SQL
+   standard.
+  
+ 
+
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index cef09dd38b3..bfe62ee3516 100644
--- a/doc/src/sgml/reference.sgml
+++ 

Re: Proposal: PqSendBuffer removal

2020-03-05 Thread Aleksei Ivanov
Thank you for your reply!

Yes, you are right there will be a separate call to send the data, but is
copying data each time more costly operation than just one syscall?

Besides, if we already have a ready message packet to be sent why should we
wait?

Waiting for your reply,
Best regards!



On Thu, Mar 5, 2020 at 13:10 Tom Lane  wrote:

> Aleksei Ivanov  writes:
> > I am really curious what was the original intention of using the
> > PqSendBuffer and is it possible to remove it now.
>
> > Currently all messages are copied from StringInfo to this buffer and
> sent,
> > which from my point of view is redundant operation.
>
> That would mean doing a separate send() kernel call for every few bytes,
> no?  I think the point of that buffer is to be sure we accumulate a
> reasonable number of bytes to pass to the kernel for each send().
>
> regards, tom lane
>


Re: Allowing ALTER TYPE to change storage strategy

2020-03-05 Thread Tomas Vondra

On Thu, Mar 05, 2020 at 02:52:44PM -0500, Tom Lane wrote:

Tomas Vondra  writes:

FWIW I'm not suggesting you go and implement #1 or #2 for me, that'd be
up to me I guess. But I disagree there's no use case for it, and #3
makes this feature useless for me.


OK, then we need to do something else.  Do you have ideas for other
alternatives?



I don't have any other ideas, unfortunately. And I think if I had one,
it'd probably be some sort of ugly hack anyway :-/


If not, we probably should bite the bullet and go for #1, since
I have little doubt that we'll need that someday anyway.
The trick will be to keep down the cache invalidation overhead...



Yeah, I agree #1 seems like the cleanest/best option. Are you worried
about the overhead due to the extra complexity, or overhead due to
cache getting invalidated for this particular reason?


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: Proposal: PqSendBuffer removal

2020-03-05 Thread Tom Lane
Aleksei Ivanov  writes:
> I am really curious what was the original intention of using the
> PqSendBuffer and is it possible to remove it now.

> Currently all messages are copied from StringInfo to this buffer and sent,
> which from my point of view is redundant operation.

That would mean doing a separate send() kernel call for every few bytes,
no?  I think the point of that buffer is to be sure we accumulate a
reasonable number of bytes to pass to the kernel for each send().

regards, tom lane




Proposal: PqSendBuffer removal

2020-03-05 Thread Aleksei Ivanov
Dear community,

I am really curious what was the original intention of using the
PqSendBuffer and is it possible to remove it now.

Currently all messages are copied from StringInfo to this buffer and sent,
which from my point of view is redundant operation.
It is possible to directly send messages from StringInfo to client. For
example: allocate more bytes from the beginning and fill out it before sent
to client.

Maybe there was already discussion about it or if I missing something
please fill free to correct me.

Thank you in advance!


Re: Atomics in localbuf.c

2020-03-05 Thread Andres Freund
Hi

On March 5, 2020 12:42:06 PM PST, Antonin Houska  wrote:
>Andres Freund  wrote:
>
>> On March 5, 2020 9:21:55 AM PST, Antonin Houska 
>wrote:
>> >What's the reason to use pg_atomic...read_...() and
>> >pg_atomic...write_...()
>> >functions in localbuf.c?
>> >
>> >It looks like there was an intention not to use them
>> >
>>
>>https://www.postgresql.org/message-id/CAPpHfdtfr3Aj7xJonXaKR8iY2p8uXOQ%2Be4BMpMDAM_5R4OcaDA%40mail.gmail.com
>> >
>> >but the following discussion does not explain the decision to use
>them.
>> 
>> Read/write don't trigger locked/atomic operations. They just
>guarantee that
>> you're not gonna read/write a torn value. Or a cached one. Since
>> local/shared buffers share the buffer header definition, we still
>have to
>> use proper functions to access the atomic variables.
>
>Sure, the atomic operations are necessary for shared buffers, but I still
>don't understand why they are needed for *local* buffers - local buffers
>have their headers (BufferDesc) in process-local memory, so there should
>be no concerns about concurrent access.
>
>Another thing that makes me confused is this comment in
>InitLocalBuffers():
>
>   /*
>* Intentionally do not initialize the buffer's atomic variable
>* (besides zeroing the underlying memory above). That way we get
>* errors on platforms without atomics, if somebody (re-)introduces
>* atomic operations for local buffers.
>*/
>
>That sounds like there was an intention not to use the atomic functions
>in
>localbuf.c, but eventually they ended up there. Do I still miss
>something?

Again, the read/write functions do not imply atomic instructions.
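
For illustration, the generic fallback really is just a plain load, as a
paraphrase of src/include/port/atomics.h shows:

static inline uint32
pg_atomic_read_u32_impl(volatile pg_atomic_uint32 *ptr)
{
    /* an ordinary load: no lock prefix, no barrier; merely non-torn */
    return ptr->value;
}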

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.




Re: Retiring support for pre-7.3 FK constraint triggers

2020-03-05 Thread Tom Lane
Daniel Gustafsson  writes:
> When poking around here I realized that defGetStringList was also left unused.
> It was added with the logical decoding code but the single callsite has since
> been removed.  As it's published in a header we might not want to remove it,
> but I figured I'd bring it up as were talking about removing code.

Hm.  Kind of inclined to leave it, since somebody will probably need it
again someday.

regards, tom lane




kerberos regression test enhancement

2020-03-05 Thread David Zhang

Hi Hackers,

I found one interesting behavior when "--with-gssapi" is enabled,

given a very "common" configuration in pg_hba.conf like below,

    host    postgres    david   192.168.0.114/32    trust

the query message is always encrypted when using a very "common" way to 
connect to the PG server,


    $ psql -h pgserver -d postgres -U david

unless I specify "gssencmode=disable" with the -d option,

    $ psql -h pgserver -U david  -d "dbname=postgres gssencmode=disable"

Based on the above behaviors, I did a further exercise on the kerberos 
regression tests and found the test coverage is not enough. It should be 
enhanced to cover the above behavior when the user specifies a "host" 
entry with "trust" access in pg_hba.conf.


The attachment is a patch covering the behaviors mentioned above in the 
kerberos regression tests.


Any thoughts?


Thanks,

--
David

Software Engineer
Highgo Software Inc. (Canada)
www.highgo.ca
diff --git a/src/test/kerberos/t/001_auth.pl b/src/test/kerberos/t/001_auth.pl
index b3aeea9574..7c2e65ce76 100644
--- a/src/test/kerberos/t/001_auth.pl
+++ b/src/test/kerberos/t/001_auth.pl
@@ -19,7 +19,7 @@ use Test::More;
 
 if ($ENV{with_gssapi} eq 'yes')
 {
-   plan tests => 18;
+   plan tests => 20;
 }
 else
 {
@@ -333,3 +333,25 @@ test_access(
0,
'',
'succeeds with include_realm=0 and defaults');
+
+truncate($node->data_dir . '/pg_ident.conf', 0);
+unlink($node->data_dir . '/pg_hba.conf');
+$node->append_conf('pg_hba.conf',
+   qq{host all all $hostaddr/32 trust});
+$node->restart;
+
+test_access(
+   $node,
+   'test1',
+   'SELECT not gss_authenticated AND encrypted from pg_stat_gssapi where pid = pg_backend_pid();',
+   0,
+   '',
+   'succeeds with GSS-encrypted with default gssencmode and host trust hba');
+
+test_access(
+   $node,
+   "test1",
+   'SELECT not gss_authenticated and not encrypted from pg_stat_gssapi where pid = pg_backend_pid();',
+   0,
+   "gssencmode=disable",
+   "succeeds with GSS encryption disabled with access disabled and host 
trust hba");


Re: Retiring support for pre-7.3 FK constraint triggers

2020-03-05 Thread Tom Lane
Daniel Gustafsson  writes:
> Having code which is untested and not excercised by developers (or users, if 
> my
> assumption holds), yet being reachable by SQL, runs the risk of introducing
> subtle bugs.  Is there a usecase for keeping it, or can/should it be removed 
> in
> 14?  That would still leave a lot of supported versions to upgrade to in case
> there are users to need this.

Pushed.  Looking at the original commit, I noticed one now-obsolete
comment that should also be removed, so I did that.

regards, tom lane




Re: Retiring support for pre-7.3 FK constraint triggers

2020-03-05 Thread Tom Lane
Alvaro Herrera  writes:
> On 2020-Mar-05, Tom Lane wrote:
>> As long as we're thinking of zapping code that is long past its sell-by
>> date, I propose getting rid of this stanza in indexcmds.c, which
>> basically causes CREATE INDEX to ignore certain opclass specifications:

> I agree, this should be fine to remove.

Done.

>> which dates to 8.2 (2a8d3d83e of 2005-11-07).  This is less bad than the
>> other thing, since it won't affect the behavior of any command that
>> wouldn't otherwise just fail; but maybe its time has passed as well?
>> Although Alvaro's point comparing these behaviors to pg_dump's support
>> cutoff of 8.0 suggests that maybe we should leave this one for now.

> Yeah, dunno, 'rtree' is even immortalized in tests; commit f2e403803fe6
> as recently as March 2019 was seen modifying that.

Hah, I didn't realize we actually had code coverage for that!

> I guess we can wait a couple years more on that one, since it's not
> damaging anything anyway.

Agreed, I left it be.

regards, tom lane




Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

2020-03-05 Thread Robert Haas
On Thu, Mar 5, 2020 at 2:18 PM Mahendra Singh Thalor  wrote:
> Here, attaching new patch set for review.

I was kind of assuming that the way this would work is that it would
set a flag or increment a counter or something when we acquire a
relation extension lock, and then reverse the process when we release
it. Then the Assert could just check the flag. Walking the whole
LOCALLOCK table is expensive.
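
For illustration, a minimal sketch of that counter approach (all names here
are invented, not from the patch):

static int  RelExtLocksHeld = 0;

/* bump in the grant path when locktag_type == LOCKTAG_RELATION_EXTEND */
static inline void
RelExtLockAcquired(void)
{
    RelExtLocksHeld++;
}

/* and decrement symmetrically in the release path */
static inline void
RelExtLockReleased(void)
{
    Assert(RelExtLocksHeld > 0);
    RelExtLocksHeld--;
}

/* the O(1) assertion, instead of walking LockMethodLocalHash */
#define AssertNoRelExtLockHeld() Assert(RelExtLocksHeld == 0)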

Also, spelling counts. This patch features "extention" multiple times,
plus also "hask," "beloging," "belog," and "whle", which is an awful
lot of typos for a 70-line patch. If you are using macOS, try opening
the patch in TextEdit. If you are inventing a new function name, spell
the words you include the same way they are spelled elsewhere.

Even aside from the typo, AssertAnyExtentionLockHeadByMe() is not a
very good function name. It sounds like it's asserting that we hold an
extension lock, rather than that we don't, and also, that's not
exactly what it checks anyway, because there's this special case for
when we're acquiring a relation extension lock we already hold.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Atomics in localbuf.c

2020-03-05 Thread Antonin Houska
Andres Freund  wrote:

> On March 5, 2020 9:21:55 AM PST, Antonin Houska  wrote:
> >What's the reason to use pg_atomic...read_...() and
> >pg_atomic...write_...()
> >functions in localbuf.c?
> >
> >It looks like there was an intention not to use them
> >
> >https://www.postgresql.org/message-id/CAPpHfdtfr3Aj7xJonXaKR8iY2p8uXOQ%2Be4BMpMDAM_5R4OcaDA%40mail.gmail.com
> >
> >but the following discussion does not explain the decision to use them.
> 
> Read/write don't trigger locked/atomic operations. They just guarantee that
> you're not gonna read/write a torn value. Or a cached one. Since
> local/shared buffers share the buffer header definition, we still have to
> use proper functions to access the atomic variables.

Sure, the atomic operations are necessary for shared buffers, but I still
don't understand why they are needed for *local* buffers - local buffers have
their headers (BufferDesc) in process-local memory, so there should be no
concerns about concurrent access.

Another thing that makes me confused is this comment in InitLocalBuffers():

/*
 * Intentionally do not initialize the buffer's atomic variable
 * (besides zeroing the underlying memory above). That way we get
 * errors on platforms without atomics, if somebody (re-)introduces
 * atomic operations for local buffers.
 */

That sounds like there was an intention not to use the atomic functions in
localbuf.c, but eventually they ended up there. Do I still miss something?

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com




Re: Retiring support for pre-7.3 FK constraint triggers

2020-03-05 Thread Daniel Gustafsson
> On 5 Mar 2020, at 19:36, Alvaro Herrera  wrote:
> 
> On 2020-Mar-05, Tom Lane wrote:
> 
>> As long as we're thinking of zapping code that is long past its sell-by
>> date, I propose getting rid of this stanza in indexcmds.c, which
>> basically causes CREATE INDEX to ignore certain opclass specifications:
> 
> I agree, this should be fine to remove.

The attached patchset removes this stanza as well.

When poking around here I realized that defGetStringList was also left unused.
It was added with the logical decoding code but the single callsite has since
been removed.  As it's published in a header we might not want to remove it,
but I figured I'd bring it up as were talking about removing code.

cheers ./daniel



0002-Remove-handling-of-removed-_ops-classes.patch
Description: Binary data


0001-Remove-support-for-pre-7.3-constraint-trigger-conver.patch
Description: Binary data


Re: Planning counters in pg_stat_statements (using pgss_store)

2020-03-05 Thread legrand legrand
Never mind ...

Please consider PG13 shortest path ;o)

My one is  parse->queryId != UINT64CONST(0) in pgss_planner_hook().
It fixes the IVM problem (verified),
and keeps CTAS equal to pgss without planning counters (verified too).

Regards
PAscal






--
Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html




Re: ALTER tbl rewrite loses CLUSTER ON index

2020-03-05 Thread Ibrar Ahmed
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:not tested

I tested the patch on the master branch 
(a77315fdf2a197a925e670be2d8b376c4ac02efc) and it works fine.

The new status of this patch is: Ready for Committer


Fwd: WAL usage calculation patch

2020-03-05 Thread Kirill Bychik
> I'm quite worried about the stability of those counters for regression tests.
> Wouldn't a checkpoint happening during the test change them?

Agree, stability of the test could be an issue; even a shift in the write
format or compression method, or adding compatible changes, could break
such a test. Frankly speaking, the expected numbers are not actually
calculated; my logic was rather well described by "these numbers
should be non-zero for real tables". I believe the test can be
modified to check that the numbers are above zero, both for bytes written
and for records stored.

Having a checkpoint in the middle of the test can be almost 100%
countered by triggering one before the test. I'll add a checkpoint
call to the test scenario, if no objections here.

> While at it, did you consider adding a full-page image counter in the 
> WalUsage?
> That's something I'd really like to have and it doesn't seem hard to 
> integrate.

Well, not sure I understand you 100%, being new to Postgres dev. Do
you want a separate counter for pages written whenever doPageWrites is
true? I can do that, if needed. Please confirm.
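
For concreteness, a sketch of the counter struct as I read the suggestion;
the wal_fpi field is the proposed addition and the names are illustrative:

typedef struct WalUsage
{
    long        wal_records;    /* # of WAL records produced */
    long        wal_fpi;        /* # of full-page images produced (proposed) */
    uint64      wal_bytes;      /* size of WAL records produced */
} WalUsage;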

> Another point is that this patch won't help to see autovacuum activity.
> As an example, I did a quick te.
> ...LONG QUOTE...
> but that may seem strange to only account for (auto)vacuum activity, rather
> than globally, grouping per RmgrId or CommandTag for instance.  We could then
> see the complete WAL usage per-database.  What do you think?

I wanted to keep the patch small and simple, and fitted to practical
needs. This patch is supposed to provide tuning assistance, catching
an IO-heavy query in a commit-bound situation.
Total WAL usage per DB can be assessed rather easily using other means.
Let's get this change into the codebase and then work on connecting
WAL usage to (auto)vacuum stats.

>
> Some minor points I noticed:
>
> - the extension patch doesn't apply anymore, I guess since 70a7732007bc4689

Will fix, thank you.

>
>  #define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE009)
> +#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE010)
>
> Shouldn't it be 0xA rather than 0x10?

Oww, my bad, this is embarrassing! Will fix, thank you.

> - it would be better to add a version number to the patches, so we're sure
>   which one we're talking about.

Noted, thank you.

Please comment on the proposed changes, I will cook up a new version
once all are agreed upon.




Re: Allowing ALTER TYPE to change storage strategy

2020-03-05 Thread Tom Lane
Tomas Vondra  writes:
> FWIW I'm not suggesting you go and implement #1 or #2 for me, that'd be
> up to me I guess. But I disagree there's no use case for it, and #3
> makes this feature useless for me.

OK, then we need to do something else.  Do you have ideas for other
alternatives?

If not, we probably should bite the bullet and go for #1, since
I have little doubt that we'll need that someday anyway.
The trick will be to keep down the cache invalidation overhead...

regards, tom lane




Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

2020-03-05 Thread Mahendra Singh Thalor
On Wed, 4 Mar 2020 at 12:03, Dilip Kumar  wrote:
>
> On Wed, Mar 4, 2020 at 11:45 AM Mahendra Singh Thalor
>  wrote:
> >
> > On Mon, 24 Feb 2020 at 15:39, Amit Kapila  wrote:
> > >
> > > On Thu, Feb 20, 2020 at 8:06 AM Andres Freund  wrote:
> > > >
> > > > Hi,
> > > >
> > > > On 2020-02-19 11:12:18 +0530, Amit Kapila wrote:
> > > > > I think till we know the real need for changing group locking, going
> > > > > in the direction of what Tom suggested to use an array of LWLocks [1]
> > > > > to address the problems in hand is a good idea.
> > > >
> > > > -many
> > > >
> > > > I think that building yet another locking subsystem is the entirely
> > > > wrong idea - especially when there's imo no convincing architectural
> > > > reasons to do so.
> > > >
> > >
> > > Hmm, AFAIU, it will be done by having an array of LWLocks which we do
> > > at other places as well (like BufferIO locks).  I am not sure if we
> > > can call it as new locking subsystem, but if we decide to continue
> > > using lock.c and change group locking then I think we can do that as
> > > well, see my comments below regarding that.
> > >
> > > >
> > > > > It is not very clear to me that are we thinking to give up on Tom's
> > > > > idea [1] and change group locking even though it is not clear or at
> > > > > least nobody has proposed an idea/patch which requires that?  Or are
> > > > > we thinking that we can do what Tom suggested for relation extension
> > > > > lock and also plan to change group locking for future parallel
> > > > > operations that might require it?
> > > >
> > > > What I'm advocating is that extension locks should continue to go
> > > > through lock.c. And yes, that requires some changes to group locking,
> > > > but I still don't see why they'd be complicated.
> > > >
> > >
> > > Fair position, as per initial analysis, I think if we do below three
> > > things, it should work out without changing to a new way of locking
> > > for relation extension or page type locks.
> > > a. As per the discussion above, ensure in code we will never try to
> > > acquire another heavy-weight lock after acquiring relation extension
> > > or page type locks (probably by having Asserts in code or maybe some
> > > other way).
> > > b. Change lock.c so that group locking is not considered for these two
> > > lock types. For ex. in LockCheckConflicts, along with the check (if
> > > (proclock->groupLeader == MyProc && MyProc->lockGroupLeader == NULL)),
> > > we also check lock->tag and call it a conflict for these two locks.
> > > c. The deadlock detector can ignore checking these two types of locks
> > > because point (a) ensures that those won't lead to deadlock.  One idea
> > > could be that FindLockCycleRecurseMember just ignores these two types
> > > of locks by checking the lock tag.
> >
> > Thanks Amit for summary.
> >
> > Based on above 3 points, here attaching 2 patches for review.
> >
> > 1. v01_0001-Conflict-EXTENTION-lock-in-group-member.patch (Patch by Dilip 
> > Kumar)
> > Basically this patch is for point b and c.
> >
> > 2. v01_0002-Added-assert-to-verify-that-we-never-try-to-take-any.patch
> > (Patch by me)
> > This patch is for point a.
> >
> > After applying both the patches, make check-world is passing.
> >
> > We are testing both the patches and will post results.
> >
> > Thoughts?
>
> +static void AssertAnyExtentionLockHeadByMe(void);
>
> +/*
> + * AssertAnyExtentionLockHeadByMe -- test whether any EXTENSION lock held by
> + * this backend.  If any EXTENSION lock is hold by this backend, then assert
> + * will fail.  To use this function, assert should be enabled.
> + */
> +void AssertAnyExtentionLockHeadByMe()
> +{
>
> Some minor observations on 0002.
> 1. static is missing in a function definition.
> 2. Function name should start in new line after function return type
> in function definition, as per pg guideline.
> +void AssertAnyExtentionLockHeadByMe()
> ->
> void
> AssertAnyExtentionLockHeadByMe()

Thanks Dilip for review.

I have fixed above 2 points in v2 patch set.

-- 
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

2020-03-05 Thread Mahendra Singh Thalor
On Thu, 5 Mar 2020 at 13:54, Dilip Kumar  wrote:
>
> On Thu, Mar 5, 2020 at 12:15 PM Amit Kapila  wrote:
> >
> > On Wed, Mar 4, 2020 at 11:45 AM Mahendra Singh Thalor
> >  wrote:
> > >
> > > On Mon, 24 Feb 2020 at 15:39, Amit Kapila  wrote:
> > > >
> > > > On Thu, Feb 20, 2020 at 8:06 AM Andres Freund  
> > > > wrote:
> > > > >
> > > > > What I'm advocating is that extension locks should continue to go
> > > > > through lock.c. And yes, that requires some changes to group locking,
> > > > > but I still don't see why they'd be complicated.
> > > > >
> > > >
> > > > Fair position, as per initial analysis, I think if we do below three
> > > > things, it should work out without changing to a new way of locking
> > > > for relation extension or page type locks.
> > > > a. As per the discussion above, ensure in code we will never try to
> > > > acquire another heavy-weight lock after acquiring relation extension
> > > > or page type locks (probably by having Asserts in code or maybe some
> > > > other way).
> > > > b. Change lock.c so that group locking is not considered for these two
> > > > lock types. For ex. in LockCheckConflicts, along with the check (if
> > > > (proclock->groupLeader == MyProc && MyProc->lockGroupLeader == NULL)),
> > > > we also check lock->tag and call it a conflict for these two locks.
> > > > c. The deadlock detector can ignore checking these two types of locks
> > > > because point (a) ensures that those won't lead to deadlock.  One idea
> > > > could be that FindLockCycleRecurseMember just ignores these two types
> > > > of locks by checking the lock tag.
> > >
> > > Thanks Amit for summary.
> > >
> > > Based on above 3 points, here attaching 2 patches for review.
> > >
> > > 1. v01_0001-Conflict-EXTENTION-lock-in-group-member.patch (Patch by Dilip 
> > > Kumar)
> > > Basically this patch is for point b and c.
> > >
> > > 2. v01_0002-Added-assert-to-verify-that-we-never-try-to-take-any.patch
> > > (Patch by me)
> > > This patch is for point a.
> > >
> > > After applying both the patches, make check-world is passing.
> > >
> > > We are testing both the patches and will post results.
> > >
> >

Thanks Amit and Dilip for reviewing the patches.

> > I think we need to do detailed code review in the places where we are
> > taking Relation Extension Lock and see whether we are acquiring
> > another heavy-weight lock after that. It seems to me that in
> > brin_getinsertbuffer, after acquiring Relation Extension Lock, we
> > might again try to acquire the same lock.  See
> > brin_initialize_empty_new_buffer which is called after acquiring
> > Relation Extension Lock, in that function, we call
> > RecordPageWithFreeSpace and that can again try to acquire the same
> > lock if it needs to perform fsm_extend.  I think there will be similar
> > instances in the code.  I think it is fine if we again try to acquire
> > it, but the current assertion in your patch needs to be adjusted for
> > that.

I agree with you.  Dilip is doing a code review and he will post
results.  As you mentioned, while holding a Relation Extension Lock
we might again try to acquire the same Relation Extension Lock, so to
handle this in the assert I made some changes in the patch and am
attaching it for review. (I will test this scenario.)

> >
> > Few other minor comments on
> > v01_0002-Added-assert-to-verify-that-we-never-try-to-take-any:
> > 1. Ideally, this should be the first patch as we first need to ensure
> > that we don't take any heavy-weight locks after acquiring a relation
> > extension lock.

Fixed.

> > 2. I think it is better to add an Assert after initial error checks
> > (after RecoveryInProgress().. check)

I am not getting your point. Can you explain which type of
assert you are suggesting?

> > 3.
> > + Assert (locallock->tag.lock.locktag_type != LOCKTAG_RELATION_EXTEND ||
> > + locallock->nLocks == 0);
> >
> > I think it is not possible that we have an entry in
> > LockMethodLocalHash and its value is zero.  Do you see any such
> > possibility, if not, then we might want to remove it?

Yes, this condition is not needed. Fixed.

> >
> > 4. We already have a macro for LOCALLOCK_LOCKMETHOD, can we write
> > another one tag type?  This will make the check look a bit cleaner and
> > probably if we need to extend it in future for Page type locks, then
> > also it will be good.

Good point. I added macros in this version.

Here, attaching new patch set for review.

Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com


v02_0001-Added-assert-to-verify-that-we-never-try-to-take-any.patch
Description: Binary data


v02_0002-Conflict-EXTENTION-lock-in-group-member.patch
Description: Binary data


Re: Allowing ALTER TYPE to change storage strategy

2020-03-05 Thread Tomas Vondra

On Wed, Mar 04, 2020 at 06:56:42PM -0500, Tom Lane wrote:

I wrote:

3. Drop the ability for ALTER TYPE to promote from PLAIN to not-PLAIN
typstorage, and adjust the typcache so that it only remembers boolean
toastability not the specific toasting strategy.  Then the cache is
still immutable so no need for update logic.

I'm kind of liking #3, ugly as it sounds, because I'm not sure how
much of a use-case there is for the upgrade-from-PLAIN case.
Particularly now that TOAST is so ingrained in the system, it seems
rather unlikely that a production-grade data type wouldn't have
been designed to be toastable from the beginning, if there could be
any advantage to that.  Either #1 or #2 seem like mighty high prices
to pay for offering an option that might have no real-world uses.


Here's a v5 based on that approach.  I also added some comments about
the potential race conditions involved in recursing to domains.



Well, I don't know what to say, really. This very thread started with me
explaining how I've repeatedly needed a way to upgrade from PLAIN, so I
don't quite agree with your claim that there's no use case for that.

Granted, the cases may be my own fault - sometimes I have not expected the
type to need TOAST initially, only to realize later that I was wrong.
In other cases I simply failed to realize PLAIN is the default value
even for varlena types (yes, it's a silly mistake).

FWIW I'm not suggesting you go and implement #1 or #2 for me, that'd be
up to me I guess. But I disagree there's no use case for it, and #3
makes this feature useless for me.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: more ALTER .. DEPENDS ON EXTENSION fixes

2020-03-05 Thread Ibrar Ahmed
On Thu, Mar 5, 2020 at 11:38 PM Alvaro Herrera 
wrote:

> On 2020-Mar-05, Ibrar Ahmed wrote:
>
> > Is it intentional that there is no error when removing a non-existing
> > dependency?
>
> Hmm, I think we can do nothing silently if nothing is called for.
> So, yes, that seems to be the way it should work.
>
> --
> Álvaro Herrerahttps://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>

I think we need a tab-completion patch too for this.

-- 
Ibrar Ahmed


004_tab_complete.patch
Description: Binary data


Re: more ALTER .. DEPENDS ON EXTENSION fixes

2020-03-05 Thread Alvaro Herrera
On 2020-Mar-05, Ibrar Ahmed wrote:

> Is it intentional that there is no error when removing a non-existing
> dependency?

Hmm, I think we can do nothing silently if nothing is called for.
So, yes, that seems to be the way it should work.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Retiring support for pre-7.3 FK constraint triggers

2020-03-05 Thread Alvaro Herrera
On 2020-Mar-05, Tom Lane wrote:

> As long as we're thinking of zapping code that is long past its sell-by
> date, I propose getting rid of this stanza in indexcmds.c, which
> basically causes CREATE INDEX to ignore certain opclass specifications:

I agree, this should be fine to remove.

> Elsewhere in indexcmds.c, there's this:
> 
> /*
>  * Hack to provide more-or-less-transparent updating of old RTREE
>  * indexes to GiST: if RTREE is requested and not found, use GIST.
>  */
> if (strcmp(accessMethodName, "rtree") == 0)
> {
> ereport(NOTICE,
> (errmsg("substituting access method \"gist\" for obsolete 
> method \"rtree\"")));
> accessMethodName = "gist";
> tuple = SearchSysCache1(AMNAME, 
> PointerGetDatum(accessMethodName));
> }
> 
> which dates to 8.2 (2a8d3d83e of 2005-11-07).  This is less bad than the
> other thing, since it won't affect the behavior of any command that
> wouldn't otherwise just fail; but maybe its time has passed as well?
> Although Alvaro's point comparing these behaviors to pg_dump's support
> cutoff of 8.0 suggests that maybe we should leave this one for now.

Yeah, dunno, 'rtree' is even immortalized in tests; commit f2e403803fe6
as recently as March 2019 was seen modifying that.

(Another curious factoid is that SQLite supports something that vaguely
looks rtreeish https://sqlite.org/rtree.html -- However, because it
doesn't use the same syntax Postgres uses, it's not a point against
removing our hack.)

I guess we can wait a couple years more on that one, since it's not
damaging anything anyway.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: more ALTER .. DEPENDS ON EXTENSION fixes

2020-03-05 Thread Ibrar Ahmed
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:not tested

It works for me

Re: more ALTER .. DEPENDS ON EXTENSION fixes

2020-03-05 Thread Ibrar Ahmed
On Mon, Mar 2, 2020 at 12:45 PM Ahsan Hadi  wrote:

>
>
> On Sat, Feb 29, 2020 at 2:38 AM Alvaro Herrera 
> wrote:
>
>> On 2020-Feb-28, ahsan hadi wrote:
>>
>>
>> > Tested the pg_dump patch for dumping "ALTER .. DEPENDS ON EXTENSION" in
>> case of indexes, functions, triggers etc. The "ALTER .. DEPENDS ON
>> EXTENSION" is included in the dump. However in some case not sure why
>> "ALTER INDEX.DEPENDS ON EXTENSION" is repeated several times in the
>> dump?
>>
>> Hi, thanks for testing.
>>
>> Are the repeated commands for the same index, same extension?
>
>
> Yes same index and same extension...
>

You cannot do that after applying all the patches.


>
>
>>   Did you
>> apply the same command multiple times before running pg_dump?
>>
>
> Yes but in some cases I applied the command once and it appeared multiple
> times in the dump..
>

Not for me, it works for me.


>
>
>>
>> There was an off-list complaint that if you repeat the ALTER .. DEPENDS
>> for the same object on the same extension, then the same dependency is
>> registered multiple times.  (You can search pg_depend for "deptype = 'x'"
>> to see that).  I suppose that would lead to the line being output
>> multiple times by pg_dump, also.  Is that what you did?
>>
>
> I checked out pg_depend for "deptype='x'" the same dependency is
> registered multiple times...
>
>>
>> If so: Patch 0002 is supposed to fix that problem, by raising an error
>> if the dependency is already registered ... though it occurs to me now
>> that it would be more in line with custom to make the command a silent
>> no-op.  In fact, doing that would cause old dumps (generated with
>> databases containing duplicated entries) to correctly restore a single
>> entry, without error.  Therefore my inclination now is to change 0002
>> that way and push and backpatch it ahead of 0001.
>>
>
> Makes sense, will also try our Patch 0002.
>
>>
>> I realize just now that I have failed to verify what happens with
>> partitioned indexes.
>>
>
> Yes I also missed this one..
>

It works for partitioned indexes.


Is it intentional that there is no error when removing a non-existing
dependency?


Re: Atomics in localbuf.c

2020-03-05 Thread Andres Freund
Hi

On March 5, 2020 9:21:55 AM PST, Antonin Houska  wrote:
>What's the reason to use pg_atomic...read_...() and
>pg_atomic...write_...()
>functions in localbuf.c?
>
>It looks like there was an intention not to use them
>
>https://www.postgresql.org/message-id/CAPpHfdtfr3Aj7xJonXaKR8iY2p8uXOQ%2Be4BMpMDAM_5R4OcaDA%40mail.gmail.com
>
>but the following discussion does not explain the decision to use them.

Read/write don't trigger locked/atomic operations. They just guarantee that 
you're not gonna read/write a torn value. Or a cached one. Since local/shared 
buffers share the buffer header definition, we still have to use proper 
functions to access the atomic variables.

Regards,

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.




Re: proposal: schema variables

2020-03-05 Thread Pavel Stehule
> CREATE VARIABLE
> postgres=# drop function foofunc();
> DROP FUNCTION
> postgres=# select test.v1;
> ERROR:  cache lookup failed for function 16437
>
>
Thank you for this analysis and the patches. I merged them into the attached patch.



>
> 3- The variable DEFAULT expression is apparently being evaluated at the time
> of first access, whereas I think that it should be at the time of variable
> creation. Consider the following example:
>
> postgres=# create variable test.v2 as timestamp default now();
> CREATE VARIABLE
> postgres=# select now();
>   now
> ---
>  2020-03-05 12:13:29.775373+00
> (1 row)
> postgres=# select test.v2;
>  v2
> 
>  2020-03-05 12:13:37.192317 -- I was expecting this to be earlier than
> the above timestamp.
> (1 row)
>
> postgres=# select test.v2;
>  v2
> 
>  2020-03-05 12:13:37.192317
> (1 row)
> postgres=# let test.v2 = default;
> LET
> postgres=# select test.v2;
>  v2
> 
>  2020-03-05 12:14:07.538615
> (1 row)
>
>
This is expected and wanted - plpgsql variables behave the same way.

CREATE OR REPLACE FUNCTION public.foo()
 RETURNS void
 LANGUAGE plpgsql
AS $function$
declare x timestamp default current_timestamp;
begin
  raise notice '%', x;
end;
$function$

postgres=# select foo();
NOTICE:  2020-03-05 18:49:12.465054
┌─┐
│ foo │
╞═╡
│ │
└─┘
(1 row)

postgres=# select foo();
NOTICE:  2020-03-05 18:49:13.255197
┌─┐
│ foo │
╞═╡
│ │
└─┘
(1 row)

You can use

CREATE VARIABLE cuser AS text DEFAULT session_user;

It makes no sense to use a value from creation time.

And an analogy with CREATE TABLE:

CREATE TABLE fooo(a timestamp DEFAULT current_timestamp) -- there is no
creation-time timestamp


I fixed the buggy behavior of IMMUTABLE variables.

Regards

Pavel



>
> To continue my testing of the patch I made few fixes for the
> above-mentioned
> comments. The patch for those changes is attached if it could be of any
> use.
>
> --
> Asif Rehman
> Highgo Software (Canada/China/Pakistan)
> URL : www.highgo.ca
>
>


schema-variables-20200305.patch.gz
Description: application/gzip


Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2020-03-05 Thread Tomas Vondra

On Wed, Mar 04, 2020 at 09:13:49AM +0530, Dilip Kumar wrote:

On Wed, Mar 4, 2020 at 3:16 AM Tomas Vondra
 wrote:


Hi,

I started looking at this patch series again, hoping to get it moving
for PG13.


Nice.

There's been a tremendous amount of work done since I last worked on it,
and a lot was discussed on this thread, so it'll take a while to get
familiar with the new code ...

The first thing I realized is that WAL-logging of assignments in v12 does
both the "old" logging (using a dedicated message) and the "new" one with
the toplevel-XID embedded in the first message. Yes, the patch was wrong,
because it eliminated all calls to ProcArrayApplyXidAssignment() and so
it was trivial to crash the replica due to KnownAssignedXids overflow.
But I don't think re-introducing the XLOG_XACT_ASSIGNMENT message is the
right fix.

I actually proposed doing this (having both ways to log assignments) so
that there's no regression risk with (wal_level < logical). But IIRC
Andres objected to it, arguing that we should not log the same piece
of information in two very different ways at the same time (IIRC it was
discussed at the FOSDEM dev meeting, so I don't have a link to share).
And I do agree with him ...

The question is, why couldn't the replica use the same assignment info
we already write for logical decoding? The main challenge is that now
the assignment can be sent in many different xlog messages, from a bunch
of resource managers (essentially, any xlog message with a xid can have
embedded XID of the toplevel xact). So the handling would either need to
happen in every rmgr, or we need to move it before we call the rmgr.

For example, we might do this in StartupXLOG() I think, per the
attached patch (FWIW this particular fix was written by Masahiko Sawada,
not me). This does the trick for me - I'm no longer able to reproduce
the KnownAssignedXids overflow.

The one difference is that we used to call ProcArrayApplyXidAssignment
for larger groups of XIDs, as sent in the assignment message. Now we
call it for each individual assignment. I don't know if this is an
issue, but I suppose we might introduce some sort of local caching
(accumulate the assignments into a local array, call the function only
when we have enough of them).
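
A minimal sketch of that caching idea (all names except
ProcArrayApplyXidAssignment() are hypothetical, for illustration only):

    #define MAX_CACHED_ASSIGNMENTS 64   /* hypothetical batch size */

    static TransactionId cached_topxid = InvalidTransactionId;
    static int           ncached_subxids = 0;
    static TransactionId cached_subxids[MAX_CACHED_ASSIGNMENTS];

    static void
    CacheXidAssignment(TransactionId topxid, TransactionId subxid)
    {
        /* flush when the toplevel XID changes or the array fills up */
        if (topxid != cached_topxid ||
            ncached_subxids >= MAX_CACHED_ASSIGNMENTS)
        {
            if (ncached_subxids > 0)
                ProcArrayApplyXidAssignment(cached_topxid,
                                            ncached_subxids,
                                            cached_subxids);
            cached_topxid = topxid;
            ncached_subxids = 0;
        }
        cached_subxids[ncached_subxids++] = subxid;
    }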


Thanks for the pointers,  I will think over these points.



Aside from that, I think there's a minor bug in xact.c - the patch adds
a "assigned" field to TransactionStateData, but then it fails to add a
default value into TopTransactionStateData. We probably interpret NULL
as false, but then there's nothing for the pointer. I suspect it might
leave some random garbage there, leading to strange things later.


Actually, we will never access that field for the
TopTransactionStateData, right?
See the code below: only if IsSubTransaction() do we access the
"assigned" field.

+bool
+IsSubTransactionAssignmentPending(void)
+{
+ if (!XLogLogicalInfoActive())
+ return false;
+
+ /* we need to be in a transaction state */
+ if (!IsTransactionState())
+ return false;
+
+ /* it has to be a subtransaction */
+ if (!IsSubTransaction())
+ return false;
+
+ /* the subtransaction has to have a XID assigned */
+ if (!TransactionIdIsValid(GetCurrentTransactionIdIfAny()))
+ return false;
+
+ /* and it needs to have 'assigned' */
+ return !CurrentTransactionState->assigned;
+
+}



The problem is not with the "assigned" field, really. AFAICS we probably
initialize it to false because we interpret NULL as false. My concern
was that we essentially leave the last pointer not initialized. That
seems like a bug, not sure if it breaks something in practice.
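
If we want to be explicit about it, a designated initializer would make the
defaults visible at no cost (a sketch, field list abridged):

    static TransactionStateData TopTransactionStateData = {
        .state = TRANS_DEFAULT,
        .blockState = TBLOCK_DEFAULT,
        .assigned = false,      /* the new field, explicitly false */
        /* remaining fields, including pointers, default to zero/NULL */
    };

(Static objects are zero-initialized in C anyway, so this is mostly about
readability rather than behavior.)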



Another thing I noticed is LogicalDecodingProcessRecord() extracts the
toplevel XID using a macro

   txid = XLogRecGetTopXid(record);

but then it just starts accessing the fields directly again in the
ReorderBufferAssignChild call. I think we should do this instead:

    ReorderBufferAssignChild(ctx->reorder,
                             txid,
                             XLogRecGetXid(record),
                             buf.origptr);


Makes sense.  I will change this in the patch.



+1, thanks


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2020-03-05 Thread Tomas Vondra

On Wed, Mar 04, 2020 at 10:28:32AM +0530, Amit Kapila wrote:

On Wed, Mar 4, 2020 at 3:16 AM Tomas Vondra
 wrote:


Hi,

I started looking at this patch series again, hoping to get it moving
for PG13.



It is good to keep moving this forward, but there are quite a few
problems with the design which need a broader discussion.  Some of
what I recall are:
a. Handling of abort of concurrent transactions.  There is some code
in the patch which might work, but there is not much discussion when
it was posted.
b. Handling of partial tuples (while streaming, we came to know that
toast tuple is not complete or speculative insert is incomplete).  For
this also, we have proposed a few solutions which need further
discussion.  One of those is implemented in the patch series.
c. We might also need some handling for replication origins.
d. Try to minimize the performance overhead of WAL logging for
invalidations.  We discussed different solutions for this and
implemented one of those.
e. How to skip already streamed transactions.

There might be a few more which I can't recall now.  Apart from this,
I haven't done any detailed review of subscriber-side implementation
where we write streamed transactions to file.  All of this will need
much more discussion and review before we can say it is ready to
commit, so I thought it might be better to pick it up for PG14 and
focus on other things that have a better chance for PG13 especially
because all the problems were not solved/discussed before last CF.
However, it is a good idea to keep moving this and have a discussion
on some of these issues.



Sure, there's a lot to discuss. And it's possible (likely) it's not
feasible to get this into PG13. But I think it's still worth discussing
it, instead of just punting it into the next CF right away.


There's been a tremendous amount of work done since I last
worked on it, and a lot was discussed on this thread, so it'll take a
while to get familiar with the new code ...

The first thing I realized is that WAL-logging of assignments in v12 does
both the "old" logging (using a dedicated message) and the "new" one with
the toplevel-XID embedded in the first message. Yes, the patch was wrong,
because it eliminated all calls to ProcArrayApplyXidAssignment() and so
it was trivial to crash the replica due to KnownAssignedXids overflow.
But I don't think re-introducing the XLOG_XACT_ASSIGNMENT message is the
right fix.

I actually proposed doing this (having both ways to log assignments) so
that there's no regression risk with (wal_level < logical). But IIRC
Andres objected to it, arguing that we should not log the same piece
of information in two very different ways at the same time (IIRC it was
discussed at the FOSDEM dev meeting, so I don't have a link to share).
And I do agree with him ...



So, aren't we worried about the overhead of the amount of WAL and
performance impact for the transactions?  We might want to check the
pgbench read-write test to see if that will add any significant
overhead.



Well, sure. I agree we need to see how this affects performance, and
I'll do some benchmarks (I think I did that when submitting the patch,
but I don't recall the numbers / details).

Isn't it a bit strange to log stuff twice, though, if we worry about
performance? Surely that's more expensive than logging it just once. Of
course, it might be useful if most systems need just the "old" way.

I know it's going to be a bit hand-wavy, but I think embedding the
assignments into existing WAL messages is about the cheapest way to log
this. I would not expect this to be measurably more expensive than what
we have now, but I might be wrong.


The question is, why couldn't the replica use the same assignment info
we already write for logical decoding?



I haven't thought about it in detail, but we can think on those lines
if the performance overhead is in the acceptable range.



OK, let me do some measurements ...


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Psql patch to show access methods info

2020-03-05 Thread vignesh C
On Wed, Mar 4, 2020 at 5:02 AM Alexander Korotkov 
wrote:
>
> Hi!
>
> Thank you for the review.  Revised patch is attached.
>

Thanks for working on comments and providing a new patch.
One small observation I noticed:
postgres=# \dAc brin oid
 Index access method operator classes
  AM  | Input type | Storage type | Operator class | Default?
--++--++--
 brin | oid|  | oid_minmax_ops | yes
(1 row)

postgres=# \dAcx brin oid
 Index access method operator classes
  AM  | Input type | Storage type | Operator class | Default?
--++--++--
 brin | oid|  | oid_minmax_ops | yes
(1 row)

The output of \dAc and \dAcx seems to be the same. Is this expected?

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com


Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-05 Thread Justin Pryzby
On Thu, Mar 05, 2020 at 03:27:31PM +0100, Laurenz Albe wrote:
> On Thu, 2020-03-05 at 19:40 +1300, David Rowley wrote:
> > 1. I'd go for 2 new GUCs and reloptions.
> > autovacuum_vacuum_insert_scale_factor and these should work exactly
> 
> I disagree about the scale_factor (and have not added it to the
> updated version of the patch).  If we have a scale_factor, then the
> time between successive autovacuum runs would increase as the table
> gets bigger, which defeats the purpose of reducing the impact of each
> autovacuum run.

I would vote to include scale factor.  You're right that a nonzero scale factor
would cause vacuum to run with geometrically decreasing frequency.  The same
thing currently happens with autoanalyze as a table grows in size.  I found
that our monthly-partitioned tables were being analyzed too infrequently
towards the end of the month.  (At the beginning of the month, 10% is 2.4 hours
worth of timeseries data, but at the end of the month 10% is 3 days, which was
an issue when querying the previous day may have rowcount estimates near zero.)
If someone wanted to avoid that, they'd set scale_factor=0.  I think this patch
should parallel what's already in place, and we can add documention for the
behavior if need be.  Possibly scale_factor should default to zero, which I
think might make sense since insert-only tables seem to be the main target of
this patch.
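
For reference, the existing per-table check in autovacuum.c computes the
threshold as base + scale_factor * reltuples; the insert-driven variant
would presumably mirror it, something like this (a sketch modeled on
relation_needs_vacanalyze(); the *_ins_* names and the inserts counter
follow the proposal and are not committed code):

    vacinsthresh = (float4) vac_ins_base_thresh +
        vac_ins_scale_factor * reltuples;

    /* trigger an insert-driven vacuum once enough rows were inserted */
    if (tabentry->inserts_since_vacuum > vacinsthresh)
        *dovacuum = true;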

> +++ b/doc/src/sgml/maintenance.sgml
> +   
> +Tables that have received more than
> +
> +inserts since they were last vacuumed and are not eligible for vacuuming
> +based on the above criteria will be vacuumed to reduce the impact of a 
> future
> +anti-wraparound vacuum run.
> +Such a vacuum will aggressively freeze tuples, and it will not clean up 
> dead
> +index tuples.

"BUT will not clean .."

> +++ b/src/backend/postmaster/autovacuum.c
> + /*
> +  * If the number of inserted tuples exceeds the limit

I would say "exceeds the threshold"

Thanks for working on this; we would use this feature on our insert-only
tables.

-- 
Justin




Atomics in localbuf.c

2020-03-05 Thread Antonin Houska
What's the reason to use pg_atomic...read_...() and pg_atomic...write_...()
functions in localbuf.c?

It looks like there was an intention not to use them

https://www.postgresql.org/message-id/CAPpHfdtfr3Aj7xJonXaKR8iY2p8uXOQ%2Be4BMpMDAM_5R4OcaDA%40mail.gmail.com

but the following discussion does not explain the decision to use them.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com




Re: Retiring support for pre-7.3 FK constraint triggers

2020-03-05 Thread Vik Fearing
On 05/03/2020 16:33, Tom Lane wrote:
> Elsewhere in indexcmds.c, there's this:
> 
> /*
>  * Hack to provide more-or-less-transparent updating of old RTREE
>  * indexes to GiST: if RTREE is requested and not found, use GIST.
>  */
> if (strcmp(accessMethodName, "rtree") == 0)
> {
> ereport(NOTICE,
> (errmsg("substituting access method \"gist\" for obsolete 
> method \"rtree\"")));
> accessMethodName = "gist";
> tuple = SearchSysCache1(AMNAME, 
> PointerGetDatum(accessMethodName));
> }

Aww, this one is in my list of gotcha trivia questions.

That's not a reason not to remove it, of course.
-- 
Vik Fearing




Re: reindex concurrently and two toast indexes

2020-03-05 Thread Julien Rouhaud
On Thu, Mar 05, 2020 at 12:53:54PM +0900, Michael Paquier wrote:
> On Wed, Mar 04, 2020 at 09:21:45AM +0100, Julien Rouhaud wrote:
> > Thanks for the patch!  I started to look at it during the weekend, but
> > I got interrupted and unfortunately didn't had time to look at it
> > since.
>
> No problem, thanks for looking at it.  I have looked at it again this
> morning, and applied it.
>
> > The fix looks good to me.  I also tried multiple failure scenario and
> > it's unsurprisingly working just fine.  Should we add some regression
> > tests for that?  I guess most of it could be borrowed from the patch
> > to fix the toast index issue I sent last week.
>
> I have doubts when it comes to using a strategy based on
> pg_cancel_backend() and a match of application_name (see for example
> 5ad72ce but I cannot find the associated thread).  I think that we
> could design something more robust here and usable by all tests, with
> two things coming into my mind:
> - A new meta-command for isolation tests to be able to cancel a
> session with PQcancel().
> - Fault injection in the backend.
> For the case of this thread, the cancellation command would be a better
> match.

I agree that the approach wasn't quite robust.  I'll try to look at adding a
new command for isolationtester, but that's probably not something we want to
put in pg13?

Here's a v3 that addresses the various comments you previously noted, and
for which I also removed the regression tests.

Note that while looking at it, I noticed another bug in RIC:

# create table t1(id integer, val text); create index on t1(val);
CREATE TABLE

CREATE INDEX

# reindex table concurrently t1;
^CCancel request sent
ERROR:  57014: canceling statement due to user request
LOCATION:  ProcessInterrupts, postgres.c:3171

# select indexrelid::regclass, indrelid::regclass, indexrelid, indrelid from pg_index where not indisvalid;
             indexrelid              |        indrelid          | indexrelid | indrelid
-------------------------------------+--------------------------+------------+----------
 t1_val_idx_ccold                    | t1                       |      16401 |    16395
 pg_toast.pg_toast_16395_index_ccold | pg_toast.pg_toast_16395  |      16400 |    16398
(2 rows)


# reindex table concurrently t1;
WARNING:  0A000: cannot reindex invalid index "public.t1_val_idx_ccold" concurrently, skipping
LOCATION:  ReindexRelationConcurrently, indexcmds.c:2821
WARNING:  XX002: cannot reindex invalid index "pg_toast.pg_toast_16395_index_ccold" concurrently, skipping
LOCATION:  ReindexRelationConcurrently, indexcmds.c:2867
REINDEX

# reindex index concurrently t1_val_idx_ccold;
REINDEX

That case is also fixed in this patch.
From 77ec865a9a655b2b973846f9a8fa93c966ca55f5 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud 
Date: Fri, 21 Feb 2020 20:15:04 +0100
Subject: [PATCH] Don't reindex invalid indexes on TOAST tables.

Such indexes can only be duplicated leftovers of failed REINDEX CONCURRENTLY
commands.  As we only allow dropping invalid indexes on TOAST tables, reindexing
those would lead to useless duplicated indexes that can't be dropped anymore.

Reported-by: Sergei Kornilov, Justin Pryzby
Author: Julien Rouhaud
Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/36712441546604286%40sas1-890ba5c2334a.qloud-c.yandex.net
Discussion: https://postgr.es/m/20200216190835.ga21...@telsasoft.com
Backpatch-through: 12
---
 src/backend/catalog/index.c  | 30 
 src/backend/commands/indexcmds.c | 47 +---
 2 files changed, 67 insertions(+), 10 deletions(-)

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 7223679033..d3d28df97a 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -46,6 +46,7 @@
 #include "catalog/pg_depend.h"
 #include "catalog/pg_description.h"
 #include "catalog/pg_inherits.h"
+#include "catalog/pg_namespace_d.h"
 #include "catalog/pg_opclass.h"
 #include "catalog/pg_operator.h"
 #include "catalog/pg_tablespace.h"
@@ -3724,6 +3725,35 @@ reindex_relation(Oid relid, int flags, int options)
{
Oid indexOid = lfirst_oid(indexId);
 
+		/*
+		 * We skip any invalid index on a TOAST table.  Those can only be
+		 * a duplicate leftover of a failed REINDEX CONCURRENTLY, and if we
+		 * rebuild it it won't be possible to drop it anymore.
+		 */
+		if (rel->rd_rel->relnamespace == PG_TOAST_NAMESPACE)
+		{
+			HeapTuple	tup;
+			bool		skipit;
+
+			tup = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(indexOid));
+			if (!HeapTupleIsValid(tup))
+				elog(ERROR, "cache lookup failed for index

Re: [Patch] pg_rewind: options to use restore_command from recovery.conf or command line

2020-03-05 Thread Alexey Kondratov

On 05.03.2020 09:24, Michael Paquier wrote:

On Wed, Mar 04, 2020 at 08:14:20PM +0300, Alexey Kondratov wrote:

- I did not actually get why you don't check for a missing command
when using wait_result_is_any_signal.  In this case I'd think that it
is better to exit immediately as follow-up calls would just fail.

Believe it or not, I put 'false' there intentionally. The idea was that
if the reason is a signal, then maybe the user got tired of waiting and
killed that restore_command process themselves or something like that, so
it is better to exit immediately. If it was a missing command, then there
is no hurry, so we can go further and complain that the attempt to recover
the WAL segment has failed.

Actually, I guess that there is no big difference whether we include a
missing command here or not. There is no complicated logic further on,
compared to the real recovery process in Postgres, where we cannot simply
return false in that case.

On the contrary, it seems to me that the difference is very important.
Imagine for example a frontend tool which calls RestoreArchivedWALFile
in a loop, and that this one fails because the command called is
missing.  This tool would keep looping for nothing.  So checking for a
missing command and leaving immediately would be more helpful for the
user.  Can you think about scenarios where it would make sense to be
able to loop in this case instead of failing?



OK, I still had in mind pg_rewind as the only user of this routine. Now
it is part of the common code, and I could imagine a hypothetical tool
that polls the archive, waiting for a specific WAL segment to become
available. In this case 'command not found' is definitely the end of the
game, while the absence of a segment is an expected error, so we can
continue looping.
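
In code, the distinction could look like this (a sketch: only
wait_result_is_any_signal() and pg_usleep() are real functions here, the
loop itself is illustrative):

    for (;;)
    {
        int         rc = system(restore_command);

        if (rc == 0)
            break;              /* segment restored, done */

        /* command missing, or killed by a signal: retrying is pointless */
        if (wait_result_is_any_signal(rc, true))
            exit(1);

        /* segment probably not archived yet: wait and retry */
        pg_usleep(1000000L);    /* 1 second */
    }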



Regards

--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company





Re: pg_ls_tmpdir to show directories and shared filesets

2020-03-05 Thread Justin Pryzby
On Tue, Mar 03, 2020 at 05:23:13PM -0300, Alvaro Herrera wrote:
> On 2020-Mar-03, Justin Pryzby wrote:
> 
> > But I don't think it makes sense to go through more implementation/review
> > cycles without some agreement from a larger group regarding the
> > desired/intended interface.  Should there be a column for "parent dir" ?  
> > Or a
> > column for "is_dir" ?  Should dirs be shown at all, or only files ?
> 
> IMO: is_dir should be there (and subdirs should be listed), but
> parent_dir should not appear.  Also, the "path" should show the complete
> pathname, including containing dirs, starting from whatever the "root"
> is for the operation.
> 
> So for the example in the initial email, it would look like
> 
> path  isdir
> pgsql_tmp11025.0.sharedfileset/   t
> pgsql_tmp11025.0.sharedfileset/0.0f
> pgsql_tmp11025.0.sharedfileset/1.0f
> 
> plus additional columns, same as pg_ls_waldir et al.
> 
> I'd rather not have the code assume that there's a single level of
> subdirs, or assuming that an entry in the subdir cannot itself be a dir;
> that might end up hiding files for no good reason.
> 

Thanks for your input, see attached.

I'm not sure if I prefer the 0002 patch alone (which recurses into dirs all at
once during the initial call), or 0002+3+4, which incrementally reads the dirs
on each call (but requires keeping dirs open).

> I don't understand what purpose is served by having pg_ls_waldir() hide
> directories.

We could talk about whether the other functions should show dirs, if it's worth
breaking their return type.  Or whether they should show hidden or special files,
which doesn't require breaking the return type.  But until then I aim to leave
the behavior alone.

-- 
Justin
From a5b9a03445d1c768662cafebd8ab3bd7a62890aa Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Fri, 27 Dec 2019 23:34:14 -0600
Subject: [PATCH v6 1/4] BUG: in errmsg

Note there's two changes here.
Should backpatch to v12, where pg_ls_tmpdir was added.
---
 src/backend/utils/adt/genfile.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/utils/adt/genfile.c b/src/backend/utils/adt/genfile.c
index 3741b87486..897b11a77d 100644
--- a/src/backend/utils/adt/genfile.c
+++ b/src/backend/utils/adt/genfile.c
@@ -590,7 +590,7 @@ pg_ls_dir_files(FunctionCallInfo fcinfo, const char *dir, bool missing_ok)
 		if (stat(path, &attrib) < 0)
 			ereport(ERROR,
 	(errcode_for_file_access(),
-	 errmsg("could not stat directory \"%s\": %m", dir)));
+	 errmsg("could not stat file \"%s\": %m", path)));
 
 		/* Ignore anything but regular files */
 		if (!S_ISREG(attrib.st_mode))
-- 
2.17.0

From 2437276b9c7525981d4a70b804c81021b2f5fa1f Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sat, 14 Dec 2019 16:22:15 -0600
Subject: [PATCH v6 2/4] pg_ls_tmpdir to show directories

See also 9cd92d1a33699f86aa53d44ab04cc3eb50c18d11
---
 doc/src/sgml/func.sgml   |  13 +--
 src/backend/utils/adt/genfile.c  | 133 +++
 src/include/catalog/catversion.h |   2 +-
 src/include/catalog/pg_proc.dat  |   8 +-
 4 files changed, 114 insertions(+), 42 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 323366feb6..35abff16c9 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -21382,8 +21382,9 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());

setof record

-List the name, size, and last modification time of files in the
-temporary directory for tablespace.  If
+For the temporary directory for tablespace,
+list each file's name, size, last modification time, and boolean
+indicating if it is a directory.  Directories are shown recursively.  If
 tablespace is not provided, the
 pg_default tablespace is used.  Access is granted
 to members of the pg_monitor role and may be
@@ -21479,9 +21480,11 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
 pg_ls_tmpdir


-pg_ls_tmpdir returns the name, size, and last modified
-time (mtime) of each file in the temporary file directory for the specified
-tablespace.  If tablespace is
+pg_ls_tmpdir lists each file in the temporary file
+directory for the specified tablespace, along with
+its size, last modified time (mtime) and boolean indicating if it is a
+directory.  Directories are used for temporary files used by parallel
+processes, and are shown recursively.  If tablespace is
 not provided, the pg_default tablespace is used.  By
 default only superusers and members of the pg_monitor
 role can use this function.  Access may be granted to others using
diff --git a/src/backend/utils/adt/genfile.c b/src/backend/utils/adt/genfile.c
index 897b11a77d..c5148f547b 100644
--- a/src/backend/utils/adt/genfile.c
+++ b/src/backend/utils/adt/genfile.c
@@ -40,6 +40,11 @@ typedef struct
 	char	   

Re: Resume vacuum and autovacuum from interruption and cancellation

2020-03-05 Thread Ibrar Ahmed
The following review has been posted through the commitfest application:
make installcheck-world:  tested, failed
Implements feature:   tested, failed
Spec compliant:   not tested
Documentation:not tested

Please fix the regression test cases.

The new status of this patch is: Waiting on Author


Re: pg_stat_progress_basebackup - progress reporting for pg_basebackup, in the server side

2020-03-05 Thread Magnus Hagander
On Wed, Mar 4, 2020 at 11:15 PM Peter Eisentraut
 wrote:
>
> On 2020-03-05 05:53, Fujii Masao wrote:
> > Or, as another approach, it might be worth considering making
> > the server always estimate the total backup size whether --progress is
> > specified or not, as Amit argued upthread. If the time required to
> > estimate the backup size is negligible compared to total backup time,
> > IMO this approach seems better. If we adopt this, we can also get
> > rid of PROGESS option from BASE_BACKUP replication command.
>
> I think that would be preferable.

From a UI perspective I definitely agree.

The problem with that one is that it can take a non-trivial amount of
time; that's why it was made an option (in the protocol) in the first
place. Particularly if you have a database with many small objects.

Is it enough to care about? I'm not sure, but it's definitely
something to consider. It was not negligible in some tests I ran then,
but that was quite some time ago and reality has definitely changed since
then.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/




Re: Retiring support for pre-7.3 FK constraint triggers

2020-03-05 Thread Tom Lane
Daniel Gustafsson  writes:
> On 5 Mar 2020, at 15:42, Tom Lane  wrote:
>> +1 --- I think this fits in well with my nearby proposal to remove
>> OPAQUE, which is also only relevant for pre-7.3 dumps.  Let's just
>> nuke that stuff.

> Sounds good. I was opting for 14 so as not to violate the policy of no new
> patches in an ongoing CF, but if there is consensus from committers then +1 from me.

As long as we're thinking of zapping code that is long past its sell-by
date, I propose getting rid of this stanza in indexcmds.c, which
basically causes CREATE INDEX to ignore certain opclass specifications:

/*
 * Release 7.0 removed network_ops, timespan_ops, and datetime_ops, so we
 * ignore those opclass names so the default *_ops is used.  This can be
 * removed in some later release.  bjm 2000/02/07
 *
 * Release 7.1 removes lztext_ops, so suppress that too for a while.  tgl
 * 2000/07/30
 *
 * Release 7.2 renames timestamp_ops to timestamptz_ops, so suppress that
 * too for awhile.  I'm starting to think we need a better approach. tgl
 * 2000/10/01
 *
 * Release 8.0 removes bigbox_ops (which was dead code for a long while
 * anyway).  tgl 2003/11/11
 */
if (list_length(opclass) == 1)
{
char   *claname = strVal(linitial(opclass));

if (strcmp(claname, "network_ops") == 0 ||
strcmp(claname, "timespan_ops") == 0 ||
strcmp(claname, "datetime_ops") == 0 ||
strcmp(claname, "lztext_ops") == 0 ||
strcmp(claname, "timestamp_ops") == 0 ||
strcmp(claname, "bigbox_ops") == 0)
opclass = NIL;
}


At some point, the risk that this causes problems for developers of
new opclasses must outweigh the value of silently upgrading old dumps.
I think if we're zapping other pre-7.3-compatibility hacks for that
purpose, this one could go too.

Elsewhere in indexcmds.c, there's this:

/*
 * Hack to provide more-or-less-transparent updating of old RTREE
 * indexes to GiST: if RTREE is requested and not found, use GIST.
 */
if (strcmp(accessMethodName, "rtree") == 0)
{
ereport(NOTICE,
(errmsg("substituting access method \"gist\" for obsolete 
method \"rtree\"")));
accessMethodName = "gist";
tuple = SearchSysCache1(AMNAME, PointerGetDatum(accessMethodName));
}

which dates to 8.2 (2a8d3d83e of 2005-11-07).  This is less bad than the
other thing, since it won't affect the behavior of any command that
wouldn't otherwise just fail; but maybe its time has passed as well?
Although Alvaro's point comparing these behaviors to pg_dump's support
cutoff of 8.0 suggests that maybe we should leave this one for now.

regards, tom lane




Re: Retiring support for pre-7.3 FK constraint triggers

2020-03-05 Thread Daniel Gustafsson
> On 5 Mar 2020, at 15:42, Tom Lane  wrote:
> 
> Alvaro Herrera  writes:
>>> On 2020-Mar-05, Daniel Gustafsson wrote:
>>> While looking at the tg_updatedcols patch I happened to notice that we still
>>> support pre-7.3 constraint triggers by converting them on the fly.  AFAICT 
>>> this
>>> requires a pre-7.3 dump to hit.
> 
>> I know it's a late in the cycle for patches in commitfest, but why not
>> consider this for pg13 nonetheless?  It seems simple enough.  Also, per
>> https://coverage.postgresql.org/src/backend/commands/trigger.c.gcov.html
>> this is the only large chunk of uncovered code in commands/trigger.c.
> 
> +1 --- I think this fits in well with my nearby proposal to remove
> OPAQUE, which is also only relevant for pre-7.3 dumps.  Let's just
> nuke that stuff.

Sounds good. I was opting for 14 so as not to violate the policy of no new
patches in an ongoing CF, but if there is consensus from committers then +1 from me.

cheers ./daniel



Re: useless RangeIOData->typiofunc

2020-03-05 Thread Alvaro Herrera
On 2020-Mar-05, Tom Lane wrote:

> Alvaro Herrera  writes:
> > Thanks -- ISTM it makes more sense to put the FmgrInfo before the
> > typioparam too:
> 
> > typedef struct RangeIOData
> > {
> > TypeCacheEntry *typcache;/* range type's typcache entry */
> > FmgrInfoproc;/* element type's I/O function */
> > Oid typioparam;  /* element type's I/O parameter */
> > } RangeIOData;
> 
> Yeah, WFM.  Maybe even rename the FmgrInfo to "typioproc"
> or the like?

Good idea, thanks!  Pushed with that change.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Retiring support for pre-7.3 FK constraint triggers

2020-03-05 Thread David Steele

On 3/5/20 9:42 AM, Tom Lane wrote:

Alvaro Herrera  writes:

On 2020-Mar-05, Daniel Gustafsson wrote:

While looking at the tg_updatedcols patch I happened to notice that we still
support pre-7.3 constraint triggers by converting them on the fly.  AFAICT this
requires a pre-7.3 dump to hit.



I know it's a late in the cycle for patches in commitfest, but why not
consider this for pg13 nonetheless?  It seems simple enough.  Also, per
https://coverage.postgresql.org/src/backend/commands/trigger.c.gcov.html
this is the only large chunk of uncovered code in commands/trigger.c.


+1 --- I think this fits in well with my nearby proposal to remove
OPAQUE, which is also only relevant for pre-7.3 dumps.  Let's just
nuke that stuff.


+1.  CF entry added:

https://commitfest.postgresql.org/27/2506

Regards,
--
-David
da...@pgmasters.net




Re: Retiring support for pre-7.3 FK constraint triggers

2020-03-05 Thread Tom Lane
Alvaro Herrera  writes:
> On 2020-Mar-05, Daniel Gustafsson wrote:
>> While looking at the tg_updatedcols patch I happened to notice that we still
>> support pre-7.3 constraint triggers by converting them on the fly.  AFAICT 
>> this
>> requires a pre-7.3 dump to hit.

> I know it's a late in the cycle for patches in commitfest, but why not
> consider this for pg13 nonetheless?  It seems simple enough.  Also, per
> https://coverage.postgresql.org/src/backend/commands/trigger.c.gcov.html
> this is the only large chunk of uncovered code in commands/trigger.c.

+1 --- I think this fits in well with my nearby proposal to remove
OPAQUE, which is also only relevant for pre-7.3 dumps.  Let's just
nuke that stuff.

regards, tom lane




Re: Should we remove a fallback promotion? take 2

2020-03-05 Thread Robert Haas
On Thu, Mar 5, 2020 at 8:48 AM Fujii Masao  wrote:
> We discussed the $SUBJECT six years ago at the following thread.
> https://postgr.es/m/CAHGQGwGYkF+CvpOMdxaO=+anazc1oo9o4lqwo50mxpvfj+0...@mail.gmail.com
>
> Seems our consensus at that discussion was to leave a fallback
> promotion for a release or two for debugging purpose or as
> an emergency method because fast promotion might have
> some issues, and then to remove it later. Now, more than six years
> have already passed since that discussion. Is there still
> any reason to keep a fallback promotion? If nothing, I'd like to
> drop it from v13.

Seems reasonable, but it would be better if people proposed these
kinds of changes closer to the beginning of the release cycle rather
than in the crush at the end.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [Proposal] Global temporary tables

2020-03-05 Thread Robert Haas
On Thu, Mar 5, 2020 at 9:19 AM tushar  wrote:
> WARNING:  relfilenode 13589/1663/19063 not exist in gtt shared hash when 
> forget
> ERROR:  out of shared memory
> HINT:  You might need to increase max_active_gtt.
>
> also , would be great  if we can make this error message  user friendly like  
> - "max connection reached"  rather than memory error

That would be nice, but the bigger problem is that the WARNING there
looks totally unacceptable. It looks like it's complaining of some
internal issue (i.e. a bug or corruption), and the grammar is poor,
too.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Retiring support for pre-7.3 FK constraint triggers

2020-03-05 Thread Alvaro Herrera
On 2020-Mar-05, Daniel Gustafsson wrote:

> While looking at the tg_updatedcols patch I happened to notice that we still
> support pre-7.3 constraint triggers by converting them on the fly.  AFAICT 
> this
> requires a pre-7.3 dump to hit.
> 
> This was added in late 2007 in a2899ebdc28080eab0f4bb0b8a5f30aa7bb31a89 due to
> a report from the field, but I doubt this codepath is exercised much today
> (if at all).

pg_dump's support for server versions prior to 8.0 was removed by commit
64f3524e2c8d (Oct 2016) so it seems fair to remove this too.  If people
need to upgrade from anything older than 7.3, they can do an intermediate jump.

> Having code which is untested and not exercised by developers (or users, if my
> assumption holds), yet being reachable by SQL, runs the risk of introducing
> subtle bugs.  Is there a use case for keeping it, or can/should it be removed
> in 14?  That would still leave a lot of supported versions to upgrade to in
> case there are users who need this.  Unless there are immediate -1's, I'll
> park this in a CF for v14.

I know it's a late in the cycle for patches in commitfest, but why not
consider this for pg13 nonetheless?  It seems simple enough.  Also, per
https://coverage.postgresql.org/src/backend/commands/trigger.c.gcov.html
this is the only large chunk of uncovered code in commands/trigger.c.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-05 Thread Laurenz Albe
On Thu, 2020-03-05 at 19:40 +1300, David Rowley wrote:
> I think we need to move forward with doing something to cope with
> INSERT-only tables not being auto-vacuumed.
> 
> I think the patch you have is something along the lines of what I'd
> have imagined we should do. However, there are a few things that I'd
> do a different way.

Thanks for the review, and that's good news.

> 1. I'd go for 2 new GUCs and reloptions.
> autovacuum_vacuum_insert_threshold (you're currently calling this
> autovacuum_vacuum_insert_limit. I don't see why the word "limit" is
> relevant here). The other GUC I think should be named
> autovacuum_vacuum_insert_scale_factor and these should work exactly
> the same way as autovacuum_vacuum_threshold and
> autovacuum_vacuum_scale_factor, but be applied in a similar way to the
> vacuum settings, but only be applied after we've checked to ensure the
> table is not otherwise eligible to be vacuumed.

Yes, "threshold" is better than "limit" I have renamed the GUC and
the reloption.

I disagree about the scale_factor (and have not added it to the
updated version of the patch).  If we have a scale_factor, then the
time between successive autovacuum runs would increase as the table
gets bigger, which defeats the purpose of reducing the impact of each
autovacuum run.

Since autovacuum skips pages where it has nothing to do, we can expect
that runs on a large table won't be much more expensive than runs on a
smaller table, right?

> 3. The name "insert_only" does not seem the best for the new boolean
> variable that you're using in various places.  That name seems to be
> too closely related to our current intended use case. Maybe
> skip_index_cleanup is more to the point.

I originally called the variable "skip_indexes", but when I decided
that such vacuum runs also aggressively freeze the table, I thought
that the name was misleading and renamed it.

I won't put up a fight about this, though.

> 4. Are you sure you mean "Maximum" here?  Isn't it the minimum? At
> least it will be once you add both options. Otherwise, I think Maximum
> is not the correct word. Perhaps "The threshold"
> 
> + {"autovacuum_vacuum_insert_limit", PGC_SIGHUP, AUTOVACUUM,
> + gettext_noop("Maximum number of tuple inserts prior to vacuum."),
> + NULL
> + },

I had actually been debating whether to use "maximum" or "minimum".
I realize now that this strange uncertainty stems from the fact that
there is (yet) only a single parameter to govern this.

The updated patch describes the GUC as
"Number of tuple inserts prior to vacuum."

> 5. I think the new field in this struct should be named 
> vacuum_insert_threshold
> 
> @@ -252,6 +252,7 @@ typedef struct AutoVacOpts
>  {
>   bool enabled;
>   int vacuum_threshold;
> + int vacuum_ins_limit;

I agree as above, renamed.

> 6. Are you sure you meant to default this to 50?
> 
> index e58e4788a8..9d96d58ed2 100644
> --- a/src/backend/utils/misc/postgresql.conf.sample
> +++ b/src/backend/utils/misc/postgresql.conf.sample
> @@ -598,6 +598,8 @@
>  #autovacuum_naptime = 1min # time between autovacuum runs
>  #autovacuum_vacuum_threshold = 50 # min number of row updates before
>   # vacuum
> +#autovacuum_vacuum_insert_limit = 50 # max number of row inserts before
> + # vacuum
> 
> Seems excessive given there's no scale factor in the current patch.

That was a mistake.
I chose 1000 as the actual default value, but forgot to put the
same value into "postgresql.conf".

> 7. I know you know missing docs... would be good to get those.

The updated version of the patch has documentation.

I just wanted to get a feeling if my patch would be killed cold before
I went to the effort of writing documentation.

> 8. Should we care when setting the insert counter back to 0 if
> auto-vacuum has skipped pages?

Since this is only an approximate value anyway, I decided not to care.
I don't know if that is acceptable.

> 9. You should add a new column to the pg_stat_all_tables view to allow
> visibility of the insert since the last vacuum.  The column should be
> named n_ins_since_vacuum. This seems like the best combination of
> n_mod_since_analyze and n_tup_ins.

Done.

> 10. I'm slightly worried about the case where we don't quite trigger a
> normal vacuum but trigger a vacuum due to INSERTs then skip cleaning
> up the indexes but proceed to leave dead index entries causing indexes
> to become bloated.  It does not seem impossible that given the right
> balance of INSERTs and UPDATE/DELETEs that this could happen every
> time and the indexes would just become larger and larger.

I understand.

This might particularly be a problem with larger tables, where
a normal autovacuum is rare because of the scale_factor.

Perhaps we can take care of the problem by *not* skipping index
cleanup if "changes_since_analyze" is substantially greater than 0.

What do you think?
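
Expressed as code, the idea might be something like this (all names
hypothetical):

    /*
     * Skip index cleanup for an insert-driven vacuum only if the table
     * saw essentially no UPDATEs/DELETEs since the last analyze.
     */
    skip_index_cleanup = insert_only &&
        tabentry->changes_since_analyze < insert_only_change_slack;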

> 11. We probably do also need to debate if we want this on or off by
> default.   I'd have leaned towards enabling by 

Re: useless RangeIOData->typiofunc

2020-03-05 Thread Tom Lane
Alvaro Herrera  writes:
> Thanks -- ISTM it makes more sense to put the FmgrInfo before the
> typioparam too:

> typedef struct RangeIOData
> {
> TypeCacheEntry *typcache;/* range type's typcache entry */
> FmgrInfoproc;/* element type's I/O function */
> Oid typioparam;  /* element type's I/O parameter */
> } RangeIOData;

Yeah, WFM.  Maybe even rename the FmgrInfo to "typioproc"
or the like?

regards, tom lane




Re: useless RangeIOData->typiofunc

2020-03-05 Thread Alvaro Herrera
On 2020-Mar-04, Tom Lane wrote:

> Hm, I'm not sure that really lessens the cognitive load any, but
> if you do commit this please fix the dangling reference you left
> in the nearby comment:
> 
>  {
> TypeCacheEntry *typcache;   /* range type's typcache entry */
> -   Oid typiofunc;  /* element type's I/O function */
> Oid typioparam; /* element type's I/O parameter */
> FmgrInfoproc;   /* lookup result for typiofunc */
>^^^
>  } RangeIOData;

Thanks -- ISTM it makes more sense to put the FmgrInfo before the
typioparam too:

typedef struct RangeIOData
{
TypeCacheEntry *typcache;/* range type's typcache entry */
FmgrInfoproc;/* element type's I/O function */
Oid typioparam;  /* element type's I/O parameter */
} RangeIOData;

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/utils/adt/rangetypes.c b/src/backend/utils/adt/rangetypes.c
index 490bc2ae81..c6bb21219b 100644
--- a/src/backend/utils/adt/rangetypes.c
+++ b/src/backend/utils/adt/rangetypes.c
@@ -49,9 +49,8 @@
 typedef struct RangeIOData
 {
 	TypeCacheEntry *typcache;	/* range type's typcache entry */
-	Oid			typiofunc;		/* element type's I/O function */
+	FmgrInfo	proc;			/* element type's I/O function */
 	Oid			typioparam;		/* element type's I/O parameter */
-	FmgrInfo	proc;			/* lookup result for typiofunc */
 } RangeIOData;
 
 
@@ -309,6 +308,7 @@ get_range_io_data(FunctionCallInfo fcinfo, Oid rngtypid, IOFuncSelector func)
 		bool		typbyval;
 		char		typalign;
 		char		typdelim;
+		Oid			typiofunc;
 
 		cache = (RangeIOData *) MemoryContextAlloc(fcinfo->flinfo->fn_mcxt,
    sizeof(RangeIOData));
@@ -324,9 +324,9 @@ get_range_io_data(FunctionCallInfo fcinfo, Oid rngtypid, IOFuncSelector func)
 						 &typalign,
 						 &typdelim,
 						 &cache->typioparam,
-						 &cache->typiofunc);
+						 &typiofunc);
 
-		if (!OidIsValid(cache->typiofunc))
+		if (!OidIsValid(typiofunc))
 		{
 			/* this could only happen for receive or send */
 			if (func == IOFunc_receive)
@@ -340,7 +340,7 @@ get_range_io_data(FunctionCallInfo fcinfo, Oid rngtypid, IOFuncSelector func)
 		 errmsg("no binary output function available for type %s",
 format_type_be(cache->typcache->rngelemtype->type_id;
 		}
-		fmgr_info_cxt(cache->typiofunc, &cache->proc,
+		fmgr_info_cxt(typiofunc, &cache->proc,
 					  fcinfo->flinfo->fn_mcxt);
 
 		fcinfo->flinfo->fn_extra = (void *) cache;


Re: [Proposal] Global temporary tables

2020-03-05 Thread tushar

On 3/3/20 2:10 PM, 曾文旌(义从) wrote:

I fixed it in global_temporary_table_v16-pg13.patch.


Please refer to this scenario:

--Connect to psql -

postgres=# alter system set max_active_global_temporary_table =1;
ALTER SYSTEM

--restart the server (./pg_ctl -D data restart)

--create global temp table

postgres=# create global temp  table ccc1  (c int);
CREATE TABLE

--Try to Create another global temp table

postgres=# create global temp  table ccc2  (c int);
WARNING:  relfilenode 13589/1663/19063 not exist in gtt shared hash when forget
ERROR:  out of shared memory
HINT:  You might need to increase max_active_gtt.
postgres=# show max_active_gtt;
ERROR:  unrecognized configuration parameter "max_active_gtt"
postgres=#
postgres=# show max_active_global_temporary_table ;
 max_active_global_temporary_table
---
 1
(1 row)

postgres=#

I cannot find a "max_active_gtt" GUC. I think you are referring to
"max_active_global_temporary_table" here?


Also, it would be great if we could make this error message user-friendly,
like "max connection reached", rather than a memory error.


--
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company



Re: proposal: schema variables

2020-03-05 Thread Asif Rehman
On Sat, Feb 29, 2020 at 2:10 PM Pavel Stehule 
wrote:

>
>
> pá 28. 2. 2020 v 16:30 odesílatel Pavel Stehule 
> napsal:
>
>>
>>
>> čt 27. 2. 2020 v 15:37 odesílatel Pavel Stehule 
>> napsal:
>>
>>>
>>> Hi
>>>
>>>
 3) Any way to define CONSTANTs ?
 We already talked a bit about this subject and also Gilles Darold
 introduces it in this mailing-list topic but I'd like to insist on it.
 I think it would be nice to have a way to say that a variable should
 not be changed once defined.
 Maybe it's hard to implement and can be implemented later, but I just
 want to know if this concern is open.

>>>
>>> I played a little bit with it and didn't find any nice solution, but
>>> maybe now I have found one. I had ideas about some variants, but almost
>>> every time I had a problem with parser shift conflicts because the
>>> potential keywords are not reserved.
>>>
>>> last variant, but maybe best is using keyword WITH
>>>
>>> So the syntax can looks like
>>>
>>> CREATE [ TEMP ] VARIABLE varname [ AS ] type [ NOT NULL ] [ DEFAULT
>>> expression ] [ WITH [ OPTIONS ] '(' ... ')' ] ]
>>>
>>> What do you think about this syntax? It doesn't need any new keyword,
>>> and it is easy to enhance.
>>>
>>> CREATE VARIABLE foo AS int DEFAULT 10 WITH OPTIONS ( CONSTANT);
>>>
>>
>> After some more thinking, and because in another patch I support the syntax
>> CREATE TRANSACTION VARIABLE ..., I changed my opinion and implemented support
>> for the syntax CREATE IMMUTABLE VARIABLE to define constants.
>>
>
> second try to fix pg_dump
>
> Regards
>
> Pavel
>
>
>>
>> See attached patch
>>
>> Regards
>>
>> Pavel
>>
>>
>>>
>>> ?
>>>
>>> Regards
>>>
>>> Pavel
>>>
>>>
>>>
Hi Pavel,

I have been reviewing the latest patch (schema-variables-20200229.patch.gz)
and here are few comments:

1- There is a compilation error when compiled with --with-llvm enabled on
CentOS 7.

llvmjit_expr.c: In function ‘llvm_compile_expr’:
llvmjit_expr.c:1090:5: warning: initialization from incompatible pointer
type [enabled by default]
 build_EvalXFunc(b, mod, "ExecEvalParamVariable",
 ^
llvmjit_expr.c:1090:5: warning: (near initialization for ‘(anonymous)[0]’)
[enabled by default]
llvmjit_expr.c:1090:5: warning: initialization from incompatible pointer
type [enabled by default]
llvmjit_expr.c:1090:5: warning: (near initialization for ‘(anonymous)[0]’)
[enabled by default]
llvmjit_expr.c:1090:5: warning: initialization from incompatible pointer
type [enabled by default]
llvmjit_expr.c:1090:5: warning: (near initialization for ‘(anonymous)[0]’)
[enabled by default]
llvmjit_expr.c:1090:5: warning: passing argument 5 of ‘build_EvalXFuncInt’
from incompatible pointer type [enabled by default]
llvmjit_expr.c:60:21: note: expected ‘struct ExprEvalStep *’ but argument
is of type ‘LLVMValueRef’
 static LLVMValueRef build_EvalXFuncInt(LLVMBuilderRef b, LLVMModuleRef mod,
 ^
llvmjit_expr.c:1092:29: error: ‘i’ undeclared (first use in this function)
 LLVMBuildBr(b, opblocks[i + 1]);
 ^
llvmjit_expr.c:1092:29: note: each undeclared identifier is reported only
once for each function it appears in
make[2]: *** [llvmjit_expr.o] Error 1



After looking into it, it turns out that:
- parameter order was incorrect in build_EvalXFunc()
- LLVMBuildBr() is using the undeclared variable 'i' whereas it should be
using 'opno'.


2- Similarly, if the default expression references a function or object,
a dependency should be recorded, so that the function cannot be dropped
silently; otherwise, a cache lookup error will occur.

postgres=# create or replace function foofunc() returns timestamp as $$
begin return now(); end; $$ language plpgsql;
CREATE FUNCTION
postgres=# create schema test;
CREATE SCHEMA
postgres=# create variable test.v1 as timestamp default foofunc();
CREATE VARIABLE
postgres=# drop function foofunc();
DROP FUNCTION
postgres=# select test.v1;
ERROR:  cache lookup failed for function 16437
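
For reference, recording such a dependency at CREATE VARIABLE time would
presumably use the standard catalog helper, something like this (a sketch;
VariableRelationId, varid and defexpr come from the patch context and are
illustrative here):

    ObjectAddress myself;

    ObjectAddressSet(myself, VariableRelationId, varid);
    recordDependencyOnExpr(&myself, (Node *) defexpr, NIL,
                           DEPENDENCY_NORMAL);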



3- The variable DEFAULT expression is apparently evaluated at the time of
first access, whereas I think that it should be evaluated at the time of
variable creation. Consider the following example:

postgres=# create variable test.v2 as timestamp default now();
CREATE VARIABLE
postgres=# select now();
  now
---
 2020-03-05 12:13:29.775373+00
(1 row)
postgres=# select test.v2;
 v2

 2020-03-05 12:13:37.192317 -- I was expecting this to be earlier than the
above timestamp.
(1 row)

postgres=# select test.v2;
 v2

 2020-03-05 12:13:37.192317
(1 row)
postgres=# let test.v2 = default;
LET
postgres=# select test.v2;
 v2

 2020-03-05 12:14:07.538615
(1 row)


To continue my testing of the patch I made few fixes for the above-mentioned
comments. The patch for those changes is attached if it could be of any use.

--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : 

Re: Option to dump foreign data in pg_dump

2020-03-05 Thread Luis Carril
Hi everyone,

I am just responding on the latest mail on this thread. But the question is 
about functionality. The proposal is to add a single flag 
--include-foreign-data which controls whether or not data is dumped for all the 
foreign tables in a database. That may not serve the purpose. A foreign table 
may point to a view, materialized view or inheritance tree, and so on. A 
database can have foreign tables pointing to all of those kinds. Restoring data 
to a view won't be possible and restoring it into an inheritance tree would 
insert it into the parent only and not the children. Further, a user may not 
want the data to be dumped for all the foreign tables since their usages are 
different esp. considering restore. I think a better option is to extract data 
in a foreign table using --table if that's the only usage. Otherwise, we need a 
foreign table level flag indicating whether pg_dump should dump the data for 
that foreign table or not.

The option enables the user to dump the data of tables backed by a specific
foreign server. It is up to the user to guarantee that the foreign server is
also writable; that is the reason to make the option opt-in. The option can be
combined with --table to dump specific tables if needed. If the user has
different foreign servers in the database, they have to make the conscious
decision of dumping each one of them. Without this option the user is totally
unable to do it.


> On 2020-01-21 10:36, Luis Carril wrote:
>>> Yes we can support --include-foreign-data without parallel option and
>>> later add support for parallel option as a different patch.
>>
>> Hi,
>>
>>  I've attached a new version of the patch in which an error is
>> emitted if the parallel backup is used with the --include-foreign-data
>> option.
>
> This seems like an overreaction.  The whole point of
> lockTableForWorker() is to avoid deadlocks, but foreign tables don't
> have locks, so it's not a problem.  I think you can just skip foreign
> tables in lockTableForWorker() using the same logic that getTables() uses.
>
> I think parallel data dump would be an especially interesting option
> when using foreign tables, so it's worth figuring this out.

What do you think of Peter's comment?
I took a look at it; we could skip foreign tables by checking the catalog in
lockTableForWorker, but this would imply an extra query per call to the function
(as in getTables), which would be irrelevant for most of the cases. Or we could
record in the TocEntry that it is a foreign table (although that seems highly
specific).
Also, would it not be possible to offer support for LOCK TABLE on foreign tables?

At this point I would like to leave the patch as is, and discuss further 
improvement in a future patch.

Luis M.



Should we remove a fallback promotion? take 2

2020-03-05 Thread Fujii Masao

Hi,

We discussed the $SUBJECT six years ago at the following thread.
https://postgr.es/m/CAHGQGwGYkF+CvpOMdxaO=+anazc1oo9o4lqwo50mxpvfj+0...@mail.gmail.com

Seems our consensus at that discussion was to leave a fallback
promotion for a release or two for debugging purpose or as
an emergency method because fast promotion might have
some issues, and then to remove it later. Now, more than six years
have already passed since that discussion. Is there still
any reason to keep a fallback promotion? If nothing, I'd like to
drop it from v13.

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters




Re: Allow auto_explain to log plans before queries are executed

2020-03-05 Thread Julien Rouhaud
On Thu, Feb 27, 2020 at 7:31 AM Julien Rouhaud  wrote:
>
> On Thu, Feb 27, 2020 at 7:12 AM Pavel Stehule  wrote:
> >
> > čt 27. 2. 2020 v 7:01 odesílatel Yugo NAGATA  napsal:
> >>  I think "query debugger" feature you proposed is out of scope of
> >> auto_explain module. I also think the feature to analyze running
> >> query online is great, but we will need another discussion on a new
> >> module or eature for it.
> >
> >
> > sure. My note was about using auto_explain like a query debugger. It does
> > not make much sense, and from this perspective, the original proposal to
> > log the plan before execution makes more sense.
> >
> > you can log every plan with higher cost than some constant.
>
> Yes I thought about that too.  If you're not in an OLAP environment
> (or with a specific user running few expensive queries), setup an
> auto_explain.log_before_execution_min_cost.

There was some discussion but no clear consensus on what should really
be done.  I'm marking the patch as waiting on author which seems more
accurate.  Feel free to switch it back if that's a wrong move.




Retiring support for pre-7.3 FK constraint triggers

2020-03-05 Thread Daniel Gustafsson
While looking at the tg_updatedcols patch I happened to notice that we still
support pre-7.3 constraint triggers by converting them on the fly.  AFAICT this
requires a pre-7.3 dump to hit.

This was added in late 2007 in a2899ebdc28080eab0f4bb0b8a5f30aa7bb31a89 due to
a report from the field, but I doubt this codepath is exercised much today (if
at all).

Having code which is untested and not exercised by developers (or users, if my
assumption holds), yet being reachable by SQL, runs the risk of introducing
subtle bugs.  Is there a use case for keeping it, or can/should it be removed in
14?  That would still leave a lot of supported versions to upgrade to in case
there are users who need this.  Unless there are immediate -1's, I'll park this
in a CF for v14.
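
For anyone who has never seen one, the dumps in question contain trigger
definitions of roughly this shape (reconstructed from memory of 7.2-era
pg_dump output, so details may be off), which we currently convert into a
real foreign key constraint on the fly:

    CREATE CONSTRAINT TRIGGER "<unnamed>"
        AFTER INSERT OR UPDATE ON fk_table
        FROM pk_table
        NOT DEFERRABLE INITIALLY IMMEDIATE
        FOR EACH ROW
        EXECUTE PROCEDURE "RI_FKey_check_ins"('<unnamed>', 'fk_table',
                                              'pk_table', 'UNSPECIFIED',
                                              'fk_col', 'pk_col');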

cheers ./daniel



remove_pre73_fks.patch
Description: Binary data


Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits

2020-03-05 Thread Ibrar Ahmed
On Sat, Jun 29, 2019 at 12:56 AM Amit Kapila wrote:

> On Thu, Jun 27, 2019 at 11:02 AM Pavan Deolasee
>  wrote:
> >
> >>> On 2019-04-07 18:04:27 -0700, Andres Freund wrote:
> >>> > Here's a *prototype* patch for this.  It only implements what I
> >>> > described for heap_multi_insert, not for plain inserts. I wanted to
> see
> >>> > what others think before investing additional time.
> >>>
> >>> Pavan, are you planning to work on this for v13 CF1? Or have you lost
> >>> interest on the topic?
> >>
> >>
> >> Yes, I plan to work on it.
> >>
> >
> > I am sorry, but I am not able to find time to get back to this because
> of other high priority items. If it still remains unaddressed in the next
> few weeks, I will pick it up again. But for now, I am happy if someone
> wants to pick and finish the work.
> >
>
> Fair enough, I have marked the entry [1] in the coming CF as "Returned
> with Feedback".  I hope that is okay with you.
>
> [1] - https://commitfest.postgresql.org/23/2009/
>
>

Hi,

As Pavan mentioned in his last email, he is no longer working on this, so I
want to take the lead and finish it. It is a bug and needs to be fixed.
I have rebased the patch against the latest master, added some test cases
(borrowed from Pavan's patch), and did some performance testing with a
700MB table (10 million rows):

Without the patch: COPY WITH FREEZE took 21406.692ms and VACUUM took 2478.666ms
With the patch:    COPY WITH FREEZE took 23095.985ms and VACUUM took 26.309ms

The performance decrease in COPY with the patch is only 7%, but we get
quite adequate performance in VACUUM. In any case, this is a bug fix, so we
can ignore the performance hit.
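
For anyone wanting to reproduce the numbers, the pattern under test is
roughly the following (table and file names are placeholders; FREEZE
requires that the table was created or truncated in the same transaction):

    BEGIN;
    TRUNCATE big_table;
    COPY big_table FROM '/tmp/data.csv' WITH (FREEZE);
    COMMIT;
    VACUUM big_table;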

There are two issues left to address.

1 - Andres: It only implements what I described for heap_multi_insert, not
for plain inserts.
I wanted to see what others think before investing additional time.

Under what conditions do we need that for plain inserts?

2 - Andres:  I think we should remove the WAL logging from
visibilitymap_set(), and
move it to a separate, heap specific, function. Right now different
tableams e.g. would have to reimplement visibilitymap_set(), so that's a
second need to separate that functionality.  Let me try to come up with
a proposal.




-- 
Ibrar Ahmed


0003-copy-freeze-should-actually-freeze-right.patch
Description: Binary data


Re: logical replication empty transactions

2020-03-05 Thread Euler Taveira
On Thu, 5 Mar 2020 at 05:45, Amit Kapila  wrote:

> Euler, can we try to update the patch based on the number of
> transactions threshold and see how it works?
>
I will do.


-- 
Euler Taveira http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: allow trigger to get updated columns

2020-03-05 Thread Daniel Gustafsson
> On 24 Feb 2020, at 10:58, Peter Eisentraut  
> wrote:
> 
> This is a change to make the bitmap of updated columns available to a trigger 
> in TriggerData.  This is the same idea as was recently done to generated 
> columns [0]: Generic triggers such as tsvector_update_trigger can use this 
> information to skip work if the columns they are interested in haven't 
> changed.  With the generated columns change, perhaps this isn't so 
> interesting anymore, but I suspect a lot of existing installations still use 
> tsvector_update_trigger.  In any case, since I had already written the code, 
> I figured I post it here.  Perhaps there are other use cases.

I wouldn't at all be surprised if there are use cases for this in the wild, and
given the very minor impact I absolutely think it's worth doing.  The patches
both apply, compile and pass tests without warnings.

The 0001 refactoring patch seems a clear win to me.

In the 0002 patch:

+For UPDATE triggers, a bitmap set indicating the
+columns that were updated by the triggering command.  Generic trigger

Is it worth pointing out that tg_updatedcols will be NULL rather than an empty
Bitmapset for non-UPDATE triggers?  bitmapset.c treats NULL as an empty bitmap
but since a Bitmapset can be allocated but empty, maybe it's worth being
explicit to help developers?
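
To make the intended usage concrete, a generic trigger would test the
bitmap roughly like the sketch below (a minimal example, not code from the
patch; column numbers are offset by FirstLowInvalidHeapAttributeNumber, and
bms_is_member() already treats a NULL Bitmapset as empty, which bears on
the NULL-vs-empty question above):

    #include "postgres.h"
    #include "fmgr.h"
    #include "access/sysattr.h"
    #include "commands/trigger.h"
    #include "nodes/bitmapset.h"

    PG_MODULE_MAGIC;

    PG_FUNCTION_INFO_V1(skip_if_col1_unchanged);

    Datum
    skip_if_col1_unchanged(PG_FUNCTION_ARGS)
    {
        TriggerData *trigdata = (TriggerData *) fcinfo->context;
        int         attnum = 1;     /* the column we care about */

        /* On UPDATE, skip the expensive work if our column was untouched */
        if (TRIGGER_FIRED_BY_UPDATE(trigdata->tg_event) &&
            !bms_is_member(attnum - FirstLowInvalidHeapAttributeNumber,
                           trigdata->tg_updatedcols))
            return PointerGetDatum(trigdata->tg_newtuple);

        /* ... recompute derived columns here ... */
        return PointerGetDatum(TRIGGER_FIRED_BY_UPDATE(trigdata->tg_event)
                               ? trigdata->tg_newtuple
                               : trigdata->tg_trigtuple);
    }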

There isn't really a test suite that exercises this IIUC; how about adding
something like the attached diff to contrib/lo?  It seemed like a lower impact
change than widening test_tsvector.

+1 on the patchset, marking this entry as Ready For Committer.

cheers ./daniel



lo_trig_test.diff
Description: Binary data


Re: Restore replication settings when modifying a field type

2020-03-05 Thread Peter Eisentraut

On 2020-02-11 00:38, Quan Zongliang wrote:

new patch attached.


I didn't much like how the updating of the replica identity was 
hacked into the middle of ATRewriteCatalogs().  I have an alternative 
proposal in the attached patch that queues up an ALTER TABLE ... REPLICA 
IDENTITY command into the normal ALTER TABLE processing.  I have also 
added tests to the test suite.
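
The user-visible behavior being preserved is easy to see in isolation
(table and index names here are purely illustrative):

    CREATE TABLE t (id int NOT NULL, v text);
    CREATE UNIQUE INDEX t_id_idx ON t (id);
    ALTER TABLE t REPLICA IDENTITY USING INDEX t_id_idx;
    ALTER TABLE t ALTER COLUMN id TYPE bigint;  -- index is dropped and rebuilt
    SELECT relreplident FROM pg_class WHERE relname = 't';  -- should still be 'i'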


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 730a5dc46ec9722aa0416e5a85987c6385e6170c Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 5 Mar 2020 10:11:13 +0100
Subject: [PATCH v4] Preserve replica identity index across ALTER TABLE rewrite

---
 src/backend/commands/tablecmds.c           | 42 +
 src/backend/utils/cache/lsyscache.c        | 23 ++
 src/include/utils/lsyscache.h              |  1 +
 .../regress/expected/replica_identity.out  | 46 +++
 src/test/regress/sql/replica_identity.sql  | 21 +
 5 files changed, 133 insertions(+)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 7a13b97164..8edec9cbe7 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -176,6 +176,7 @@ typedef struct AlteredTableInfo
 	List	   *changedConstraintDefs;	/* string definitions of same */
 	List	   *changedIndexOids;	/* OIDs of indexes to rebuild */
 	List	   *changedIndexDefs;	/* string definitions of same */
+	char	   *replicaIdentityIndex;	/* index to reset as REPLICA IDENTITY */
 } AlteredTableInfo;
 
 /* Struct describing one new constraint to check in Phase 3 scan */
@@ -11564,6 +11565,22 @@ ATExecAlterColumnType(AlteredTableInfo *tab, Relation rel,
 	return address;
 }
 
+/*
+ * Subroutine for ATExecAlterColumnType: remember that a replica identity
+ * needs to be reset.
+ */
+static void
+RememberReplicaIdentityForRebuilding(Oid indoid, AlteredTableInfo *tab)
+{
+	if (!get_index_is_replident(indoid))
+		return;
+
+	if (tab->replicaIdentityIndex)
+		elog(ERROR, "relation %u has multiple indexes marked as replica identity", tab->relid);
+
+	tab->replicaIdentityIndex = get_rel_name(indoid);
+}
+
 /*
  * Subroutine for ATExecAlterColumnType: remember that a constraint needs
  * to be rebuilt (which we might already know).
@@ -11582,11 +11599,16 @@ RememberConstraintForRebuilding(Oid conoid, AlteredTableInfo *tab)
 	{
 		/* OK, capture the constraint's existing definition string */
 		char	   *defstring = pg_get_constraintdef_command(conoid);
+		Oid			indoid;
 
 		tab->changedConstraintOids = lappend_oid(tab->changedConstraintOids,
												 conoid);
 		tab->changedConstraintDefs = lappend(tab->changedConstraintDefs,
											 defstring);
+
+		indoid = get_constraint_index(conoid);
+		if (OidIsValid(indoid))
+			RememberReplicaIdentityForRebuilding(indoid, tab);
 	}
 }
 
@@ -11629,6 +11651,8 @@ RememberIndexForRebuilding(Oid indoid, AlteredTableInfo *tab)
 											indoid);
 			tab->changedIndexDefs = lappend(tab->changedIndexDefs,
 											defstring);
+
+			RememberReplicaIdentityForRebuilding(indoid, tab);
 		}
 	}
 }
@@ -11737,6 +11761,24 @@ ATPostAlterTypeCleanup(List **wqueue, AlteredTableInfo *tab, LOCKMODE lockmode)
 		add_exact_object_address(&obj, objects);
 	}
 
+	/*
+	 * Queue up command to restore replica identity index marking
+	 */
+	if (tab->replicaIdentityIndex)
+	{
+		AlterTableCmd *cmd = makeNode(AlterTableCmd);
+		ReplicaIdentityStmt *subcmd = makeNode(ReplicaIdentityStmt);
+
+		subcmd->identity_type = REPLICA_IDENTITY_INDEX;
+		subcmd->name = tab->replicaIdentityIndex;
+		cmd->subtype = AT_ReplicaIdentity;
+		cmd->def = (Node *) subcmd;
+
+		/* do it after indexes and constraints */
+		tab->subcmds[AT_PASS_OLD_CONSTR] =
+			lappend(tab->subcmds[AT_PASS_OLD_CONSTR], cmd);
+	}
+
 	/*
 	 * It should be okay to use DROP_RESTRICT here, since nothing else should
 	 * be depending on these objects.
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index 3da90cb72a..f7b0c1408f 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c

Re: backup manifests

2020-03-05 Thread tushar
There is one small observation: if we use a slash (/) in the pattern given
to the --ignore option, we do not get the desired result.


Steps to reproduce -
==

[centos@tushar-ldap-docker bin]$ ./pg_basebackup -D test

[centos@tushar-ldap-docker bin]$ touch test/pg_notify/dummy_file

--working
[centos@tushar-ldap-docker bin]$ ./pg_validatebackup --ignore=pg_notify test
pg_validatebackup: * manifest_checksum = 
be9b72e1320c6c34c131533de19371a10dd5011940181724e43277f786026c7b

pg_validatebackup: backup successfully verified

--not working

[centos@tushar-ldap-docker bin]$ ./pg_validatebackup --ignore=pg_notify/ test
pg_validatebackup: * manifest_checksum = 
be9b72e1320c6c34c131533de19371a10dd5011940181724e43277f786026c7b
pg_validatebackup: error: "pg_notify/dummy_file" is present on disk but 
not in the manifest


regards,

On 3/5/20 3:40 PM, tushar wrote:

Hi,

There is one scenario where I am somehow able to run pg_validatebackup 
successfully, but when I try to start the server, it fails.


Steps to reproduce -
--create 2 base backup directory
[centos@tushar-ldap-docker bin]$ ./pg_basebackup -D db1
[centos@tushar-ldap-docker bin]$ ./pg_basebackup -D db2

--run pg_validatebackup, using the backup_manifest of the db1 directory 
against db2/. We get an error:
[centos@tushar-ldap-docker bin]$ ./pg_validatebackup -m 
db1/backup_manifest db2/
pg_validatebackup: * manifest_checksum = 
5b131aff4a4f86e2a53efd84b003a67b9f615decb0039f19033eefa6f43c1ede

pg_validatebackup: error: checksum mismatch for file "backup_label"
--copy the backup_label of db1 to the db2 folder
[centos@tushar-ldap-docker bin]$ cp db1/backup_label db2/.

--run pg_validatebackup .. working fine
[centos@tushar-ldap-docker bin]$ ./pg_validatebackup -m 
db1/backup_manifest db2/
pg_validatebackup: * manifest_checksum = 
5b131aff4a4f86e2a53efd84b003a67b9f615decb0039f19033eefa6f43c1ede

pg_validatebackup: backup successfully verified
[centos@tushar-ldap-docker bin]$

--try to start the server
[centos@tushar-ldap-docker bin]$ ./pg_ctl -D db2 start -o '-p '
waiting for server to start....
2020-03-05 15:33:53.471 IST [24049] 
LOG:  starting PostgreSQL 13devel on x86_64-pc-linux-gnu, compiled by 
gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit
2020-03-05 15:33:53.471 IST [24049] LOG:  listening on IPv6 address 
"::1", port 
2020-03-05 15:33:53.471 IST [24049] LOG:  listening on IPv4 address 
"127.0.0.1", port 
2020-03-05 15:33:53.473 IST [24049] LOG:  listening on Unix socket 
"/tmp/.s.PGSQL."
2020-03-05 15:33:53.476 IST [24050] LOG:  database system was 
interrupted; last known up at 2020-03-05 15:32:51 IST

2020-03-05 15:33:53.573 IST [24050] LOG:  invalid checkpoint record
2020-03-05 15:33:53.573 IST [24050] FATAL:  could not locate required 
checkpoint record
2020-03-05 15:33:53.573 IST [24050] HINT:  If you are restoring from a 
backup, touch 
"/home/centos/pg13_bk_mani/edb/edbpsql/bin/db2/recovery.signal" and 
add required recovery options.
    If you are not restoring from a backup, try removing the file 
"/home/centos/pg13_bk_mani/edb/edbpsql/bin/db2/backup_label".
    Be careful: removing 
"/home/centos/pg13_bk_mani/edb/edbpsql/bin/db2/backup_label" will 
result in a corrupt cluster if restoring from a backup.
2020-03-05 15:33:53.574 IST [24049] LOG:  startup process (PID 24050) 
exited with exit code 1
2020-03-05 15:33:53.574 IST [24049] LOG:  aborting startup due to 
startup process failure

2020-03-05 15:33:53.575 IST [24049] LOG:  database system is shut down
 stopped waiting
pg_ctl: could not start server
Examine the log output.
[centos@tushar-ldap-docker bin]$

regards,


On 3/5/20 1:09 PM, Rajkumar Raghuwanshi wrote:

Hi,

In a negative test scenario, if I change a file's size to -1 in 
backup_manifest, pg_validatebackup gives an error with a huge size 
number (-1 read back as an unsigned 64-bit integer).

[edb@localhost bin]$ ./pg_basebackup -p 5551 -D /tmp/bold 
--manifest-checksum 'SHA256'

[edb@localhost bin]$ ./pg_validatebackup /tmp/bold
pg_validatebackup: backup successfully verified

--change a file size to -1 and generate new checksum.
[edb@localhost bin]$ vi /tmp/bold/backup_manifest
[edb@localhost bin]$ shasum -a256 /tmp/bold/backup_manifest
c3d7838cbbf991c6108f9c1ab78f673c20d8073114500f14da6ed07ede2dc44a 
 /tmp/bold/backup_manifest

[edb@localhost bin]$ vi /tmp/bold/backup_manifest

[edb@localhost bin]$ ./pg_validatebackup /tmp/bold
pg_validatebackup: error: "global/4183" has size 0 on disk but size 
18446744073709551615 in the manifest


Thanks & Regards,
Rajkumar Raghuwanshi


On Thu, Mar 5, 2020 at 9:37 AM Suraj Kharage wrote:



On Wed, Mar 4, 2020 at 7:21 PM tushar <tushar.ah...@enterprisedb.com> wrote:

Hi,

There is a scenario in which I add something inside the
pg_tablespace directory; I get an error like:

pg_validatebackup: * manifest_checksum =
77ddacb4e7e02e2b880792a19a3adf09266dd88553dd15cfd0c22caee7d9cc04
pg_validatebackup: error:

Re: Some problems of recovery conflict wait events

2020-03-05 Thread Fujii Masao




On 2020/03/05 16:58, Masahiko Sawada wrote:

On Wed, 4 Mar 2020 at 15:21, Fujii Masao  wrote:




On 2020/03/04 14:31, Masahiko Sawada wrote:

On Wed, 4 Mar 2020 at 13:48, Fujii Masao  wrote:




On 2020/03/04 13:27, Michael Paquier wrote:

On Wed, Mar 04, 2020 at 01:13:19PM +0900, Masahiko Sawada wrote:

Yeah, so 0001 patch sets existing wait events to recovery conflict
resolution. For instance, it sets (PG_WAIT_LOCK | LOCKTAG_TRANSACTION)
to the recovery conflict on a snapshot. 0003 patch improves these wait
events by adding the new type of wait event such as
WAIT_EVENT_RECOVERY_CONFLICT_SNAPSHOT. Therefore 0001 (and 0002) patch
is the fix for existing versions and 0003 patch is an improvement for
only PG13. Did you mean even 0001 patch doesn't fit for back-patching?


Yes, it looks like an improvement rather than a bug fix.



Okay, understand.


I got my eyes on this patch set.  The full patch set is in my opinion
just a set of improvements, and not bug fixes, so I would refrain from
back-patching.


I think that the issue (i.e., "waiting" is reported twice needlessly
in PS display) that 0002 patch tries to fix is a bug. So it should be
fixed even in the back branches.


So we need only two patches: one fixes the process title issue and the
other improves the wait events. I've attached the updated patches.


Thanks for updating the patches! I started reading 0001 patch.


Thank you for reviewing this patch.



-   /*
-    * Report via ps if we have been waiting for more than 500 msec
-    * (should that be configurable?)
-    */
-   if (update_process_title && new_status == NULL &&
-       TimestampDifferenceExceeds(waitStart, GetCurrentTimestamp(),
-                                  500))

The patch changes ResolveRecoveryConflictWithSnapshot() and
ResolveRecoveryConflictWithTablespace() so that they always add
"waiting" to the PS display, whether a wait is really necessary or not.
But isn't it better to display "waiting" in PS only when a wait is
actually necessary, as ResolveRecoveryConflictWithVirtualXIDs()
originally does, per the above?


You're right. Will fix it.



   ResolveRecoveryConflictWithDatabase(Oid dbid)
   {
+   char       *new_status = NULL;
+
+   /* Report via ps we are waiting */
+   new_status = set_process_title_waiting();

In ResolveRecoveryConflictWithDatabase(), there seems to be no need to
display "waiting" in PS because no wait occurs when a recovery conflict
with a database happens.


Isn't the startup process waiting for the other backends to terminate?


Yeah, you're right. I agree that "waiting" should be reported in this case.

Currently ResolveRecoveryConflictWithLock() and
ResolveRecoveryConflictWithBufferPin() don't call
ResolveRecoveryConflictWithVirtualXIDs and don't report "waiting"
in PS display. You changed them so that they report "waiting". I agree
to have this change. But this change is an improvement rather than
a bug fix, i.e., we should apply this change only in v13?

Of course, the other part in the patch, i.e., fixing the issue that
"waiting" is doubly reported, should be back-patched, I think,
though.
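
For reference, once the improved wait events are in, a recovery conflict
wait should be observable on a standby with something like this (the actual
wait_event values will be whatever names the final patch settles on):

    SELECT pid, wait_event_type, wait_event
    FROM pg_stat_activity
    WHERE backend_type = 'startup';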

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters




Re: Crash by targetted recovery

2020-03-05 Thread Fujii Masao




On 2020/03/05 12:08, Kyotaro Horiguchi wrote:

At Mon, 2 Mar 2020 20:54:04 +0900, Fujii Masao  
wrote in

And random access during StandbyMode usually (always?) lets RecPtr go
back.  I'm not sure WaitForWALToBecomeAvailable works correctly if we
don't have a file in pg_wal and the REDO point is far back by more
than a segment from the initial checkpoint record.


It works correctly. This is why WaitForWALToBecomeAvailable() uses
fetching_ckpt argument.


Hmm. You're right. We start streaming from RedoStartLSN when
fetching_ckpt.  So that doesn't happen.


If we go back to XLOG_FROM_ARCHIVE by random access, it correctly
re-connects to the primary for the past segment.


But this can lead to unnecessary restart of walreceiver. Since
fetching_ckpt ensures that the WAL file containing the REDO
starting record exists in pg_wal, there is no need to reconnect
to the primary when reading the REDO starting record.

Is there other case where we need to go back to XLOG_FROM_ARCHIVE
by random access?


I understand that the reconnection for REDO record is useless. Ok I
take the !StandbyMode way.

The attached is the test script that is changed to count the added
test, and the slightly revised main patch.


Thanks for the patch!

+   /* WAL receiver should not be active when entering XLOG_FROM_ARCHIVE */
+   Assert(!WalRcvStreaming());

+1 to add this assertion check.

Isn't it better to always check this while trying to read WAL from
archive or pg_wal? So, what about the following change?

{
case XLOG_FROM_ARCHIVE:
case XLOG_FROM_PG_WAL:
+   /*
+* WAL receiver should not be running while 
trying to
+* read WAL from archive or pg_wal.
+*/
+   Assert(!WalRcvStreaming());
+
/* Close any old file we might have open. */
if (readFile >= 0)


+   lastSourceFailed = false;   /* We haven't failed on the new source */

Is this really necessary? Since ReadRecord() always resets
lastSourceFailed to false, it seems unnecessary.


-   else if (currentSource == 0)
+   else if (currentSource == 0 ||

Though this is a *separate topic*, 0 should be XLOG_FROM_ANY?
There are some places where 0 is used as the value of currentSource.
IMO they should be updated so that XLOG_FROM_ANY is used instead of 0.
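
That is, mechanical changes of this shape (XLOG_FROM_ANY is 0 in the
existing enum, so this would be purely cosmetic):

    -   else if (currentSource == 0 ||
    +   else if (currentSource == XLOG_FROM_ANY ||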

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-05 Thread Komяpa
Hi,

Thanks Laurenz for taking action on this and writing a better patch
than my initial one.
This will help avoid Mandrill-like downtimes and make Index Only
Scans just work on large telemetry databases like the one I was
responsible for back when I was at Juno.

On Thu, Mar 5, 2020 at 9:40 AM David Rowley  wrote:
>
> On Thu, 5 Mar 2020 at 04:15, Laurenz Albe  wrote:
> > I just realized that the exercise is pointless unless that
> > autovacuum also runs with FREEZE on.

> 8. Should we care when setting the insert counter back to 0 if
> auto-vacuum has skipped pages?

I believe it would be enough just to leave a comment about this in code.

> 10. I'm slightly worried about the case where we don't quite trigger a
> normal vacuum but trigger a vacuum due to INSERTs then skip cleaning
> up the indexes but proceed to leave dead index entries causing indexes
> to become bloated.  It does not seem impossible that given the right
> balance of INSERTs and UPDATE/DELETEs that this could happen every
> time and the indexes would just become larger and larger.

Can we not reset statistics about dead tuples upon index-skipping
vacuum, since we didn't really take care of them?

> 11. We probably do also need to debate if we want this on or off by
> default.   I'd have leaned towards enabling by default if I'd not
> personally witnessed the fact that people rarely* increase auto-vacuum
> to run faster than the standard cost settings. I've seen hundreds of
> servers over the years with all workers busy for days on something
> they'll never finish quickly enough.  We increased those settings 10x
> in PG12, so there will be fewer people around suffering from that now,
> but even after having reduced the vacuum_cost_delay x10 over the PG11
> settings, it's by no means fast enough for everyone.  I've mixed
> feelings about giving auto-vacuum more work to do for those people, so
> perhaps the best option is to keep this off by default so as not to
> affect the people who don't tune auto-vacuum.  They'll just suffer the
> pain all at once when they hit max freeze age instead of more
> gradually with the additional load on the workers.   At least adding
> this feature gives the people who do tune auto-vacuum some ability to
> handle read-only tables in some sane way.

That's exactly the situation we're trying to avoid with this patch.
Suffering all at once takes large production deployments down for
weeks, and that gets into the news.
With the I/O budgets of current cloud setups it's plainly impossible to
read the whole database in time, let alone rewrite it.
I say we should enable this setting by default.
If my calculations are correct, that will freeze the table once for each
new gigabyte of data written, which usually fits into burst read
thresholds.
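
For the people who do tune autovacuum, per-table opt-in or opt-out would
presumably look something like this (the reloption name mirrors the GUC
naming used in the patch and may change before commit):

    ALTER TABLE telemetry SET (autovacuum_vacuum_insert_threshold = 10000000);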




-- 
Darafei Praliaskouski
Support me: http://patreon.com/komzpa




Re: backup manifests

2020-03-05 Thread tushar

Hi,

There is one scenario where I am somehow able to run pg_validatebackup 
successfully, but when I try to start the server, it fails.


Steps to reproduce -
--create 2 base backup directory
[centos@tushar-ldap-docker bin]$ ./pg_basebackup -D db1
[centos@tushar-ldap-docker bin]$ ./pg_basebackup -D db2

--run pg_validatebackup, using the backup_manifest of the db1 directory 
against db2/. We get an error:
[centos@tushar-ldap-docker bin]$ ./pg_validatebackup -m 
db1/backup_manifest db2/
pg_validatebackup: * manifest_checksum = 
5b131aff4a4f86e2a53efd84b003a67b9f615decb0039f19033eefa6f43c1ede

pg_validatebackup: error: checksum mismatch for file "backup_label"
--copy the backup_label of db1 to the db2 folder
[centos@tushar-ldap-docker bin]$ cp db1/backup_label db2/.

--run pg_validatebackup .. working fine
[centos@tushar-ldap-docker bin]$ ./pg_validatebackup -m 
db1/backup_manifest db2/
pg_validatebackup: * manifest_checksum = 
5b131aff4a4f86e2a53efd84b003a67b9f615decb0039f19033eefa6f43c1ede

pg_validatebackup: backup successfully verified
[centos@tushar-ldap-docker bin]$

--try to start the server
[centos@tushar-ldap-docker bin]$ ./pg_ctl -D db2 start -o '-p '
waiting for server to start....
2020-03-05 15:33:53.471 IST [24049] LOG:  
starting PostgreSQL 13devel on x86_64-pc-linux-gnu, compiled by gcc 
(GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit
2020-03-05 15:33:53.471 IST [24049] LOG:  listening on IPv6 address 
"::1", port 
2020-03-05 15:33:53.471 IST [24049] LOG:  listening on IPv4 address 
"127.0.0.1", port 
2020-03-05 15:33:53.473 IST [24049] LOG:  listening on Unix socket 
"/tmp/.s.PGSQL."
2020-03-05 15:33:53.476 IST [24050] LOG:  database system was 
interrupted; last known up at 2020-03-05 15:32:51 IST

2020-03-05 15:33:53.573 IST [24050] LOG:  invalid checkpoint record
2020-03-05 15:33:53.573 IST [24050] FATAL:  could not locate required 
checkpoint record
2020-03-05 15:33:53.573 IST [24050] HINT:  If you are restoring from a 
backup, touch 
"/home/centos/pg13_bk_mani/edb/edbpsql/bin/db2/recovery.signal" and add 
required recovery options.
    If you are not restoring from a backup, try removing the file 
"/home/centos/pg13_bk_mani/edb/edbpsql/bin/db2/backup_label".
    Be careful: removing 
"/home/centos/pg13_bk_mani/edb/edbpsql/bin/db2/backup_label" will result 
in a corrupt cluster if restoring from a backup.
2020-03-05 15:33:53.574 IST [24049] LOG:  startup process (PID 24050) 
exited with exit code 1
2020-03-05 15:33:53.574 IST [24049] LOG:  aborting startup due to 
startup process failure

2020-03-05 15:33:53.575 IST [24049] LOG:  database system is shut down
 stopped waiting
pg_ctl: could not start server
Examine the log output.
[centos@tushar-ldap-docker bin]$

regards,


On 3/5/20 1:09 PM, Rajkumar Raghuwanshi wrote:

Hi,

In a negative test scenario, if I change a file's size to -1 in 
backup_manifest, pg_validatebackup gives an error with a huge size 
number (-1 read back as an unsigned 64-bit integer).

[edb@localhost bin]$ ./pg_basebackup -p 5551 -D /tmp/bold 
--manifest-checksum 'SHA256'

[edb@localhost bin]$ ./pg_validatebackup /tmp/bold
pg_validatebackup: backup successfully verified

--change a file size to -1 and generate new checksum.
[edb@localhost bin]$ vi /tmp/bold/backup_manifest
[edb@localhost bin]$ shasum -a256 /tmp/bold/backup_manifest
c3d7838cbbf991c6108f9c1ab78f673c20d8073114500f14da6ed07ede2dc44a 
 /tmp/bold/backup_manifest

[edb@localhost bin]$ vi /tmp/bold/backup_manifest

[edb@localhost bin]$ ./pg_validatebackup /tmp/bold
pg_validatebackup: error: "global/4183" has size 0 on disk but size 
18446744073709551615 in the manifest


Thanks & Regards,
Rajkumar Raghuwanshi


On Thu, Mar 5, 2020 at 9:37 AM Suraj Kharage wrote:



On Wed, Mar 4, 2020 at 7:21 PM tushar <tushar.ah...@enterprisedb.com> wrote:

Hi,

There is a scenario in which I add something inside the
pg_tablespace directory; I get an error like:

pg_validatebackup: * manifest_checksum =
77ddacb4e7e02e2b880792a19a3adf09266dd88553dd15cfd0c22caee7d9cc04
pg_validatebackup: error:
"pg_tblspc/16385/*PG_13_202002271*/test" is present on disk
but not in the manifest

but if I remove the 'PG_13_202002271' directory, then there is no
error:

[centos@tushar-ldap-docker bin]$ ./pg_validatebackup data
pg_validatebackup: * manifest_checksum =
77ddacb4e7e02e2b880792a19a3adf09266dd88553dd15cfd0c22caee7d9cc04
pg_validatebackup: backup successfully verified


This seems expected given the current design, as we don't log
directory entries in backup_manifest. In your case, you have a
tablespace with no objects (an empty tablespace), so backup_manifest
does not have any entry for it; hence, when you remove the
tablespace directory, the validator cannot detect it.

We can either document it or add the entry for directories in the
manifest. Robert may have a better 
