Re: [PATCH] initdb: do not exit after warn_on_mount_point

2022-09-11 Thread Tom Lane
Julien Rouhaud  writes:
> On Sun, Sep 11, 2022 at 06:22:21PM +, andrey.ara...@nixaid.com wrote:
>> Everyone using a containerized postgresql image cannot use /var/lib/postgresql
>> as the mountpoint but has to use /var/lib/postgresql/data instead due to this
>> issue [4] due to [5].

> initdb had this behavior for almost 10 years, and for good reasons: it
> prevents actual problems that were seen in the field.

The long and the short of this is that one person losing their data
outweighs thousands of people being very mildly inconvenienced by
having to create an extra level of directory.  I understand that you
don't think so, but you'd change your mind PDQ if it was *your* data
that got irretrievably corrupted.

We are not going to remove this check.

If anything, the fault I'd find with the existing code is that it's not
sufficiently thorough about detecting what is a mount point.  AFAICS,
neither the lost+found check nor the extra-dot-files check will trigger
on modern filesystems such as XFS.  I wonder if we could do something
like comparing the stat(2) st_dev numbers for the proposed data directory
vs. its parent directory.
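
Something along those lines, purely as a sketch of the idea (untested, and
Windows would likely need different handling, since st_dev isn't meaningful
there in the same way):

    #include <stdbool.h>
    #include <sys/stat.h>

    /*
     * Hypothetical helper: does "datadir" appear to be a mount point,
     * i.e. does it live on a different device than its parent directory?
     * "parent" is assumed to be the already-computed parent path.
     */
    static bool
    looks_like_mount_point(const char *datadir, const char *parent)
    {
        struct stat st_dir;
        struct stat st_parent;

        if (stat(datadir, &st_dir) != 0 || stat(parent, &st_parent) != 0)
            return false;       /* can't tell; rely on the existing checks */

        return st_dir.st_dev != st_parent.st_dev;
    }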

regards, tom lane




Re: broken table formatting in psql

2022-09-11 Thread Pavel Stehule
On Mon, Sep 12, 2022 at 7:37 AM John Naylor  wrote:

> On Thu, Sep 8, 2022 at 12:51 PM Pavel Stehule 
> wrote:
> >
> >
> >> Does anyone want to advocate for backpatching these missing ranges to
> >> v12 and up? v12 still has a table in-line so trivial to remedy, but
> >> v13 and up use a script, so these exceptions would likely have to use
> >> hard-coded branches to keep from bringing in new changes.
> >>
> >> If so, does anyone want to advocate for including this patch in v15?
> >> It claims Unicode 14.0.0, and this would make that claim more
> >> technically correct as well as avoiding additional branches.
> >
> >
> > I think it can be fixed just in v15 and master.  This issue has no
> > impact on SQL.
>
> Well, if the regressions from v11 are not important enough to
> backpatch, there is not as much of a case to backpatch the full fix to
> v15 either.
>

This is not a critical issue, really.  On second thought, though, I don't see
the point in releasing a fresh Postgres with this bug when there is a known
bugfix - and this bugfix should be compatible (at this moment) with 16.

PostgreSQL 15 has not been released yet.

Regards

Pavel

-- 
> John Naylor
> EDB: http://www.enterprisedb.com
>


Re: broken table formatting in psql

2022-09-11 Thread John Naylor
On Thu, Sep 8, 2022 at 12:51 PM Pavel Stehule  wrote:
>
>
>> Does anyone want to advocate for backpatching these missing ranges to
>> v12 and up? v12 still has a table in-line so trivial to remedy, but
>> v13 and up use a script, so these exceptions would likely have to use
>> hard-coded branches to keep from bringing in new changes.
>>
>> If so, does anyone want to advocate for including this patch in v15?
>> It claims Unicode 14.0.0, and this would make that claim more
>> technically correct as well as avoiding additional branches.
>
>
> I think it can be fixed just in v15 and master.  This issue has no impact on 
> SQL.

Well, if the regressions from v11 are not important enough to
backpatch, there is not as much of a case to backpatch the full fix to
v15 either.
-- 
John Naylor
EDB: http://www.enterprisedb.com




Re: Handle infinite recursion in logical replication setup

2022-09-11 Thread vignesh C
On Fri, 9 Sept 2022 at 11:12, Amit Kapila  wrote:
>
> On Thu, Sep 8, 2022 at 9:32 AM vignesh C  wrote:
> >
> >
> > The attached patch has the changes to handle the same.
> >
>
> Pushed. I am not completely sure whether we want the remaining
> documentation patch in this thread in its current form or by modifying
> it. Johnathan has shown some interest in it. I feel you can start a
> separate thread for it to see if there is any interest in the same and
> close the CF entry for this work.

Thanks for pushing the patch. I have closed this entry in commitfest.
I will wait for some more time and see the response regarding the
documentation patch and then start a new thread if required.

Regards,
Vignesh




Re: [PATCH] initdb: do not exit after warn_on_mount_point

2022-09-11 Thread Julien Rouhaud
On Sun, Sep 11, 2022 at 06:22:21PM +, andrey.ara...@nixaid.com wrote:
> September 11, 2022 12:18 PM, "Julien Rouhaud"  wrote:
>
> > Hi,
> >
> > On Sat, Sep 10, 2022 at 07:14:59PM +, andrey.ara...@nixaid.com wrote:
> >
> >> Have slightly improved the logic so that it does not report an error
> >> "directory \"%s\" exists but is not empty"
> >> when it is only supposed to warn the user about the mountpoint, without
> >> exiting.
> >>
> >> To me, my patch looks like a typo fix where exit(1) should not be called on
> >> the warn_on_mount_point(), but only warn and continue as more people are
> >> mounting the device at `/var/lib/postgresql/data` (PGDATA) in the
> >> containerized world (K8s deployments, especially now in the Akash Network I
> >> am working for) for making sure their data persist.
> >
> > This definitely isn't a typo, as it's really strongly discouraged to use a
> > mount point for the data directory. You can refer to this thread [1] for
> > more explanations.
> >
> > [1]
> > https://www.postgresql.org/message-id/flat/CAKoxK+6H40imynM5P31bf0DnpN-5f5zeROjcaj6BKVAjxdEA9w@mail.gmail.com
>
>
> I've read the "why not using a mountpoint as PGDATA?" thread [1] as well as the
> Bugzilla "postgresql-setup fails when /var/lib/pgsql/data is mount point"
> thread [2], but could not find any good reason not to mount the PGDATA
> directly, except probably for the NFS mount point, but who does that anyway?

What about this part in Tom's original answer:

3. If, some day, the filesystem is accidentally unmounted while the database is
up, it will continue to write into files that are now getting placed in the
mount-point directory on the parent volume.  This usually results in an
unrecoverably messed-up database by the time you realize what's going wrong.
(There are horror stories about such cases in the PG community mailing list
archives, dating from before we installed the don't-use-a-mount-point defenses
in initdb.)

> Everyone using a containerized postgresql image cannot use /var/lib/postgresql
> as the mountpoint but has to use /var/lib/postgresql/data instead due to this
> issue [4] due to [5].  Hence, everyone using a containerized version of
> postgresql with the device (say Ceph's RBD) mounted over the
> /var/lib/postgresql/data directory has to either specify:
>
> - PGDATA=/var/lib/postgresql/data/
>
> OR
>
> make sure to remove $PGDATA/lost+found directory.

initdb had this behavior for almost 10 years, and for good reasons: it prevents
actual problems that were seen in the field.

It's unfortunate that the docker postgres images didn't take that behavior into
account, which was introduced more than a year before any work started on those
images, but if you're not happy with those workarounds it's something that
should be changed in the docker images.

> Both of these hacks exist only to keep initdb from failing on its mountpoint
> detection which, on its own, is supposed to be only a warning, as suggested by
> the function name warn_on_mount_point(); its messages [6] look to be of a purely
> advisory nature to me.

As Tom confirmed at [1], you shouldn't assume anything about the criticality
based on the function name.  If anything, the "warn" part is only saying that
the function itself won't exit(1).  And yes this function is only adding
information on the reason why the given directory can't be used, but it doesn't
change the fact that the given directory can't be used.

[1] https://www.postgresql.org/message-id/813162.1662908...@sss.pgh.pa.us




Re: [PATCH] Clarify the comments about varlena header encoding

2022-09-11 Thread John Naylor
On Sun, Sep 11, 2022 at 5:06 PM Aleksander Alekseev  wrote:
>
> Hi John,
>
> Many thanks for the feedback!
>
> > Or put the now-standard 0b in front.
>
> Good idea.

Now that I look at the results, though, it's distracting and not good
for readability. I'm not actually sure we need to do anything here,
but I am somewhat in favor of putting [un]aligned in parentheses, as
already discussed. Even there, in the first email you said:

> Also one can read this as "aligned, uncompressed
> data" (which again is wrong).

I'm not sure it rises to the level of "wrong", because a blob of bytes
immediately after an aligned uint32 is in fact aligned. The important
thing is: a zero byte is always either a padding byte or part of a
4-byte header, so it's the alignment of the header we really care
about.
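
To make that concrete, the decision that comment is describing boils down to
something like this, using the existing macros from postgres.h (an
illustration only, not proposed code):

    /* illustrative helper: classify a varlena datum by its first byte */
    static const char *
    classify_varlena(void *ptr)
    {
        if (VARATT_IS_1B_E(ptr))
            return "1-byte header, external: a TOAST pointer follows";
        if (VARATT_IS_1B(ptr))
            return "1-byte header, short inline datum (unaligned)";
        if (VARATT_IS_4B_C(ptr))
            return "4-byte header, compressed inline datum";
        return "4-byte header, uncompressed inline datum";
    }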

> > I think the problem is ambiguity about what a "toast pointer" is. This 
> > comment:
> >
> > * struct varatt_external is a traditional "TOAST pointer", that is, the
>
> Right. The comment for varatt_external says that it IS a TOAST
> pointer.

Well, the word "traditional" is not very informative, but it is there.
And afterwards there is also varatt_indirect, varatt_expanded, and
varattrib_1b_e, which all mention "TOAST pointer".

> However the comments for varlena headers bit layout
> implicitly include it into a TOAST pointer, which contradicts the
> previous comments. I suggest we fix this ambiguity by explicitly
> enumerating the type tag in the comments for varlena headers.

- * 1000 1-byte length word, unaligned, TOAST pointer
+ * 0b1000 1-byte length word (unaligned), type tag, TOAST pointer

This is distracting from the point of this whole comment, which, I
will say again is: How to look at the first byte to determine what
kind of varlena we're looking at. There is no reason to mention the
type tag here, at all.

- * In TOAST pointers the va_tag field (see varattrib_1b_e) is used to discern
- * the specific type and length of the pointer datum.
+ * For the TOAST pointers the type tag (see varattrib_1b_e.va_tag field) is
+ * used to discern the specific type and length of the pointer datum.

I don't think this clarifies anything, it's just a rephrasing.

More broadly, I think the description of varlenas in this header is at
a kind of "local maximum" -- minor adjustments are more likely to make
it worse. To significantly improve clarity might require a larger
rewriting, but I'm not personally interested in taking part in that.

-- 
John Naylor
EDB: http://www.enterprisedb.com




RE: why can't a table be part of the same publication as its schema

2022-09-11 Thread houzj.f...@fujitsu.com
On Monday, September 12, 2022 1:08 AM Mark Dilger  wrote:
> > > On Sep 10, 2022, at 4:17 PM, Robert Haas  wrote:
> >
> >>> I don't understand why we
> >>> used this ALL TABLES IN SCHEMA language.
> >>
> >> The conversation, as I recall, was that "ADD SCHEMA foo" would only mean
> >> all tables in foo, until publication of other object types became supported,
> >> at which point "ADD SCHEMA foo" would suddenly mean more than it did before.
> >> People might find that surprising, so the "ALL TABLES IN" was intended to
> >> future-proof against surprising behavioral changes.
> >
> > If I encountered this syntax in a vacuum, that's not what I would
> > think. I would think that ADD ALL TABLES IN SCHEMA meant add all the
> > tables in the schema to the publication one by one as individual
> > objects
> 
> Yes, it appears the syntax was chosen to avoid one kind of confusion, but
> created another kind.  Per the docs on this feature:
> 
>   FOR ALL TABLES IN SCHEMA
>   Marks the publication as one that replicates changes for all tables in the
> specified list of schemas, including tables created in the future.
> 
> Like you, I wouldn't expect that definition, given the behavior of GRANT with
> respect to the same grammatical construction.

I'm a bit unsure whether it should be compared to GRANT, because even if we chose
"ALTER PUBLICATION p1 { ADD | DROP } SCHEMA name", that still wouldn't be
consistent with the meaning of GRANT ON SCHEMA, as GRANT ON SCHEMA doesn't
grant rights on the tables within the schema, if I understand correctly.

I feel we'd better compare the syntax with the existing publication command:
FOR ALL TABLES. If you create a publication FOR ALL TABLES, it means publishing
all the tables in the database *including* tables created in the future. I
think both the syntax and meaning of ALL TABLES IN SCHEMA are consistent with
the existing FOR ALL TABLES.

And the behavior is clearly documented, so personally I think it's fine.
https://www.postgresql.org/docs/devel/sql-createpublication.html
--
FOR ALL TABLES
Marks the publication as one that replicates changes for all tables in 
the database, including tables created in the future.
FOR ALL TABLES IN SCHEMA
Marks the publication as one that replicates changes for all tables in 
the specified list of schemas, including tables created in the future.
--

Besides, as mentioned (and suggested by Tom [1]), we might support publishing
SEQUENCEs (or other objects) in the future. It would give more flexibility to
users if we also have FOR ALL SEQUENCES (or other objects) IN SCHEMA.

[1] https://www.postgresql.org/message-id/155565.1628954580%40sss.pgh.pa.us

Best regards,
Hou zj


Re: list of acknowledgments for PG15

2022-09-11 Thread Etsuro Fujita
On Thu, Sep 8, 2022 at 9:13 PM Peter Eisentraut  wrote:
> Attached is the plain-text list of acknowledgments for the PG15 release
> notes, current through REL_15_BETA4.  Please check for problems such as
> wrong sorting, duplicate names in different variants, or names in the
> wrong order etc.  (Note that the current standard is given name followed
> by surname, independent of cultural origin.)

Thanks as usual!

I think these are Japanese names that are in the
surname-followed-by-given-name order:

Kamigishi Rei
Kawamoto Masaya
Okano Naoki

Best regards,
Etsuro Fujita




Re: Switching XLog source from archive to streaming when primary available

2022-09-11 Thread Bharath Rupireddy
On Sat, Sep 10, 2022 at 3:35 AM Nathan Bossart  wrote:
>
> Well, for wal_keep_size, using bytes makes sense.  Given you know how much
> disk space you have, you can set this parameter accordingly to avoid
> retaining too much of it for standby servers.  For your proposed parameter,
> it's not so simple.  The same setting could have wildly different timing
> behavior depending on the server.  I still think that a timeout is the most
> intuitive.

Hm. In the v3 patch, I've used the timeout approach, but tracking the duration
the server spends in XLOG_FROM_ARCHIVE, as opposed to tracking the last failed
time when streaming from the primary.
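
Roughly, the idea looks like this (just a sketch; the names used here, such as
switch_source_interval, are placeholders and not what the attached patch uses):

    /* time at which we started reading from the archive */
    static TimestampTz arch_start_time = 0;

    static bool
    ShouldSwitchToStreaming(XLogSource currentSource)
    {
        if (switch_source_interval <= 0 || currentSource != XLOG_FROM_ARCHIVE)
            return false;

        if (arch_start_time == 0)
        {
            arch_start_time = GetCurrentTimestamp();
            return false;
        }

        /* the interval is assumed to be in seconds here */
        return TimestampDifferenceExceeds(arch_start_time,
                                          GetCurrentTimestamp(),
                                          switch_source_interval * 1000);
    }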

> So I think it
> would be difficult for the end user to decide on a value.  However, even
> the timeout approach has this sort of problem.  If your parameter is set to
> 1 minute, but the current archive takes 5 minutes to recover, you won't
> really be testing streaming replication once a minute.  That would likely
> need to be documented.

Added a note in the docs.

On Fri, Sep 9, 2022 at 10:46 AM Kyotaro Horiguchi  wrote:
>
> Being late for the party.

Thanks for reviewing this.

> It seems to me that the function is getting too long.  I think we
> might want to move the core part of the patch into another function.

Yeah, WaitForWALToBecomeAvailable() (without this patch) has around 460 LOC,
out of which fetching WAL from the chosen source accounts for about 240 LOC;
IMO, that code is a candidate for a new function. I think that part can be
discussed separately.

Having said that, I moved the new code to a new function.

> I think it might be better if intentionalSourceSwitch doesn't need
> lastSourceFailed set. It would look like this:
>
> >  if (lastSourceFailed || switchSource)
> >  {
> > if (nonblocking && lastSourceFailed)
> >return XLREAD_WOULDBLOCK;

I think the above looks good, done that way in the latest patch.

> I don't think the flag first_time is needed.

Addressed this in the v4 patch.

Please review the attached v4 patch addressing above review comments.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


v4-0001-Allow-standby-to-switch-WAL-source-from-archive-t.patch
Description: Binary data


Re: pg15b4: FailedAssertion("TransactionIdIsValid(xmax)

2022-09-11 Thread Michael Paquier
On Mon, Sep 12, 2022 at 02:34:48PM +1200, Thomas Munro wrote:
> On Mon, Sep 12, 2022 at 2:27 PM Justin Pryzby  wrote:
>> Yeah ... I just realized that I've already forgotten the relevant
>> chronology.
>>
>> The io_concurrency bugfix wasn't included in 15b4, so (if I understood
>> you correctly), that might explain these symptoms - right ?
> 
> Yeah.

Could you double-check if the issues you are seeing persist after
syncing up with the latest point of REL_15_STABLE?  For now, I have
added an open item just to be on the safe side.
--
Michael


signature.asc
Description: PGP signature


Re: pg15b4: FailedAssertion("TransactionIdIsValid(xmax)

2022-09-11 Thread Thomas Munro
On Mon, Sep 12, 2022 at 2:27 PM Justin Pryzby  wrote:
> On Mon, Sep 12, 2022 at 02:25:48PM +1200, Thomas Munro wrote:
> > On Mon, Sep 12, 2022 at 1:42 PM Justin Pryzby  wrote:
> > > But yesterday I started from initdb and restored this cluster from backup,
> > > and started up sqlsmith, and sent some kill -9, and now got more corruption.
> > > Looks like it took ~10 induced crashes before this happened.
> >
> > $SUBJECT says 15b4, which doesn't have the fix.  Are you still using
> > maintenance_io_concurrency=0?
>
> Yeah ... I just realized that I've already forgotten the relevant
> chronology.
>
> The io_concurrency bugfix wasn't included in 15b4, so (if I understood
> you correctly), that might explain these symptoms - right ?

Yeah.




Re: pg15b4: FailedAssertion("TransactionIdIsValid(xmax)

2022-09-11 Thread Justin Pryzby
On Mon, Sep 12, 2022 at 02:25:48PM +1200, Thomas Munro wrote:
> On Mon, Sep 12, 2022 at 1:42 PM Justin Pryzby  wrote:
> > On Mon, Sep 12, 2022 at 10:44:38AM +1200, Thomas Munro wrote:
> > > On Sat, Sep 10, 2022 at 5:44 PM Justin Pryzby  
> > > wrote:
> > > > < 2022-09-09 19:37:25.835 CDT telsasoft >ERROR:  MultiXactId 133553154 
> > > > has not been created yet -- apparent wraparound
> > >
> > > I guess what happened here is that after one of your (apparently
> > > several?) OOM crashes, crash recovery didn't run all the way to the
> > > true end of the WAL due to the maintenance_io_concurrency=0 bug.  In
> > > the case you reported, it couldn't complete an end-of-recovery
> > > checkpoint until you disabled recovery_prefetch, but that's only
> > > because of the somewhat unusual way that vismap pages work.  In
> > > another case it might have been able to (bogusly) complete a
> > > checkpoint, leaving things in an inconsistent state.
> >
> > I think what you're saying is that this can be explained by the
> > io_concurrency bug in recovery_prefetch, if run under 15b3.
> 
> Well I don't know, but it's one way I could think of that you could
> have a data page referring to a multixact that isn't on disk after
> recovery (because the data page happens to have been flushed, but we
> didn't replay the WAL that would create the multixact).
> 
> > But yesterday I started from initdb and restored this cluster from backup,
> > and started up sqlsmith, and sent some kill -9, and now got more corruption.
> > Looks like it took ~10 induced crashes before this happened.
> 
> $SUBJECT says 15b4, which doesn't have the fix.  Are you still using
> maintenance_io_concurrency=0?

Yeah ... I just realized that I've already forgotten the relevant
chronology.

The io_concurrency bugfix wasn't included in 15b4, so (if I understood
you correctly), that might explain these symptoms - right ?

-- 
Justin




Re: pg15b4: FailedAssertion("TransactionIdIsValid(xmax)

2022-09-11 Thread Thomas Munro
On Mon, Sep 12, 2022 at 1:42 PM Justin Pryzby  wrote:
> On Mon, Sep 12, 2022 at 10:44:38AM +1200, Thomas Munro wrote:
> > On Sat, Sep 10, 2022 at 5:44 PM Justin Pryzby  wrote:
> > > < 2022-09-09 19:37:25.835 CDT telsasoft >ERROR:  MultiXactId 133553154 
> > > has not been created yet -- apparent wraparound
> >
> > I guess what happened here is that after one of your (apparently
> > several?) OOM crashes, crash recovery didn't run all the way to the
> > true end of the WAL due to the maintenance_io_concurrency=0 bug.  In
> > the case you reported, it couldn't complete an end-of-recovery
> > checkpoint until you disabled recovery_prefetch, but that's only
> > because of the somewhat unusual way that vismap pages work.  In
> > another case it might have been able to (bogusly) complete a
> > checkpoint, leaving things in an inconsistent state.
>
> I think what you're saying is that this can be explained by the
> io_concurrency bug in recovery_prefetch, if run under 15b3.

Well I don't know, but it's one way I could think of that you could
have a data page referring to a multixact that isn't on disk after
recovery (because the data page happens to have been flushed, but we
didn't replay the WAL that would create the multixact).

> But yesterday I started from initdb and restored this cluster from backup, and
> started up sqlsmith, and sent some kill -9, and now got more corruption.
> Looks like it took ~10 induced crashes before this happened.

$SUBJECT says 15b4, which doesn't have the fix.  Are you still using
maintenance_io_concurrency=0?




Re: pg15b4: FailedAssertion("TransactionIdIsValid(xmax)

2022-09-11 Thread Peter Geoghegan
On Sun, Sep 11, 2022 at 6:42 PM Justin Pryzby  wrote:
> I think what you're saying is that this can be explained by the
> io_concurrency bug in recovery_prefetch, if run under 15b3.
>
> But yesterday I started from initdb and restored this cluster from backup, and
> started up sqlsmith, and sent some kill -9, and now got more corruption.
> Looks like it took ~10 induced crashes before this happened.

Have you tested fsync on the system?

The symptoms here are all over the place. This assertion failure seems
like a pretty good sign that the problems happen during recovery, or
because basic guarantees needed for crash safety aren't met:

> #2  0x00962c5c in ExceptionalCondition 
> (conditionName=conditionName@entry=0x9ce238 "P_ISLEAF(opaque) && 
> !P_ISDELETED(opaque)", errorType=errorType@entry=0x9bad97 "FailedAssertion",
> fileName=fileName@entry=0x9cdcd1 "nbtpage.c", 
> lineNumber=lineNumber@entry=1778) at assert.c:69
> #3  0x00507e34 in _bt_rightsib_halfdeadflag 
> (rel=rel@entry=0x7f4138a238a8, leafrightsib=leafrightsib@entry=53) at 
> nbtpage.c:1778
> #4  0x00507fba in _bt_mark_page_halfdead 
> (rel=rel@entry=0x7f4138a238a8, leafbuf=leafbuf@entry=13637, 
> stack=stack@entry=0x144ca20) at nbtpage.c:2121

This shows that the basic rules for page deletion have somehow
seemingly been violated. It's as if a page deletion went ahead, but
didn't work as an atomic operation -- there were some lost writes for
some but not all pages. Actually, it looks like a mix of states from
before and after both the first and the second phases of page deletion
-- so not just one atomic operation.

-- 
Peter Geoghegan




Re: pg15b4: FailedAssertion("TransactionIdIsValid(xmax)

2022-09-11 Thread Justin Pryzby
On Mon, Sep 12, 2022 at 10:44:38AM +1200, Thomas Munro wrote:
> On Sat, Sep 10, 2022 at 5:44 PM Justin Pryzby  wrote:
> > < 2022-09-09 19:37:25.835 CDT telsasoft >ERROR:  MultiXactId 133553154 has 
> > not been created yet -- apparent wraparound
> 
> I guess what happened here is that after one of your (apparently
> several?) OOM crashes, crash recovery didn't run all the way to the
> true end of the WAL due to the maintenance_io_concurrency=0 bug.  In
> the case you reported, it couldn't complete an end-of-recovery
> checkpoint until you disabled recovery_prefetch, but that's only
> because of the somewhat unusual way that vismap pages work.  In
> another case it might have been able to (bogusly) complete a
> checkpoint, leaving things in an inconsistent state.

I think what you're saying is that this can be explained by the
io_concurrency bug in recovery_prefetch, if run under 15b3.

But yesterday I started from initdb and restored this cluster from backup, and
started up sqlsmith, and sent some kill -9, and now got more corruption.
Looks like it took ~10 induced crashes before this happened.

At the moment, I have no reason to believe this issue is related to
recovery_prefetch; I am wondering about changes to vacuum.

< 2022-09-11 20:19:03.071 CDT telsasoft >ERROR:  MultiXactId 732646 has not 
been created yet -- apparent wraparound
< 2022-09-11 20:24:00.530 CDT telsasoft >ERROR:  MultiXactId 732646 has not 
been created yet -- apparent wraparound

Program terminated with signal 6, Aborted.
#0  0x7f413716b1f7 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install 
glibc-2.17-196.el7_4.2.x86_64 libgcc-4.8.5-44.el7.x86_64 
libxml2-2.9.1-6.el7_9.6.x86_64 lz4-1.8.3-1.el7.x86_64 
xz-libs-5.2.2-1.el7.x86_64 zlib-1.2.7-18.el7.x86_64
(gdb) bt
#0  0x7f413716b1f7 in raise () from /lib64/libc.so.6
#1  0x7f413716c8e8 in abort () from /lib64/libc.so.6
#2  0x00962c5c in ExceptionalCondition 
(conditionName=conditionName@entry=0x9ce238 "P_ISLEAF(opaque) && 
!P_ISDELETED(opaque)", errorType=errorType@entry=0x9bad97 "FailedAssertion", 
fileName=fileName@entry=0x9cdcd1 "nbtpage.c", 
lineNumber=lineNumber@entry=1778) at assert.c:69
#3  0x00507e34 in _bt_rightsib_halfdeadflag 
(rel=rel@entry=0x7f4138a238a8, leafrightsib=leafrightsib@entry=53) at 
nbtpage.c:1778
#4  0x00507fba in _bt_mark_page_halfdead (rel=rel@entry=0x7f4138a238a8, 
leafbuf=leafbuf@entry=13637, stack=stack@entry=0x144ca20) at nbtpage.c:2121
#5  0x0050af1d in _bt_pagedel (rel=rel@entry=0x7f4138a238a8, 
leafbuf=leafbuf@entry=13637, vstate=vstate@entry=0x7ffef18de8b0) at 
nbtpage.c:2004
#6  0x0050c996 in btvacuumpage (vstate=vstate@entry=0x7ffef18de8b0, 
scanblkno=scanblkno@entry=36) at nbtree.c:1342
#7  0x0050caf8 in btvacuumscan (info=info@entry=0x7ffef18deac0, 
stats=stats@entry=0x142fb70, callback=callback@entry=0x67e89b , 
callback_state=callback_state@entry=0x1461220, cycleid=)
at nbtree.c:997
#8  0x0050cc2f in btbulkdelete (info=0x7ffef18deac0, stats=0x142fb70, 
callback=0x67e89b , callback_state=0x1461220) at nbtree.c:801
#9  0x004fc64b in index_bulk_delete (info=info@entry=0x7ffef18deac0, 
istat=istat@entry=0x0, callback=callback@entry=0x67e89b , 
callback_state=callback_state@entry=0x1461220) at indexam.c:701
#10 0x0068108c in vac_bulkdel_one_index 
(ivinfo=ivinfo@entry=0x7ffef18deac0, istat=istat@entry=0x0, 
dead_items=0x1461220) at vacuum.c:2324
#11 0x004f72ae in lazy_vacuum_one_index (indrel=, 
istat=0x0, reltuples=, vacrel=vacrel@entry=0x142f100) at 
vacuumlazy.c:2726
#12 0x004f738b in lazy_vacuum_all_indexes 
(vacrel=vacrel@entry=0x142f100) at vacuumlazy.c:2328
#13 0x004f75df in lazy_vacuum (vacrel=vacrel@entry=0x142f100) at 
vacuumlazy.c:2261
#14 0x004f7f14 in lazy_scan_heap (vacrel=vacrel@entry=0x142f100) at 
vacuumlazy.c:1264
#15 0x004f895f in heap_vacuum_rel (rel=0x7f4138a67c00, 
params=0x143cbec, bstrategy=0x143ea20) at vacuumlazy.c:534
#16 0x0067f62b in table_relation_vacuum (bstrategy=, 
params=0x143cbec, rel=0x7f4138a67c00) at 
../../../src/include/access/tableam.h:1680
#17 vacuum_rel (relid=1249, relation=, 
params=params@entry=0x143cbec) at vacuum.c:2086
#18 0x0068065c in vacuum (relations=0x144a118, 
params=params@entry=0x143cbec, bstrategy=, 
bstrategy@entry=0x143ea20, isTopLevel=isTopLevel@entry=true) at vacuum.c:475
#19 0x00796a0e in autovacuum_do_vac_analyze (tab=tab@entry=0x143cbe8, 
bstrategy=bstrategy@entry=0x143ea20) at autovacuum.c:3149
#20 0x007987bf in do_autovacuum () at autovacuum.c:2472
#21 0x00798e72 in AutoVacWorkerMain (argc=argc@entry=0, 
argv=argv@entry=0x0) at autovacuum.c:1715
#22 0x00798eed in StartAutoVacWorker () at autovacuum.c:1493
#23 0x0079fe49 in StartAutovacuumWorker () at postmaster.c:5534
#24 0x007a0c44 in sigusr1_handler (postgres_signal_arg=) 
at postmaster.c:5239
#25 
#26 

Re: Background writer and checkpointer in crash recovery

2022-09-11 Thread Justin Pryzby
On Tue, Aug 03, 2021 at 02:19:22PM +1200, Thomas Munro wrote:
> On Tue, Aug 3, 2021 at 9:52 AM Thomas Munro  wrote:
> > On Tue, Aug 3, 2021 at 1:17 AM Robert Haas  wrote:
> > > That's great. I just realized that this leaves us with identical
> > > RequestCheckpoint() calls in two nearby places. Is there any reason
> > > not to further simplify as in the attached?
> >
> > LGTM.
> 
> And pushed.

Gripe: this made the "ps" display worse than before.

7ff23c6d2 Run checkpointer and bgwriter in crash recovery.

A couple years ago, I complained that during the end-of-recovery
checkpoint, the "ps" display still said "recovering N", which made
it look like it was stuck on a particular WAL file.

That led to commit df9274adf, which updated the startup process's "ps"
to say "end-of-recovery checkpoint".

But since the startup process no longer does the checkpoint, it still
says:

postgres 19738 11433  5 19:33 ?        00:00:01 postgres: startup recovering 0001000C00FB
postgres 19739 11433  3 19:33 ?        00:00:00 postgres: checkpointer performing end-of-recovery checkpoint

That looks inconsistent.  It'd be better if the startup process's "ps"
were cleared.
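
Something as small as this would probably do (sketch only; set_ps_display() is
the existing helper, but the exact call site and the ps wording here are just a
suggestion):

    /* in the startup process, before handing the checkpoint off */
    set_ps_display("waiting for end-of-recovery checkpoint");
    RequestCheckpoint(CHECKPOINT_END_OF_RECOVERY | CHECKPOINT_IMMEDIATE |
                      CHECKPOINT_WAIT);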

-- 
Justin




Re: Bump MIN_WINNT to 0x0600 (Vista) as minimal runtime in 16~

2022-09-11 Thread Michael Paquier
On Mon, Sep 12, 2022 at 09:42:25AM +1200, Thomas Munro wrote:
> When looking the function up it made sense to use the name ending in
> '...A', but when calling directly I think we shouldn't use the A
> suffix, we should let the  macros do that for us[1].  (I
> wondered for a moment if that would even make Windows and Unix code
> the same, but sadly not due to the extra NULL arguments.)

Good idea, I had not noticed this part.  This should work equally well, so
done this way and applied.  I am keeping an eye on the buildfarm.
--
Michael


signature.asc
Description: PGP signature


Re: pg15b4: FailedAssertion("TransactionIdIsValid(xmax)

2022-09-11 Thread Thomas Munro
On Sat, Sep 10, 2022 at 5:44 PM Justin Pryzby  wrote:
> < 2022-09-09 19:37:25.835 CDT telsasoft >ERROR:  MultiXactId 133553154 has 
> not been created yet -- apparent wraparound

I guess what happened here is that after one of your (apparently
several?) OOM crashes, crash recovery didn't run all the way to the
true end of the WAL due to the maintenance_io_concurrency=0 bug.  In
the case you reported, it couldn't complete an end-of-recovery
checkpoint until you disabled recovery_prefetch, but that's only
because of the somewhat unusual way that vismap pages work.  In
another case it might have been able to (bogusly) complete a
checkpoint, leaving things in an inconsistent state.




Re: pg15b4: FailedAssertion("TransactionIdIsValid(xmax)

2022-09-11 Thread Thomas Munro
On Sat, Sep 10, 2022 at 5:01 PM Justin Pryzby  wrote:
> BTW, after a number of sigabrt's, I started seeing these during
> recovery:
>
> < 2022-09-09 19:44:04.180 CDT  >LOG:  unexpected pageaddr 1214/AF0FE000 in 
> log segment 0001121400B4, offset 1040384
> < 2022-09-09 23:20:50.830 CDT  >LOG:  unexpected pageaddr 1214/CF65C000 in 
> log segment 0001121400D8, offset 6668288

That's just what it looks like when we discover the end of the WAL by
hitting a page that hasn't been overwritten yet in a recycled WAL
segment, so the pageaddr is off by a multiple of 16MB.  Depending on
timing and chance you might be more used to seeing the error where we
hit zeroes in a partially filled page, the famous 'wanted 24, got 0',
and you can also hit a fully zero-initialised page 'invalid magic
number '.  All of these are expected, and more exotic errors are
possible with power loss torn writes or on crash of a streaming
standbys where we currently fail to zero the rest of overwritten
pages.




Re: Splitting up guc.c

2022-09-11 Thread Tom Lane
Here's a v3 that gets rid of guc_hooks.c in favor of moving the
hook functions to related modules (though some did end up in
variables.c for lack of a better idea).  I also pushed all the
hook function declarations to guc_hooks.h.  Unsurprisingly,
removal of guc.h #includes from header files led to discovery
of some surprising indirect dependencies, notably a lot of places
were evidently depending on indirect inclusions of array.h.

I think this is code-complete at this point.  I'd like to not
sit on it too long, because it'll inevitably get side-swiped
by additions of new GUCs.  On the other hand, pushing it in
the middle of a CF would presumably break other people's patches.
Maybe push it at the end of this CF, to give people a month to
rebase anything that's affected?

regards, tom lane



split-up-guc-code-3.patch.gz
Description: split-up-guc-code-3.patch.gz


Re: Bump MIN_WINNT to 0x0600 (Vista) as minimal runtime in 16~

2022-09-11 Thread Thomas Munro
On Sun, Sep 11, 2022 at 12:29 PM Michael Paquier  wrote:
> On Fri, Sep 09, 2022 at 08:11:09PM +0900, Michael Paquier wrote:
> > Based on what I can see, the Windows animals seem to have digested
> > 47bd0b3 (cygwin, MinGW and MSVC), so I think that we are good.

Great, that's a lot of nice cleanup.

> The last part that's worth adjusting is ldap_start_tls_sA(), which
> would lead to the attached simplification.

-    if ((r = _ldap_start_tls_sA(*ldap, NULL, NULL, NULL, NULL)) != LDAP_SUCCESS)
+    if ((r = ldap_start_tls_sA(*ldap, NULL, NULL, NULL, NULL)) != LDAP_SUCCESS)

When looking up the function it made sense to use the name ending in
'...A', but when calling it directly I think we shouldn't use the A
suffix; we should let the <winldap.h> macros do that for us[1].  (I
wondered for a moment if that would even make the Windows and Unix code
the same, but sadly not, due to the extra NULL arguments.)

[1] 
https://docs.microsoft.com/en-us/windows/win32/api/winldap/nf-winldap-ldap_start_tls_sa
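
For illustration, the shape I have in mind (sketch only, not the committed
code):

#ifdef WIN32
    /* let the <winldap.h> macro resolve to the A/W variant */
    if ((r = ldap_start_tls_s(*ldap, NULL, NULL, NULL, NULL)) != LDAP_SUCCESS)
#else
    /* OpenLDAP's ldap_start_tls_s() takes only the two control arguments */
    if ((r = ldap_start_tls_s(*ldap, NULL, NULL)) != LDAP_SUCCESS)
#endif
        /* ... existing error handling ... */ ;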




Re: New docs chapter on Transaction Management and related changes

2022-09-11 Thread Robert Treat
On Wed, Sep 7, 2022 at 8:05 AM Simon Riggs  wrote:
>
> On Tue, 6 Sept 2022 at 21:33, Justin Pryzby  wrote:
> >
> > On Tue, Sep 06, 2022 at 04:16:02PM +0100, Simon Riggs wrote:
> > > New chapter on transaction management, plus a few related changes.
> > >
> > > Markup and links are not polished yet, so please comment initially on
> > > the topics, descriptions and wording.
> >
> I've also added further notes about prepared transactions.
>
> I attach a diff against the original patch, plus a new clean copy.
>

Some feedback on the v4 patch, hopefully useful.

+
+Transactions may be started explicitly using BEGIN and COMMIT
commands, known as
+a transaction block. A transaction will also be started and ended
implicitly for
+each request when outside of a transaction block.
+

Transactions may be managed explicitly using BEGIN and COMMIT commands, known as
a transaction block.


+Committed subtransactions are recorded as committed if the main transaction
+commits.  The word subtransaction is often abbreviated to "subxact".
+

Committed subtransactions are only recorded as committed if the main
transaction commits,
otherwise any work done in a subtransaction will be rolled back or
aborted. The word subtransaction is
often abbreviated as "subxact".

+
+Subtransactions may be started explicitly by using the SAVEPOINT
command, but may
+also be started in other ways, such as PL/pgSQL's EXCEPTION clause.
PL/Python and
+PL/TCL also support explicit subtransactions. Working with the C API, users may
+also call BeginInternalSubTransaction().
+

I think this paragraph (or something similar) should be the opening
paragraph for this section, so that readers are immediately given
context for what PostgreSQL considers to be a subtransaction.
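
For context, the C-API route mentioned there follows roughly the same pattern
PL/pgSQL uses for its EXCEPTION blocks; something like this (sketch only, not
text proposed for the docs):

    MemoryContext oldcontext = CurrentMemoryContext;
    ResourceOwner oldowner = CurrentResourceOwner;

    BeginInternalSubTransaction(NULL);

    PG_TRY();
    {
        /* ... work that might throw an error ... */
        ReleaseCurrentSubTransaction();             /* commit the subxact */
    }
    PG_CATCH();
    {
        MemoryContextSwitchTo(oldcontext);
        FlushErrorState();                          /* discard the caught error */
        RollbackAndReleaseCurrentSubTransaction();  /* abort the subxact */
    }
    PG_END_TRY();

    MemoryContextSwitchTo(oldcontext);
    CurrentResourceOwner = oldowner;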


+prepared transactions that were prepared before the last checkpoint.
In the typical
+case, shorter-lived prepared transactions are stored only in shared
memory and WAL.
+Currently-prepared transactions can be inspected using the
pg_prepared_xacts view.
+

Transactions that are currently prepared can be inspected using the
pg_prepared_xacts view.

* I thought the hyphenated wording looked odd, though I understand why
you used it. We don't use it elsewhere though (just `currently
prepared` sans hyphen) so re-worded to match the other wording.


Robert Treat
https://xzilla.net




Re: Splitting up guc.c

2022-09-11 Thread Tom Lane
I wrote:
> Michael Paquier  writes:
>> One part that I have found a bit strange lately about guc.c is that we
>> have mix the core machinery with the SQL-callable parts.  What do you
>> think about the addition of a gucfuncs.c in src/backend/utils/adt/ to
>> split things a bit more?

> I might be wrong, but I think the SQL-callable stuff makes use
> of some APIs that are currently private in guc.c.  So we'd have
> to expose more API to make that possible.  Maybe that wouldn't
> be a bad thing, but it seems to be getting beyond the original
> idea here.

I tried this just to see, and it worked out better than I thought.
The key extra idea is to also pull out the functions implementing
the SET and SHOW commands, because (unsurprisingly) those are just
about in the same place dependency-wise as the SQL functions, and
they have some common subroutines.

I had to export get_config_unit_name(), config_enum_get_options(),
and _ShowOption() (here renamed to ShowGUCOption()) to make this
work.  That doesn't seem too awful.

v2 attached does this, without any further relocation of hook
functions as yet.  I now see these file sizes:

$ wc guc*c
  2629   9372  69467 guc-file.c
  6425  22282 176816 guc.c
  1048   3005  26962 guc_funcs.c
   939   2693  22915 guc_hooks.c
  4877  13163 126769 guc_tables.c
 15918  50515 422929 total
$ size guc*o
   text    data     bss     dec     hex filename
  13653       4     112   13769    35c9 guc-file.o
  46589       0     564   47153    b831 guc.o
   8509       0       0    8509    213d guc_funcs.o
   6951       0     112    7063    1b97 guc_hooks.o
  43570   62998     216  106784   1a120 guc_tables.o

So this removes just about a thousand more lines from guc.c,
which seems worth doing.

regards, tom lane



split-up-guc-code-2.patch.gz
Description: split-up-guc-code-2.patch.gz


Re: why can't a table be part of the same publication as its schema

2022-09-11 Thread Mark Dilger



> On Sep 10, 2022, at 4:17 PM, Robert Haas  wrote:
> 
>>> I don't understand why we
>>> used this ALL TABLES IN SCHEMA language.
>> 
>> The conversation, as I recall, was that "ADD SCHEMA foo" would only mean all 
>> tables in foo, until publication of other object types became supported, at 
>> which point "ADD SCHEMA foo" would suddenly mean more than it did before.  
>> People might find that surprising, so the "ALL TABLES IN" was intended to 
>> future-proof against surprising behavioral changes.
> 
> If I encountered this syntax in a vacuum, that's not what I would
> think. I would think that ADD ALL TABLES IN SCHEMA meant add all the
> tables in the schema to the publication one by one as individual
> objects

Yes, it appears the syntax was chosen to avoid one kind of confusion, but 
created another kind.  Per the docs on this feature:

  FOR ALL TABLES IN SCHEMA
  Marks the publication as one that replicates changes for all tables in the 
specified list of schemas, including tables created in the future.

Like you, I wouldn't expect that definition, given the behavior of GRANT with 
respect to the same grammatical construction.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: [PATCH] initdb: do not exit after warn_on_mount_point

2022-09-11 Thread Tom Lane
Julien Rouhaud  writes:
> On Sat, Sep 10, 2022 at 07:14:59PM +, andrey.ara...@nixaid.com wrote:
>> Have slightly improved the logic so that it does not report an error
>> "directory \"%s\" exists but is not empty"
>> when it is only supposed to warn the user about the mountpoint, without
>> exiting.

> This definitely isn't a typo, as it's really strongly discouraged to use a
> mount point for the data directory.

Absolutely.  I think maybe the problem here is that warn_on_mount_point()
is a pretty bad name for that helper function, as this is not "just a
warning".

regards, tom lane




Check SubPlan clause for nonnullable rels/Vars

2022-09-11 Thread Richard Guo
Hi hackers,

While wandering around the code for reducing outer joins, I noticed that
when determining which base rels/Vars are forced nonnullable by a given
clause, we don't take SubPlans into consideration. Does anyone happen to
know the concern behind that?

IMO, for SubPlans of type ALL/ANY/ROWCOMPARE, we should be able to find
additional nonnullable rels/Vars by descending through their testexpr.
As we know, ALL_SUBLINK/ANY_SUBLINK combine results across tuples
produced by the subplan using AND/OR semantics. ROWCOMPARE_SUBLINK
doesn't allow multiple tuples from the subplan. So we can tell whether
the subplan is strict or not by checking its testexpr, leveraging the
existing codes in find_nonnullable_rels/vars_walker. Below is an
example:

# explain (costs off)
select * from a left join b on a.i = b.i where b.i = ANY (select i from c
where c.j = b.j);
           QUERY PLAN
---------------------------------
 Hash Join
   Hash Cond: (b.i = a.i)
   ->  Seq Scan on b
         Filter: (SubPlan 1)
         SubPlan 1
           ->  Seq Scan on c
                 Filter: (j = b.j)
   ->  Hash
         ->  Seq Scan on a
(9 rows)

BTW, this change would also have impact on SpecialJoinInfo, especially
for the case of identity 3, because find_nonnullable_rels() is also used
to determine strict_relids from the clause. As an example, consider

select * from a left join b on a.i = b.i
left join c on b.j = ANY (select j from c);

Now we can tell that the SubPlan is strict for 'b'. Thus the b/c join would
be considered legal.
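
For reference, the core of the change in find_nonnullable_rels_walker() looks
roughly like this (a simplified sketch; the attached patch is the real thing):

    else if (IsA(node, SubPlan))
    {
        SubPlan    *splan = (SubPlan *) node;

        /*
         * For ANY/ALL/ROWCOMPARE subplans, the result is combined from the
         * testexpr evaluated against the subplan's tuples, so rels/Vars that
         * are strict in testexpr are strict for the whole clause.
         */
        if (splan->subLinkType == ANY_SUBLINK ||
            splan->subLinkType == ALL_SUBLINK ||
            splan->subLinkType == ROWCOMPARE_SUBLINK)
            result = find_nonnullable_rels_walker(splan->testexpr, top_level);
    }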

Thanks
Richard


v1-0001-Check-SubPlan-clause-for-nonnullable-rels-Vars.patch
Description: Binary data


Re: [PATCH] initdb: do not exit after warn_on_mount_point

2022-09-11 Thread Julien Rouhaud
Hi,

On Sat, Sep 10, 2022 at 07:14:59PM +, andrey.ara...@nixaid.com wrote:
>
> Have slightly improved the logic so that it does not report an error
> "directory \"%s\" exists but is not empty"
> when it is only supposed to warn the user about the mountpoint, without
> exiting.
>
> To me, my patch looks like a typo fix where exit(1) should not be called on
> the warn_on_mount_point(), but only warn and continue as more people are
> mounting the device at `/var/lib/postgresql/data` (PGDATA) in the
> containerized world (K8s deployments, especially now in the Akash Network I
> am working for) for making sure their data persist.

This definitely isn't a typo, as it's really strongly discouraged to use a
mount point for the data directory.  You can refer to this thread [1] for more
explanations.

[1] 
https://www.postgresql.org/message-id/flat/CAKoxK%2B6H40imynM5P31bf0DnpN-5f5zeROjcaj6BKVAjxdEA9w%40mail.gmail.com




Re: [PATCH] Clarify the comments about varlena header encoding

2022-09-11 Thread Aleksander Alekseev
Hi John,

Many thanks for the feedback!

> Or put the now-standard 0b in front.

Good idea.

> I think the problem is ambiguity about what a "toast pointer" is. This 
> comment:
>
> * struct varatt_external is a traditional "TOAST pointer", that is, the

Right. The comment for varatt_external says that it IS a TOAST
pointer. However the comments for varlena headers bit layout
implicitly include it into a TOAST pointer, which contradicts the
previous comments. I suggest we fix this ambiguity by explicitly
enumerating the type tag in the comments for varlena headers.

> The patch added xxx's for the type tag in a comment about
> the header. This is more misleading than what is there now.

OK, here is another attempt. Changes compared to v1:

* "xxx xxx xxx" were removed, according to the feedback
* 0b prefix was added in order to make sure the reader will not
misread these as hex values
* The clarification about the type tag was added
* The references to "first case", "second case", etc were removed

Hopefully it's better now.

-- 
Best regards,
Aleksander Alekseev


v2-0001-Clarify-the-comments-about-varlena-header-encodin.patch
Description: Binary data


Re: [PATCH] initdb: do not exit after warn_on_mount_point

2022-09-11 Thread andrey . arapov
Hi Tom,

I've updated the patch by adding the explanation behind it and more comments.
(please see the attachment)

Have slightly improved the logic so that it does not report an error
"directory \"%s\" exists but is not empty"
when it is only supposed to warn the user about the mountpoint, without exiting.

To me, my patch looks like a typo fix: exit(1) should not be called in
warn_on_mount_point(); it should only warn and continue, as more people are
mounting the device at `/var/lib/postgresql/data` (PGDATA) in the containerized
world (K8s deployments, especially now in the Akash Network I am working for)
to make sure their data persist.

As a workaround, we either have to `rmdir /var/lib/postgresql/data/lost+found`
before running `docker-entrypoint.sh postgres`, which in turn calls `initdb`,
or alternatively we have to pass `PGDATA=/var/lib/postgresql/data/` while
mounting persistent storage over the `/var/lib/postgresql/data` path so that it
won't exit on the very first run.
To me, this in itself is odd behavior, which led me to finding this typo (from
my point of view), for which I've made this patch.


Please let me know if it makes sense or requires more information / explanation.


Kind regards,
Andrey Arapov


September 10, 2022 5:10 PM, "Tom Lane"  wrote:

> andrey.ara...@nixaid.com writes:
> 
>> please find my first patch for PostgreSQL is attached.
> 
> You haven't explained why you think this would be a good
> change, or even a safe one.
> 
> regards, tom lane


0001-initdb-do-not-exit-when-PGDATA-PGDATA-pg_wal-is-a-mo.patch
Description: Binary data