Re: failing to build preproc.c on solaris with sun studio

2022-08-06 Thread Tom Lane
Andres Freund  writes:
> On 2022-08-06 22:55:14 -0400, Tom Lane wrote:
>> IMO, it'd be entirely reasonable for Andres to say that *he* doesn't
>> want to fix the meson build scripts for niche platform X.  Then
>> it'd be up to people who care about platform X to make that happen.
>> Given the current plan of supporting the Makefiles for some years
>> more, there wouldn't even be any great urgency in that.

> The "problem" in this case is that maintaining pgxs compatibility, as we'd
> discussed at pgcon, requires emitting stuff for all the @whatever@ things in
> Makefile.global.in, including with_gnu_ld.

Sure, but why can't you just leave that for later by hard-wiring it
to false in the meson build?  As long as you don't break the Makefile
build, no one is worse off.

I think if we want to get this past the finish line, we need to
acknowledge that the initial commit isn't going to be perfect.
The whole point of continuing to maintain the Makefiles is to
give us breathing room to fix remaining issues in a leisurely
fashion.

regards, tom lane




Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?

2022-08-06 Thread Bharath Rupireddy
On Sun, Aug 7, 2022 at 7:43 AM Thomas Munro  wrote:
>
> On Sun, Aug 7, 2022 at 1:12 PM Bharath Rupireddy
>  wrote:
> > On Sat, Aug 6, 2022 at 12:11 PM Michael Paquier  wrote:
> > Yeah. pg_pwritev_with_retry can also be part of common/file_utils.c/.h
> > so that everyone can use it.
> >
> > > > Thoughts?
> > >
> > > Makes sense to me for the WAL segment pre-padding initialization, as
> > > we still want to point to the beginning of the segment after we are
> > > done with the pre-padding, and the code has an extra lseek().
> >
> > Thanks. I attached the v1 patch, please review it.
>
> Hi Bharath,
>
> +1 for moving pg_pwritev_with_retry() to file_utils.c, but I think the
> commit message should say that is happening.  Maybe the move should
> even be in a separate patch (IMHO it's nice to separate mechanical
> change patches from new logic/work patches).

Agree. I separated out the changes.

> +/*
> + * A convenience wrapper for pg_pwritev_with_retry() that zero-fills the given
> + * file of size total_sz in batches of size block_sz.
> + */
> +ssize_t
> +pg_pwritev_with_retry_and_init(int fd, int total_sz, int block_sz)
>
> Hmm, why not give it a proper name that says it writes zeroes?

Done.

> Size arguments around syscall-like facilities are usually size_t.
>
> +memset(zbuffer.data, 0, block_sz);
>
> I believe the modern fashion as of a couple of weeks ago is to tell
> the compiler to zero-initialise.  Since it's a union you'd need
> designated initialiser syntax, something like zbuffer = { .data = {0}
> }?

Hm, but we have many places still using memset(). If we were to switch
to that initialization style, IMO it should be done separately.

> +iov[i].iov_base = zbuffer.data;
> +iov[i].iov_len = block_sz;
>
> How can it be OK to use the caller-supplied block_sz, when
> sizeof(zbuffer.data) is statically determined?  What is the point of
> this argument?

Yes, removed block_sz function parameter.

> -dir_data->lasterrno = errno;
> +/* If errno isn't set, assume problem is no disk space */
> +dir_data->lasterrno = errno ? errno : ENOSPC;
>
> This coding pattern is used in places where we blame short writes on
> lack of disk space without bothering to retry.  But the code used in
> this patch already handles short writes: it always retries, until
> eventually, if you really are out of disk space, you should get an
> actual ENOSPC from the operating system.  So I don't think the
> guess-it-must-be-ENOSPC technique applies here.

Done.

Thanks for reviewing. PSA v2 patch-set. Also, I added a CF entry
https://commitfest.postgresql.org/39/3803/ to give the patches a
chance to get tested.

--
Bharath Rupireddy
RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/
From c862c9d079c3f78d2eb0405d5c4addd93847f102 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Sun, 7 Aug 2022 04:40:10 +
Subject: [PATCH v2] Move pg_pwritev_with_retry() to file_utils.c

Move pg_pwritev_with_retry() from file/fd.c to common/file_utils.c
so that non-backend code or FRONTEND cases can also use it.
---
 src/backend/storage/file/fd.c   | 65 -
 src/common/file_utils.c | 65 +
 src/include/common/file_utils.h |  7 
 src/include/storage/fd.h|  6 ---
 4 files changed, 72 insertions(+), 71 deletions(-)

diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index efb34d4dcb..ac611b836f 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -95,7 +95,6 @@
 #include "common/pg_prng.h"
 #include "miscadmin.h"
 #include "pgstat.h"
-#include "port/pg_iovec.h"
 #include "portability/mem.h"
 #include "postmaster/startup.h"
 #include "storage/fd.h"
@@ -3747,67 +3746,3 @@ data_sync_elevel(int elevel)
 {
 	return data_sync_retry ? elevel : PANIC;
 }
-
-/*
- * A convenience wrapper for pwritev() that retries on partial write.  If an
- * error is returned, it is unspecified how much has been written.
- */
-ssize_t
-pg_pwritev_with_retry(int fd, const struct iovec *iov, int iovcnt, off_t offset)
-{
-	struct iovec iov_copy[PG_IOV_MAX];
-	ssize_t		sum = 0;
-	ssize_t		part;
-
-	/* We'd better have space to make a copy, in case we need to retry. */
-	if (iovcnt > PG_IOV_MAX)
-	{
-		errno = EINVAL;
-		return -1;
-	}
-
-	for (;;)
-	{
-		/* Write as much as we can. */
-		part = pwritev(fd, iov, iovcnt, offset);
-		if (part < 0)
-			return -1;
-
-#ifdef SIMULATE_SHORT_WRITE
-		part = Min(part, 4096);
-#endif
-
-		/* Count our progress. */
-		sum += part;
-		offset += part;
-
-		/* Step over iovecs that are done. */
-		while (iovcnt > 0 && iov->iov_len <= part)
-		{
-			part -= iov->iov_len;
-			++iov;
-			--iovcnt;
-		}
-
-		/* Are they all done? */
-		if (iovcnt == 0)
-		{
-			/* We don't expect the kernel to write more than requested. */
-			Assert(part == 0);
-			break;
-		}
-
-		/*
-		 * Move whatever's left to the front of our mutable copy and adjust
-		 * the leading 

Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2022-08-06 Thread Andres Freund
Hi,

On 2022-08-07 09:24:40 +0530, Dilip Kumar wrote:
> On Sat, Aug 6, 2022 at 9:36 PM Tom Lane  wrote:
> >
> > Dilip Kumar  writes:
> > > On Fri, Aug 5, 2022 at 10:43 AM Dilip Kumar  wrote:
> > >> Yeah maybe it is not necessary to close as these unowned smgr will
> > >> automatically get closed on the transaction end.
> >
> > I do not think this is a great idea for the per-relation smgrs created
> > during RelationCopyStorageUsingBuffer.  Yeah, they'll be mopped up at
> > transaction end, but that doesn't mean that creating possibly tens of
> > thousands of transient smgrs isn't going to cause performance issues.

I was assuming that the files would get reopened at the end of the transaction
anyway, but it looks like that's not the case, unless wal_level=minimal.

Hm. CreateAndCopyRelationData() calls RelationCreateStorage() with
register_delete = false, which is ok because createdb_failure_callback will
clean things up. But that's another thing that's not great for a routine with
a general name...


> Okay, so for that we can simply call smgrcloserellocator(rlocator);
> before exiting the RelationCopyStorageUsingBuffer() right?

Yea, I think so.


> > I think RelationCopyStorageUsingBuffer needs to open and then close
> > the smgrs it uses, which means that ReadBufferWithoutRelcache is not the
> > appropriate API for it to use, either; need to go down another level.
> 
> Not sure how going down another level would help; the whole point is
> that we don't want to keep a reference to the smgr for a long time,
> especially in the loop, which is interruptible.

Yea, I'm not following either.


Greetings,

Andres Freund




Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2022-08-06 Thread Dilip Kumar
On Sat, Aug 6, 2022 at 9:36 PM Tom Lane  wrote:
>
> Dilip Kumar  writes:
> > On Fri, Aug 5, 2022 at 10:43 AM Dilip Kumar  wrote:
> >> Yeah maybe it is not necessary to close as these unowned smgr will
> >> automatically get closed on the transaction end.
>
> I do not think this is a great idea for the per-relation smgrs created
> during RelationCopyStorageUsingBuffer.  Yeah, they'll be mopped up at
> transaction end, but that doesn't mean that creating possibly tens of
> thousands of transient smgrs isn't going to cause performance issues.

Okay, so for that we can simply call smgrcloserellocator(rlocator);
before exiting the RelationCopyStorageUsingBuffer() right?

> I think RelationCopyStorageUsingBuffer needs to open and then close
> the smgrs it uses, which means that ReadBufferWithoutRelcache is not the
> appropriate API for it to use, either; need to go down another level.

Not sure how going down another level would help; the whole point is
that we don't want to keep a reference to the smgr for a long time,
especially in the loop, which is interruptible.  So every time we need
the smgr we can call smgropen(), and if it is already in the smgr cache
we will get it from there.  So I think it makes sense that, when exiting
the function, we just call smgrcloserellocator() so that the smgr is
closed if it was opened, and otherwise nothing happens.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Cleaning up historical portability baggage

2022-08-06 Thread Andres Freund
Hi,

On 2022-08-06 22:58:12 -0400, Tom Lane wrote:
> You could pull it out and see if the buildfarm breaks, but my money
> is on it breaking.  That HAVE_BUGGY_STRTOF stuff isn't very old.

We only recently figured out that we should use the ucrt runtime (and that it
exists, I guess).
fairywren and jacana's first runs with ucrt are from mid-February:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2022-02-13%2007%3A11%3A46
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana&dt=2022-02-17%2016%3A15%3A24

We probably should just throw an error if msvcrt is used. That's the old,
pre-C99 Microsoft C runtime, with some mingw replacement functions on top.  I
think our tests already don't pass when it's used. See [1] for more info.

Not entirely sure how best to detect ucrt use - we could just check MSYSTEM,
but that's not determinative because one can also specify the compiler via the
prefix...


That'd still leave us with the alternative output files due to cygwin, I
think.

Greetings,

Andres Freund


[1] https://www.msys2.org/docs/environments/




Re: failing to build preproc.c on solaris with sun studio

2022-08-06 Thread Noah Misch
On Sat, Aug 06, 2022 at 08:12:54PM -0700, Andres Freund wrote:
> The "problem" in this case is that maintaining pgxs compatibility, as we'd
> discussed at pgcon, requires emitting stuff for all the @whatever@ things in
> Makefile.global.in, including with_gnu_ld. Which led me down the rabbit hole
> of trying to build on solaris, with sun studio, to see if we could just remove
> with_gnu_ld (and some others).
> 
> There are a lot of replacements that really aren't needed for pgxs, including
> with_gnu_ld (after the patch I just sent on the "baggage" thread). I tried to
> think of a way to have a 'missing' equivalent for variables filled with bogus
> contents, to trigger an error when they're used. But I don't think there's
> such a thing?

For some patterns of variable use, this works:

badvar = $(error do not use badvar)
ok:
echo hello
bad:
echo $(badvar)
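Assuming GNU make, the lazy-expansion behavior can be checked quickly from a shell (the file path and target names are arbitrary):

```shell
# Write a throwaway makefile; $(error ...) in a recursively expanded
# variable fires only when the variable is actually expanded.
printf 'badvar = $(error do not use badvar)\nok:\n\t@echo hello\nbad:\n\t@echo $(badvar)\n' > /tmp/lazyvar.mk

make -f /tmp/lazyvar.mk ok                          # prints "hello"
make -f /tmp/lazyvar.mk bad || echo "bad refused"   # make stops with "do not use badvar"
```

The catch Noah alludes to: this only works for patterns where the variable is expanded late; `badvar := $(badvar)` or exporting it would trigger the error immediately.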




Re: failing to build preproc.c on solaris with sun studio

2022-08-06 Thread Andres Freund
Hi,

On 2022-08-06 22:55:14 -0400, Tom Lane wrote:
> IMO, it'd be entirely reasonable for Andres to say that *he* doesn't
> want to fix the meson build scripts for niche platform X.  Then
> it'd be up to people who care about platform X to make that happen.
> Given the current plan of supporting the Makefiles for some years
> more, there wouldn't even be any great urgency in that.

The "problem" in this case is that maintaining pgxs compatibility, as we'd
discussed at pgcon, requires emitting stuff for all the @whatever@ things in
Makefile.global.in, including with_gnu_ld. Which led me down the rabbit hole
of trying to build on solaris, with sun studio, to see if we could just remove
with_gnu_ld (and some others).

There are a lot of replacements that really aren't needed for pgxs, including
with_gnu_ld (after the patch I just sent on the "baggage" thread). I tried to
think of a way to have a 'missing' equivalent for variables filled with bogus
contents, to trigger an error when they're used. But I don't think there's
such a thing?


I haven't "really" tried because recent-ish python fails to configure on
solaris without modifications, and patching python's configure was further
than I wanted to go, but I don't foresee many issues supporting building on
solaris with gcc.


Barring minor adjustments (e.g. dragonflybsd vs freebsd), there are two
currently "supported" OSes that require some work:
- AIX, due to the symbol import / export & linking differences
- cygwin, although calling that supported right now is a stretch... I don't
  think it'd be too hard, but ...

Greetings,

Andres Freund




Re: Cleaning up historical portability baggage

2022-08-06 Thread Tom Lane
Andres Freund  writes:
> On 2022-08-07 11:47:31 +1200, Thomas Munro wrote:
>> So what about strtof?  That's gotta be dead code too.  I gather we
>> still need commit 72880ac1's HAVE_BUGGY_STRTOF.

> Well, right now we don't refuse to build against the "wrong" runtimes, so it's
> hard to say whether you're looking at the right runtime. I don't think we need
> this if we're (as we should imo) only using the ucrt - that's microsoft's,
> which IIUC is ok?

You could pull it out and see if the buildfarm breaks, but my money
is on it breaking.  That HAVE_BUGGY_STRTOF stuff isn't very old.

regards, tom lane




Re: Cleaning up historical portability baggage

2022-08-06 Thread Andres Freund
Hi,

On 2022-08-07 14:29:20 +1200, Thomas Munro wrote:
> On Sun, Aug 7, 2022 at 1:29 PM Andres Freund  wrote:
> > 0001: __func__ is C99, so we don't need to support fallbacks
> 
> +1, and my scraped data agrees.
> 
> I believe our minimum MSVC is current 2015, and this says it has it
> (it doesn't let you select older versions in the version drop-down,
> but we don't care about older versions):

Thanks for checking.


> > 0002: windows: We've unconditionally defined HAVE_MINIDUMP_TYPE for msvc forever,
> >   we can rely on it for mingw too
> 
>   * If supported on the current platform, set up a handler to be called if
>   * the backend/postmaster crashes with a fatal signal or exception.
>   */
> -#if defined(WIN32) && defined(HAVE_MINIDUMP_TYPE)
> +#if defined(WIN32)
> 
> Personally I'd remove "If supported on the current platform, " and
> shove the rest of the comment inside the #if defined(WIN32), but
> that's just me...

I started out that way as well, but it'd actually be nice to do this on other
platforms too, and we just don't support it yet :)


> >   I checked and the relevant options (-shared, -Wl,-Bsymbolic, -Wl,-soname)
> >   work even on solaris 10 with developerstudio12.5 (not the latest)
> 
> FWIW I'd call Solaris 10 EOL'd (it's in some
> sure-pay-us-but-we-aren't-really-going-to-fix-it phase with a lifetime
> similar to the actual sun).

I'd agree - but checked so that couldn't even be an argument against :)


Greetings,

Andres Freund




Re: failing to build preproc.c on solaris with sun studio

2022-08-06 Thread Tom Lane
Noah Misch  writes:
> On Sat, Aug 06, 2022 at 06:09:27PM -0700, Andres Freund wrote:
>> And it's not a cost free thing to support, e.g. I tried to build because
>> solaris with suncc forces me to generate with_gnu_ld when generating a
>> compatible Makefile.global for pgxs with meson.

> There may be a strong argument along those lines.  Let's suppose you were to
> write that revoking sunpro support would save four weeks of Andres Freund time
> in the meson adoption project.  I bet a critical mass of people would like
> that trade.  That's orthogonal to preproc.o compilation RAM usage.

IMO, it'd be entirely reasonable for Andres to say that *he* doesn't
want to fix the meson build scripts for niche platform X.  Then
it'd be up to people who care about platform X to make that happen.
Given the current plan of supporting the Makefiles for some years
more, there wouldn't even be any great urgency in that.

regards, tom lane




Re: Cleaning up historical portability baggage

2022-08-06 Thread Andres Freund
Hi,

On 2022-08-07 11:47:31 +1200, Thomas Munro wrote:
> So what about strtof?  That's gotta be dead code too.  I gather we
> still need commit 72880ac1's HAVE_BUGGY_STRTOF.

> From a cursory glance at MinGW's implementation, it still has the
> complained-about behaviour, if I've understood the complaint, and if I'm
> looking at the right C runtime[1].

Well, right now we don't refuse to build against the "wrong" runtimes, so it's
hard to say whether you're looking at the right runtime. I don't think we need
this if we're (as we should imo) only using the ucrt - that's microsoft's,
which IIUC is ok?


> -/*
> - * strtof() is part of C99; this version is only for the benefit of obsolete
> - * platforms. As such, it is known to return incorrect values for edge cases,
> - * which have to be allowed for in variant files for regression test results
> - * for any such platform.
> - */

We can't remove the result files referenced here yet, due to the double
rounding behaviour?

Greetings,

Andres Freund




Re: Cleaning up historical portability baggage

2022-08-06 Thread Thomas Munro
On Sun, Aug 7, 2022 at 1:29 PM Andres Freund  wrote:
> 0001: __func__ is C99, so we don't need to support fallbacks

+1, and my scraped data agrees.

I believe our minimum MSVC is currently 2015, and this says it has it
(it doesn't let you select older versions in the version drop-down,
but we don't care about older versions):

https://docs.microsoft.com/en-us/cpp/preprocessor/predefined-macros?view=msvc-140

> 0002: windows: We've unconditionally defined HAVE_MINIDUMP_TYPE for msvc forever,
>   we can rely on it for mingw too

  * If supported on the current platform, set up a handler to be called if
  * the backend/postmaster crashes with a fatal signal or exception.
  */
-#if defined(WIN32) && defined(HAVE_MINIDUMP_TYPE)
+#if defined(WIN32)

Personally I'd remove "If supported on the current platform, " and
shove the rest of the comment inside the #if defined(WIN32), but
that's just me...

> 0003: aix: aix3.2.5, aix4.1 are not even of historical interest at this point
>   - 4.1 was released before the first commit in our commit history

Wow.

> 0004: solaris: these gcc & gnu ld vs sun stuff differences seem unnecessary or
>   outdated

LGTM from a look at the current man page.

>   I checked and the relevant options (-shared, -Wl,-Bsymbolic, -Wl,-soname)
>   work even on solaris 10 with developerstudio12.5 (not the latest)

FWIW I'd call Solaris 10 EOL'd (it's in some
sure-pay-us-but-we-aren't-really-going-to-fix-it phase with a lifetime
similar to the actual sun).

> 0005: those broken system headers look to have been repaired a good while ago,
>  or, in the case of irix, we don't support the platform anymore

Nice archeology.




Re: Patch to address creation of PgStat* contexts with null parent context

2022-08-06 Thread Andres Freund
Hi,

On 2022-08-05 17:22:38 +0900, Kyotaro Horiguchi wrote:
> I think it is a bit different.  Previously that memory (for a slightly
> different use, precisely) was required only when stats data was read, so
> almost all server processes didn't need it.  Now, every server process
> that uses pgstats requires the two memory contexts if it is going to write
> stats.  Even if that doesn't happen until process termination, that
> memory is eventually required to flush any remaining data.  That
> final write might be avoidable, but I'm not sure it's worth the
> trouble.  As a result, calling pgstat_initialize() is effectively
> the declaration that the process requires the memory.

I don't think every process will end up calling pgstat_setup_memcxt() -
e.g. walsender, bgwriter, checkpointer probably don't? What do we gain by
creating the contexts eagerly?


> Thus I thought that we may let pgstat_initialize() promptly allocate
> the memory.

That makes some sense - but pgstat_attach_shmem() seems like a very strange
place for the call to CreateCacheMemoryContext().


I wonder if we shouldn't just use TopMemoryContext as the parent for most of
these contexts instead. CacheMemoryContext isn't actually a particularly good
fit anymore.

Greetings,

Andres Freund




Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?

2022-08-06 Thread Thomas Munro
On Sun, Aug 7, 2022 at 1:12 PM Bharath Rupireddy
 wrote:
> On Sat, Aug 6, 2022 at 12:11 PM Michael Paquier  wrote:
> Yeah. pg_pwritev_with_retry can also be part of common/file_utils.c/.h
> so that everyone can use it.
>
> > > Thoughts?
> >
> > Makes sense to me for the WAL segment pre-padding initialization, as
> > we still want to point to the beginning of the segment after we are
> > done with the pre-padding, and the code has an extra lseek().
>
> Thanks. I attached the v1 patch, please review it.

Hi Bharath,

+1 for moving pg_pwritev_with_retry() to file_utils.c, but I think the
commit message should say that is happening.  Maybe the move should
even be in a separate patch (IMHO it's nice to separate mechanical
change patches from new logic/work patches).

+/*
+ * A convenience wrapper for pg_pwritev_with_retry() that zero-fills the given
+ * file of size total_sz in batches of size block_sz.
+ */
+ssize_t
+pg_pwritev_with_retry_and_init(int fd, int total_sz, int block_sz)

Hmm, why not give it a proper name that says it writes zeroes?

Size arguments around syscall-like facilities are usually size_t.

+memset(zbuffer.data, 0, block_sz);

I believe the modern fashion as of a couple of weeks ago is to tell
the compiler to zero-initialise.  Since it's a union you'd need
designated initialiser syntax, something like zbuffer = { .data = {0}
}?

+iov[i].iov_base = zbuffer.data;
+iov[i].iov_len = block_sz;

How can it be OK to use the caller-supplied block_sz, when
sizeof(zbuffer.data) is statically determined?  What is the point of
this argument?

-dir_data->lasterrno = errno;
+/* If errno isn't set, assume problem is no disk space */
+dir_data->lasterrno = errno ? errno : ENOSPC;

This coding pattern is used in places where we blame short writes on
lack of disk space without bothering to retry.  But the code used in
this patch already handles short writes: it always retries, until
eventually, if you really are out of disk space, you should get an
actual ENOSPC from the operating system.  So I don't think the
guess-it-must-be-ENOSPC technique applies here.




Re: Use fadvise in wal replay

2022-08-06 Thread Bharath Rupireddy
On Sat, Aug 6, 2022 at 10:53 AM Andrey Borodin  wrote:
>
> Hi Bharath,
>
> thank you for the suggestion.
>
> > On 5 Aug 2022, at 16:02, Bharath Rupireddy 
> >  wrote:
> >
> > On Thu, Aug 4, 2022 at 9:48 PM Andrey Borodin  wrote:
> >>
> >>> On 18 Jul 2022, at 22:55, Robert Haas  wrote:
> >>>
> >>> On Thu, Jun 23, 2022 at 5:49 AM Jakub Wartak  
> >>> wrote:
> >
> > I have a fundamental question on the overall idea - How beneficial it
> > will be if the process that's reading the current WAL page only does
> > (at least attempts) the prefetching of future WAL pages? Won't the
> > benefit be higher if "some" other background process does prefetching?
>
> IMO prefetching from another thread would have a negative effect.
> The fadvise() call is non-blocking; the startup process won't do IO. It just
> informs the kernel to schedule an asynchronous page read.
> On the other hand, synchronization with another process might cost more than
> fadvise().

Hm, POSIX_FADV_WILLNEED flag makes fadvise() non-blocking.

> Anyway, the cost of calling fadvise() once per 16 page reads is negligible.

Agree. Why can't we just prefetch the entire WAL file once, when it
is first opened? Does the OS have any limitation on the max size to
prefetch at once? It may sound aggressive, but it avoids repeated
fadvise() system calls; this will be especially useful if there are
many WAL files to recover (crash, PITR or standby recovery), since
eventually we would want the entire WAL file to be prefetched.

If prefetching the entire WAL file is okay, we could further do this:
1) prefetch in XLogFileOpen() and all of segment_open callbacks, 2)
release in XLogFileClose (it's being done right now) and all of
segment_close callbacks - do this perhaps optionally.

Also, can't we use an existing function FilePrefetch()? That way,
there is no need for a new wait event type.

Thoughts?

-- 
Bharath Rupireddy
RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/




Re: Cleaning up historical portability baggage

2022-08-06 Thread Andres Freund
Hi,

Here's another set of patches for cruft I discovered going line-by-line through
the autoconf vs meson test differences. They'd all be simple to port to meson
too, but I think it's better to clean them up.

0001: __func__ is C99, so we don't need to support fallbacks

0002: windows: We've unconditionally defined HAVE_MINIDUMP_TYPE for msvc forever,
  we can rely on it for mingw too

0003: aix: aix3.2.5, aix4.1 are not even of historical interest at this point
  - 4.1 was released before the first commit in our commit history

0004: solaris: these gcc & gnu ld vs sun stuff differences seem unnecessary or
  outdated

  I started this because I wanted to get rid of with_gnu_ld, but there's still
  a necessary reference left unfortunately. But it still seems worth doing?

  I checked and the relevant options (-shared, -Wl,-Bsymbolic, -Wl,-soname)
  work even on solaris 10 with developerstudio12.5 (not the latest)

0005: those broken system headers look to have been repaired a good while ago,
 or, in the case of irix, we don't support the platform anymore

Greetings,

Andres Freund
>From f69c4e34d82c9441402d939e39575fa0afc949e8 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Fri, 5 Aug 2022 17:25:49 -0700
Subject: [PATCH 1/5] Rely on __func__ being supported

Previously we fell back to __FUNCTION__ and then NULL. As __func__ is in C99
that shouldn't be necessary anymore.

For some reason Solution.pm defined HAVE_FUNCNAME__FUNCTION instead of
HAVE_FUNCNAME__FUNC (originating in 4164e6636e2), there doesn't seem to be a
reason to continue with that as __func__ is supported by msvc.
---
 src/include/c.h   | 11 --
 src/include/pg_config.h.in|  6 ---
 src/include/storage/s_lock.h  |  4 +-
 src/include/utils/elog.h  |  4 +-
 config/c-compiler.m4  | 26 -
 src/backend/storage/lmgr/s_lock.c |  2 +-
 configure | 61 ---
 configure.ac  |  1 -
 src/tools/msvc/Solution.pm|  2 -
 9 files changed, 5 insertions(+), 112 deletions(-)

diff --git a/src/include/c.h b/src/include/c.h
index 8c4baeb0ec3..de9ec04d494 100644
--- a/src/include/c.h
+++ b/src/include/c.h
@@ -360,17 +360,6 @@ typedef void (*pg_funcptr_t) (void);
  */
 #define FLEXIBLE_ARRAY_MEMBER	/* empty */
 
-/* Which __func__ symbol do we have, if any? */
-#ifdef HAVE_FUNCNAME__FUNC
-#define PG_FUNCNAME_MACRO	__func__
-#else
-#ifdef HAVE_FUNCNAME__FUNCTION
-#define PG_FUNCNAME_MACRO	__FUNCTION__
-#else
-#define PG_FUNCNAME_MACRO	NULL
-#endif
-#endif
-
 
 /* 
  *Section 2:	bool, true, false
diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index 2d9a1cdc8ab..5ec31e532e3 100644
--- a/src/include/pg_config.h.in
+++ b/src/include/pg_config.h.in
@@ -158,12 +158,6 @@
 /* Define to 1 if fseeko (and presumably ftello) exists and is declared. */
 #undef HAVE_FSEEKO
 
-/* Define to 1 if your compiler understands __func__. */
-#undef HAVE_FUNCNAME__FUNC
-
-/* Define to 1 if your compiler understands __FUNCTION__. */
-#undef HAVE_FUNCNAME__FUNCTION
-
 /* Define to 1 if you have __atomic_compare_exchange_n(int *, int *, int). */
 #undef HAVE_GCC__ATOMIC_INT32_CAS
 
diff --git a/src/include/storage/s_lock.h b/src/include/storage/s_lock.h
index 1c9f6f08954..0877cf65b0b 100644
--- a/src/include/storage/s_lock.h
+++ b/src/include/storage/s_lock.h
@@ -768,7 +768,7 @@ extern int	tas_sema(volatile slock_t *lock);
 
 #if !defined(S_LOCK)
 #define S_LOCK(lock) \
-	(TAS(lock) ? s_lock((lock), __FILE__, __LINE__, PG_FUNCNAME_MACRO) : 0)
+	(TAS(lock) ? s_lock((lock), __FILE__, __LINE__, __func__) : 0)
 #endif	 /* S_LOCK */
 
 #if !defined(S_LOCK_FREE)
@@ -855,7 +855,7 @@ init_spin_delay(SpinDelayStatus *status,
 	status->func = func;
 }
 
-#define init_local_spin_delay(status) init_spin_delay(status, __FILE__, __LINE__, PG_FUNCNAME_MACRO)
+#define init_local_spin_delay(status) init_spin_delay(status, __FILE__, __LINE__, __func__)
 extern void perform_spin_delay(SpinDelayStatus *status);
 extern void finish_spin_delay(SpinDelayStatus *status);
 
diff --git a/src/include/utils/elog.h b/src/include/utils/elog.h
index 68ead8e8736..56398176901 100644
--- a/src/include/utils/elog.h
+++ b/src/include/utils/elog.h
@@ -140,7 +140,7 @@
 		if (__builtin_constant_p(elevel) && (elevel) >= ERROR ? \
 			errstart_cold(elevel, domain) : \
 			errstart(elevel, domain)) \
-			__VA_ARGS__, errfinish(__FILE__, __LINE__, PG_FUNCNAME_MACRO); \
+			__VA_ARGS__, errfinish(__FILE__, __LINE__, __func__); \
 		if (__builtin_constant_p(elevel) && (elevel) >= ERROR) \
 			pg_unreachable(); \
 	} while(0)
@@ -150,7 +150,7 @@
 		const int elevel_ = (elevel); \
 		pg_prevent_errno_in_scope(); \
 		if (errstart(elevel_, domain)) \
-			__VA_ARGS__, errfinish(__FILE__, __LINE__, PG_FUNCNAME_MACRO); \
+			__VA_ARGS__, errfinish(__FILE__, __LINE__, __func__); \
 		if (elevel_ >= ERROR) \
 			

Re: failing to build preproc.c on solaris with sun studio

2022-08-06 Thread Noah Misch
On Sat, Aug 06, 2022 at 06:09:27PM -0700, Andres Freund wrote:
> On 2022-08-06 17:59:54 -0700, Noah Misch wrote:
> > On Sat, Aug 06, 2022 at 05:43:50PM -0700, Andres Freund wrote:
> > > Sure, we can hack around it in some way. But if we need such hackery to
> > > compile postgres with a compiler, what's the point of supporting that
> > > compiler?  It's not like sunpro provides awesome static analysis or
> > > such.
> > 
> > To have a need to decide that, PostgreSQL would need to grow preproc.o such
> > that it causes 55% higher RAM usage, and the sunpro buildfarm members extant
> > at that time would need to have <= 32 GiB RAM.  I give a 15% chance of
> > reaching such conditions, and we don't gain much by deciding in advance.  
> > I'd
> > prefer to focus on decisions affecting more-probable outcomes.
> 
> My point wasn't about the future - *today* a compile with normal settings
> doesn't work on a machine with a reasonable amount of RAM. Who does it help
> if one person can get postgres to compile with some applied magic - normal
> users won't be able to.

To me, 32G is on the low side of reasonable, and omitting --enable-debug isn't
that magical.  (The TMPDIR hack is optional, but I did it to lessen harm to
other users of that shared machine.)

> And it's not a cost free thing to support, e.g. I tried to build because
> solaris with suncc forces me to generate with_gnu_ld when generating a
> compatible Makefile.global for pgxs with meson.

There may be a strong argument along those lines.  Let's suppose you were to
write that revoking sunpro support would save four weeks of Andres Freund time
in the meson adoption project.  I bet a critical mass of people would like
that trade.  That's orthogonal to preproc.o compilation RAM usage.




Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?

2022-08-06 Thread Bharath Rupireddy
On Sat, Aug 6, 2022 at 12:11 PM Michael Paquier  wrote:
>
> On Fri, Aug 05, 2022 at 03:55:26PM +0530, Bharath Rupireddy wrote:
> > I noticed that dir_open_for_write() in walmethods.c uses write() for
> > WAL file initialization (note that this code is used by pg_receivewal
> > and pg_basebackup) as opposed to core using pg_pwritev_with_retry() in
> > XLogFileInitInternal() to avoid partial writes. Do we need to fix
> > this?
>
> 0d56acfb has moved pg_pwritev_with_retry to be backend-only in fd.c :/

Yeah. pg_pwritev_with_retry can also be part of common/file_utils.c/.h
so that everyone can use it.
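For readers following along, the core retry idea is small enough to sketch. This is a simplified stand-in, not the fd.c implementation, and the helper name is made up; it shows how a loop can drain an iovec array across partial pwritev() results:

```c
#define _GNU_SOURCE             /* for pwritev() on glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Simplified sketch of the retry idea (the real pg_pwritev_with_retry
 * lives in the PostgreSQL tree): keep calling pwritev() until every
 * iovec is fully written, advancing the offset and trimming iovecs
 * after each partial write.  Note that it modifies the caller's iovec
 * array.
 */
static ssize_t
pwritev_with_retry_sketch(int fd, struct iovec *iov, int iovcnt, off_t offset)
{
	ssize_t		total = 0;

	while (iovcnt > 0)
	{
		ssize_t		written = pwritev(fd, iov, iovcnt, offset);

		if (written < 0)
		{
			if (errno == EINTR)
				continue;		/* interrupted by a signal: just retry */
			return -1;			/* hard error: report to caller */
		}
		total += written;
		offset += written;

		/* Drop iovecs that were written completely... */
		while (iovcnt > 0 && (size_t) written >= iov->iov_len)
		{
			written -= (ssize_t) iov->iov_len;
			iov++;
			iovcnt--;
		}
		/* ...and trim the one that was written partially, if any. */
		if (iovcnt > 0)
		{
			iov->iov_base = (char *) iov->iov_base + written;
			iov->iov_len -= (size_t) written;
		}
	}
	return total;
}
```

A production version would also need to guard against pwritev() returning zero and respect IOV_MAX; those details are omitted here.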

> > Thoughts?
>
> Makes sense to me for the WAL segment pre-padding initialization, as
> we still want to point to the beginning of the segment after we are
> done with the pre-padding, and the code has an extra lseek().

Thanks. I attached the v1 patch, please review it.

-- 
Bharath Rupireddy
RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/


v1-0001-Use-pg_pwritev_with_retry-instead-of-write-in-wal.patch
Description: Binary data


Re: failing to build preproc.c on solaris with sun studio

2022-08-06 Thread Tom Lane
Noah Misch  writes:
> On Sat, Aug 06, 2022 at 05:43:50PM -0700, Andres Freund wrote:
>> Sure, we can hack around it in some way. But if we need such hackery to
>> compile postgres with a compiler, what's the point of supporting that
>> compiler?  It's not like sunpro provides with awesome static analysis or 
>> such.

> To have a need to decide that, PostgreSQL would need to grow preproc.o such
> that it causes 55% higher RAM usage, and the sunpro buildfarm members extant
> at that time would need to have <= 32 GiB RAM.  I give a 15% chance of
> reaching such conditions, and we don't gain much by deciding in advance.  I'd
> prefer to focus on decisions affecting more-probable outcomes.

I think it's the same rationale as with other buildfarm animals
representing niche systems: we make the effort to support them
in order to avoid becoming locked into a software monoculture.
There's not that many compilers in the farm besides gcc/clang/MSVC,
so I feel anyplace we can find one is valuable.

As per previous discussion, it may well be that gcc/clang are
dominating the field so thoroughly that nobody wants to develop
competitors anymore.  So in the long run this may be a dead end.
But it's hard to be sure about that.  For now, as long as
somebody's willing to do the work to support a compiler that's
not gcc/clang, we should welcome it.

regards, tom lane




Re: failing to build preproc.c on solaris with sun studio

2022-08-06 Thread Andres Freund
On 2022-08-06 17:59:54 -0700, Noah Misch wrote:
> On Sat, Aug 06, 2022 at 05:43:50PM -0700, Andres Freund wrote:
> > Sure, we can hack around it in some way. But if we need such hackery to
> > compile postgres with a compiler, what's the point of supporting that
> > compiler?  It's not like sunpro provides with awesome static analysis or 
> > such.
> 
> To have a need to decide that, PostgreSQL would need to grow preproc.o such
> that it causes 55% higher RAM usage, and the sunpro buildfarm members extant
> at that time would need to have <= 32 GiB RAM.  I give a 15% chance of
> reaching such conditions, and we don't gain much by deciding in advance.  I'd
> prefer to focus on decisions affecting more-probable outcomes.

My point wasn't about the future - *today* a compile with normal settings
doesn't work, on a machine with a reasonable amount of ram. Who does it help
if one person can get postgres to compile with some applied magic - normal
users won't.

And it's not a cost-free thing to support, e.g. I tried to build on
solaris with suncc because generating a compatible Makefile.global for
pgxs with meson forces me to emit with_gnu_ld.




Re: failing to build preproc.c on solaris with sun studio

2022-08-06 Thread Noah Misch
On Sat, Aug 06, 2022 at 05:43:50PM -0700, Andres Freund wrote:
> On 2022-08-06 17:25:52 -0700, Noah Misch wrote:
> > On Sat, Aug 06, 2022 at 08:05:14PM -0400, Tom Lane wrote:
> > > Andres Freund  writes:
> > > > Yikes. And it's not like newer compiler versions are likely to be 
> > > > forthcoming
> > > > (12.6 is newest and is from 2017...). Wonder if we should just require 
> > > > gcc on
> > > > solaris... There's a decent amount of stuff we could rip out in that 
> > > > case.
> > > 
> > > Seems like it's only a matter of time before we add enough stuff to
> > > the grammar that the build fails, period.
> > 
> > I wouldn't worry about that enough to work hard in advance.  The RAM usage 
> > can
> > grow by about 55% before that's a problem.  (The 14G ulimit can tolerate a
> > raise.)  By then, the machine may be gone or have more RAM.  Perhaps even
> > Bison will have changed its code generation.  If none of those happen, I 
> > could
> > switch to gcc, hack things to use gcc for just preproc.o, etc.
> 
> Sure, we can hack around it in some way. But if we need such hackery to
> compile postgres with a compiler, what's the point of supporting that
> compiler?  It's not like sunpro provides with awesome static analysis or such.

To have a need to decide that, PostgreSQL would need to grow preproc.o such
that it causes 55% higher RAM usage, and the sunpro buildfarm members extant
at that time would need to have <= 32 GiB RAM.  I give a 15% chance of
reaching such conditions, and we don't gain much by deciding in advance.  I'd
prefer to focus on decisions affecting more-probable outcomes.




Re: failing to build preproc.c on solaris with sun studio

2022-08-06 Thread Andres Freund
Hi,

On 2022-08-06 17:25:52 -0700, Noah Misch wrote:
> On Sat, Aug 06, 2022 at 08:05:14PM -0400, Tom Lane wrote:
> > Andres Freund  writes:
> > > Yikes. And it's not like newer compiler versions are likely to be 
> > > forthcoming
> > > (12.6 is newest and is from 2017...). Wonder if we should just require 
> > > gcc on
> > > solaris... There's a decent amount of stuff we could rip out in that case.
> > 
> > Seems like it's only a matter of time before we add enough stuff to
> > the grammar that the build fails, period.
> 
> I wouldn't worry about that enough to work hard in advance.  The RAM usage can
> grow by about 55% before that's a problem.  (The 14G ulimit can tolerate a
> raise.)  By then, the machine may be gone or have more RAM.  Perhaps even
> Bison will have changed its code generation.  If none of those happen, I could
> switch to gcc, hack things to use gcc for just preproc.o, etc.

Sure, we can hack around it in some way. But if we need such hackery to
compile postgres with a compiler, what's the point of supporting that
compiler?  It's not like sunpro provides with awesome static analysis or such.

Greetings,

Andres Freund




Re: failing to build preproc.c on solaris with sun studio

2022-08-06 Thread Andres Freund
Hi,

On 2022-08-06 20:05:14 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2022-08-06 16:09:24 -0700, Noah Misch wrote:
> >> From the earliest days of wrasse, the compiler used too much RAM to build
> >> preproc.o with --enable-debug.  As of 2021-04, the compiler's "acomp" phase
> >> needed 10G in one process, and later phases needed 11.6G across two 
> >> processes.
> >> Compilation wrote 3.7G into TMPDIR.  Since /tmp consumes RAM+swap, 
> >> overriding
> >> TMPDIR relieved 3.7G of RAM pressure.  Even with those protections, wrasse
> >> intermittently reaches the 14G limit I impose (via "ulimit -v 14680064").  
> >> I
> >> had experimented with different optimization levels, but that didn't help.
>
> > Yikes. And it's not like newer compiler versions are likely to be 
> > forthcoming
> > (12.6 is newest and is from 2017...). Wonder if we should just require gcc 
> > on
> > solaris... There's a decent amount of stuff we could rip out in that case.
>
> Seems like it's only a matter of time before we add enough stuff to
> the grammar that the build fails, period.

Yea, it doesn't look too far off.


> However, I wonder why exactly it's so large, and why the backend's gram.o
> isn't an even bigger problem.  Maybe an effort to cut preproc.o's code
> size could yield dividends?

gram.c also compiles slowly and uses a lot of memory. Roughly ~8GB memory at
the peak (just watching top) and 1m40s (with debugging disabled, temp files on
disk etc).

I don't entirely know what parse.pl actually tries to achieve. The generated
output looks more different from gram.y than I'd have imagined.

It's certainly interesting that it ends up with roughly 30% larger .c bison
output, which roughly matches the difference in memory usage.


> FWIW, my late and unlamented animal gaur was also showing unhappiness with
> the size of preproc.o, manifested as a boatload of warnings like
> /var/tmp//cc0MHZPD.s:11594: Warning: .stabn: description field '109d3' too 
> big, try a different debug format
> which did not happen with gram.o.

I suspect we're going to have to do something about the gram.c size on its
own. It's already the slowest compilation step by a lot, even on modern
compilers.

Greetings,

Andres Freund




Re: logical decoding and replication of sequences

2022-08-06 Thread Noah Misch
On Thu, Apr 07, 2022 at 08:34:50PM +0200, Tomas Vondra wrote:
> I've pushed a revert af all the commits related to this - decoding of
> sequences and test_decoding / built-in replication changes.

Two July buildfarm runs failed with PANIC during standby promotion:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tern&dt=2022-07-19%2004%3A13%3A18
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tern&dt=2022-07-31%2011%3A33%3A13

The attached patch hacks things so an ordinary x86_64 GNU/Linux machine
reproduces this consistently.  "git bisect" then traced the regression to the
above revert commit (2c7ea57e56ca5f668c32d4266e0a3e45b455bef5).  The pg_ctl
test suite passes under this hack in all supported branches, and it passed on
v15 until that revert.  Would you investigate?

The buildfarm animal uses keep_error_builds.  From kept data directories, I
deduced these events:

- After the base backup, auto-analyze ran on the primary and wrote WAL.
- Standby streamed and wrote up to 0/301FFF.
- Standby received the promote signal.  Terminated streaming.  WAL page at 
0/3020000 remained all-zeros.
- Somehow, end-of-recovery became a PANIC.

Key portions from buildfarm logs:

=== good run standby2 log
2022-07-21 22:55:16.860 UTC [25034912:5] LOG:  received promote request
2022-07-21 22:55:16.878 UTC [26804682:2] FATAL:  terminating walreceiver 
process due to administrator command
2022-07-21 22:55:16.878 UTC [25034912:6] LOG:  invalid record length at 
0/360: wanted 24, got 0
2022-07-21 22:55:16.878 UTC [25034912:7] LOG:  redo done at 0/328 system 
usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.42 s
2022-07-21 22:55:16.878 UTC [25034912:8] LOG:  selected new timeline ID: 2
2022-07-21 22:55:17.004 UTC [25034912:9] LOG:  archive recovery complete
2022-07-21 22:55:17.005 UTC [23724044:1] LOG:  checkpoint starting: force
2022-07-21 22:55:17.008 UTC [14549364:4] LOG:  database system is ready to 
accept connections
2022-07-21 22:55:17.093 UTC [23724044:2] LOG:  checkpoint complete: wrote 3 
buffers (2.3%); 0 WAL file(s) added, 0 removed, 1 recycled; write=0.019 s, 
sync=0.001 s, total=0.089 s; sync files=0, longest=0.000 s, average=0.000 s; 
distance=16384 kB, estimate=16384 kB
2022-07-21 22:55:17.143 UTC [27394418:1] [unknown] LOG:  connection received: 
host=[local]
2022-07-21 22:55:17.144 UTC [27394418:2] [unknown] LOG:  connection authorized: 
user=nm database=postgres application_name=003_promote.pl
2022-07-21 22:55:17.147 UTC [27394418:3] 003_promote.pl LOG:  statement: SELECT 
pg_is_in_recovery()
2022-07-21 22:55:17.148 UTC [27394418:4] 003_promote.pl LOG:  disconnection: 
session time: 0:00:00.005 user=nm database=postgres host=[local]
2022-07-21 22:55:58.301 UTC [14549364:5] LOG:  received immediate shutdown 
request
2022-07-21 22:55:58.337 UTC [14549364:6] LOG:  database system is shut down

=== failed run standby2 log, with my annotations
2022-07-19 05:28:22.136 UTC [7340406:5] LOG:  received promote request
2022-07-19 05:28:22.163 UTC [8519860:2] FATAL:  terminating walreceiver process 
due to administrator command
2022-07-19 05:28:22.166 UTC [7340406:6] LOG:  invalid magic number 0000 in log 
segment 000000010000000000000003, offset 131072
  New compared to the good run.  XLOG_PAGE_MAGIC didn't match.  This implies 
the WAL ended at a WAL page boundary.
2022-07-19 05:28:22.166 UTC [7340406:7] LOG:  redo done at 0/301F168 system 
usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.18 s
2022-07-19 05:28:22.166 UTC [7340406:8] LOG:  last completed transaction was at 
log time 2022-07-19 05:28:13.956716+00
  New compared to the good run.  The good run had no transactions to replay.  
The bad run replayed records from an auto-analyze.
2022-07-19 05:28:22.166 UTC [7340406:9] PANIC:  invalid record length at 
0/301F168: wanted 24, got 0
  More WAL overall in bad run, due to auto-analyze.  End of recovery wrongly 
considered a PANIC.
2022-07-19 05:28:22.583 UTC [8388800:4] LOG:  startup process (PID 7340406) was 
terminated by signal 6: IOT/Abort trap
2022-07-19 05:28:22.584 UTC [8388800:5] LOG:  terminating any other active 
server processes
2022-07-19 05:28:22.587 UTC [8388800:6] LOG:  shutting down due to startup 
process failure
2022-07-19 05:28:22.627 UTC [8388800:7] LOG:  database system is shut down

Let me know if I've left out details you want; I may be able to dig more out
of the buildfarm artifacts.
Author: Noah Misch 
Commit: Noah Misch 



diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index 8604fd4..6232237 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -885,6 +885,12 @@ XLogWalRcvWrite(char *buf, Size nbytes, XLogRecPtr recptr, TimeLineID tli)
 {
int startoff;
int byteswritten;
+   bool spin_after = false;
+   /* force writing to end at a certain WAL page boundary */
+   if (recptr + nbytes > 
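The attachment is cut off above, but the page-boundary test it appears to start can be sketched with plain arithmetic. XLOG_BLCKSZ is shown with its usual default, and the clamping policy itself is an assumption, since the rest of the patch is missing:

```c
#include <assert.h>
#include <stdint.h>

#define XLOG_BLCKSZ 8192		/* default WAL page size */

/* Bytes left before the WAL page containing recptr ends. */
static uint64_t
bytes_left_on_page(uint64_t recptr)
{
	return XLOG_BLCKSZ - (recptr % XLOG_BLCKSZ);
}

/* Would a write of nbytes starting at recptr spill onto the next page? */
static int
write_crosses_page_boundary(uint64_t recptr, uint64_t nbytes)
{
	return nbytes > bytes_left_on_page(recptr);
}
```

A hack built on this can stop writing exactly at a chosen boundary, leaving the following page all-zeros, which reproduces the streaming-terminated-at-a-page-boundary condition described above.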

Re: failing to build preproc.c on solaris with sun studio

2022-08-06 Thread Noah Misch
On Sat, Aug 06, 2022 at 08:05:14PM -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2022-08-06 16:09:24 -0700, Noah Misch wrote:
> >> From the earliest days of wrasse, the compiler used too much RAM to build
> >> preproc.o with --enable-debug.  As of 2021-04, the compiler's "acomp" phase
> >> needed 10G in one process, and later phases needed 11.6G across two 
> >> processes.
> >> Compilation wrote 3.7G into TMPDIR.  Since /tmp consumes RAM+swap, 
> >> overriding
> >> TMPDIR relieved 3.7G of RAM pressure.  Even with those protections, wrasse
> >> intermittently reaches the 14G limit I impose (via "ulimit -v 14680064").  
> >> I
> >> had experimented with different optimization levels, but that didn't help.
> 
> > Yikes. And it's not like newer compiler versions are likely to be 
> > forthcoming
> > (12.6 is newest and is from 2017...). Wonder if we should just require gcc 
> > on
> > solaris... There's a decent amount of stuff we could rip out in that case.
> 
> Seems like it's only a matter of time before we add enough stuff to
> the grammar that the build fails, period.

I wouldn't worry about that enough to work hard in advance.  The RAM usage can
grow by about 55% before that's a problem.  (The 14G ulimit can tolerate a
raise.)  By then, the machine may be gone or have more RAM.  Perhaps even
Bison will have changed its code generation.  If none of those happen, I could
switch to gcc, hack things to use gcc for just preproc.o, etc.

> So there's something pretty bloated there.  It doesn't seem like
> ecpg's additional productions should justify a nigh 50% code
> size increase.

True.




Re: Cleaning up historical portability baggage

2022-08-06 Thread Tom Lane
Thomas Munro  writes:
> On Sun, Aug 7, 2022 at 11:22 AM Tom Lane  wrote:
>> Thomas Munro  writes:
>>> I also wonder if half the stuff in win32gettimeofday.c can be deleted.

> This looks good on CI (well I haven't waited for it to finish yet, but
> MSVC compiles it without warning and we're most of the way through the
> tests...).

Looks plausible from here.  A couple other thoughts:

* While you're at it you could fix the "MinW" typo just above
the extern for gettimeofday.

* I'm half tempted to add something like this to gettimeofday:

/*
 * POSIX declines to define what tzp points to, saying
 * "If tzp is not a null pointer, the behavior is unspecified".
 * Let's take this opportunity to verify that noplace in
 * Postgres tries to use any unportable behavior.
 */
Assert(tzp == NULL);

regards, tom lane




Re: failing to build preproc.c on solaris with sun studio

2022-08-06 Thread Thomas Munro
On Sun, Aug 7, 2022 at 11:52 AM Andres Freund  wrote:
> Yikes. And it's not like newer compiler versions are likely to be forthcoming
> (12.6 is newest and is from 2017...). Wonder if we should just require gcc on
> solaris... There's a decent amount of stuff we could rip out in that case.

Independently of the RAM requirements topic, I totally agree that
doing extra work to support a compiler that hasn't had a release in 5
years doesn't seem like time well spent.




Re: Cleaning up historical portability baggage

2022-08-06 Thread Thomas Munro
On Sun, Aug 7, 2022 at 11:22 AM Tom Lane  wrote:
> Thomas Munro  writes:
> > I also wonder if half the stuff in win32gettimeofday.c can be deleted.
> > From some light googling, it looks like
> > GetSystemTimePreciseAsFileTime() can just be called directly on
> > Windows 8+ (and we now require 10+), and that kernel32.dll malarky was
> > for older systems?
>
> Yeah, Microsoft's man page for it just says to include sysinfoapi.h
> (which we aren't) and then it should work on supported versions.

This looks good on CI (well I haven't waited for it to finish yet, but
MSVC compiles it without warning and we're most of the way through the
tests...).
From 6498faf80f5f2859e9228baf85f7e7974a29f4dc Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Sun, 7 Aug 2022 11:57:23 +1200
Subject: [PATCH] Simplify gettimeofday for Windows.

Previously we bothered to forward-declare a struct timezone, following
man pages on typical systems, but POSIX actually says the argument
(which we ignore anyway) is void *.  Drop a line.

Previously we did extra work to select between Windows APIs needed on
older releases, but now we can just use the higher resolution function
directly.

Discussion: https://postgr.es/m/CA%2BhUKGKwRpvGfcfq2qNVAQS2Wg1B9eA9QRhAmVSyJt1zsCN2sQ%40mail.gmail.com
---
 src/include/port/win32_port.h |  3 +-
 src/port/win32gettimeofday.c  | 59 +++
 2 files changed, 5 insertions(+), 57 deletions(-)

diff --git a/src/include/port/win32_port.h b/src/include/port/win32_port.h
index 79451a00f9..12b4cf6cdb 100644
--- a/src/include/port/win32_port.h
+++ b/src/include/port/win32_port.h
@@ -181,9 +181,8 @@
 
 /* MinW has gettimeofday(), but MSVC doesn't */
 #ifdef _MSC_VER
-struct timezone;
 /* Last parameter not used */
-extern int	gettimeofday(struct timeval *tp, struct timezone *tzp);
+extern int	gettimeofday(struct timeval *tp, void *tzp);
 #endif
 
 /* for setitimer in backend/port/win32/timer.c */
diff --git a/src/port/win32gettimeofday.c b/src/port/win32gettimeofday.c
index 0464548758..a7eaad 100644
--- a/src/port/win32gettimeofday.c
+++ b/src/port/win32gettimeofday.c
@@ -28,6 +28,8 @@
 
 #include "c.h"
 
+#include <sysinfoapi.h>
+
#include <sys/time.h>
 
 /* FILETIME of Jan 1 1970 00:00:00, the PostgreSQL epoch */
static const unsigned __int64 epoch = UINT64CONST(116444736000000000);
#define FILETIME_UNITS_PER_SEC	10000000L
 #define FILETIME_UNITS_PER_USEC 10
 
-/*
- * Both GetSystemTimeAsFileTime and GetSystemTimePreciseAsFileTime share a
- * signature, so we can just store a pointer to whichever we find. This
- * is the pointer's type.
- */
-typedef VOID(WINAPI * PgGetSystemTimeFn) (LPFILETIME);
-
-/* One-time initializer function, must match that signature. */
-static void WINAPI init_gettimeofday(LPFILETIME lpSystemTimeAsFileTime);
-
-/* Storage for the function we pick at runtime */
-static PgGetSystemTimeFn pg_get_system_time = &init_gettimeofday;
-
-/*
- * One time initializer.  Determine whether GetSystemTimePreciseAsFileTime
- * is available and if so, plan to use it; if not, fall back to
- * GetSystemTimeAsFileTime.
- */
-static void WINAPI
-init_gettimeofday(LPFILETIME lpSystemTimeAsFileTime)
-{
-	/*
-	 * Because it's guaranteed that kernel32.dll will be linked into our
-	 * address space already, we don't need to LoadLibrary it and worry about
-	 * closing it afterwards, so we're not using Pg's dlopen/dlsym() wrapper.
-	 *
-	 * We'll just look up the address of GetSystemTimePreciseAsFileTime if
-	 * present.
-	 *
-	 * While we could look up the Windows version and skip this on Windows
-	 * versions below Windows 8 / Windows Server 2012 there isn't much point,
-	 * and determining the windows version is its self somewhat Windows
-	 * version and development SDK specific...
-	 */
-	pg_get_system_time = (PgGetSystemTimeFn) GetProcAddress(GetModuleHandle(TEXT("kernel32.dll")),
-			"GetSystemTimePreciseAsFileTime");
-	if (pg_get_system_time == NULL)
-	{
-		/*
-		 * The expected error from GetLastError() is ERROR_PROC_NOT_FOUND, if
-		 * the function isn't present. No other error should occur.
-		 *
-		 * We can't report an error here because this might be running in
-		 * frontend code; and even if we're in the backend, it's too early to
-		 * elog(...) if we get some unexpected error.  Also, it's not a
-		 * serious problem, so just silently fall back to
-		 * GetSystemTimeAsFileTime irrespective of why the failure occurred.
-		 */
-		pg_get_system_time = &GetSystemTimeAsFileTime;
-	}
-
-	(*pg_get_system_time) (lpSystemTimeAsFileTime);
-}
 
 /*
  * timezone information is stored outside the kernel so tzp isn't used anymore.
@@ -101,12 +50,12 @@ init_gettimeofday(LPFILETIME lpSystemTimeAsFileTime)
  * elapsed_time().
  */
 int
-gettimeofday(struct timeval *tp, struct timezone *tzp)
+gettimeofday(struct timeval *tp, void *tzp)
 {
 	FILETIME	file_time;
 	ULARGE_INTEGER ularge;
 
-	(*pg_get_system_time) (&file_time);
+	GetSystemTimePreciseAsFileTime(&file_time);
 	ularge.LowPart = file_time.dwLowDateTime;
 	

Re: failing to build preproc.c on solaris with sun studio

2022-08-06 Thread Tom Lane
Andres Freund  writes:
> On 2022-08-06 16:09:24 -0700, Noah Misch wrote:
>> From the earliest days of wrasse, the compiler used too much RAM to build
>> preproc.o with --enable-debug.  As of 2021-04, the compiler's "acomp" phase
>> needed 10G in one process, and later phases needed 11.6G across two 
>> processes.
>> Compilation wrote 3.7G into TMPDIR.  Since /tmp consumes RAM+swap, overriding
>> TMPDIR relieved 3.7G of RAM pressure.  Even with those protections, wrasse
>> intermittently reaches the 14G limit I impose (via "ulimit -v 14680064").  I
>> had experimented with different optimization levels, but that didn't help.

> Yikes. And it's not like newer compiler versions are likely to be forthcoming
> (12.6 is newest and is from 2017...). Wonder if we should just require gcc on
> solaris... There's a decent amount of stuff we could rip out in that case.

Seems like it's only a matter of time before we add enough stuff to
the grammar that the build fails, period.

However, I wonder why exactly it's so large, and why the backend's gram.o
isn't an even bigger problem.  Maybe an effort to cut preproc.o's code
size could yield dividends?

FWIW, my late and unlamented animal gaur was also showing unhappiness with
the size of preproc.o, manifested as a boatload of warnings like
/var/tmp//cc0MHZPD.s:11594: Warning: .stabn: description field '109d3' too big, 
try a different debug format
which did not happen with gram.o.

Even on a modern Linux:

$ size src/backend/parser/gram.o
   text    data     bss     dec     hex filename
 656568       0       0  656568   a04b8 src/backend/parser/gram.o
$ size src/interfaces/ecpg/preproc/preproc.o
   text    data     bss     dec     hex filename
 912005     188    7348  919541   e07f5 src/interfaces/ecpg/preproc/preproc.o

So there's something pretty bloated there.  It doesn't seem like
ecpg's additional productions should justify a nigh 50% code
size increase.

regards, tom lane




Re: Cleaning up historical portability baggage

2022-08-06 Thread Tom Lane
Thomas Munro  writes:
> So what about strtof?  That's gotta be dead code too.  I gather we
> still need commit 72880ac1's HAVE_BUGGY_STRTOF.  From a cursory glance
> at MinGW's implementation, it still has the complained-about
> behaviour, if I've understood the complaint, and if I'm looking at the
> right C runtime[1].

Looks plausible from here.

regards, tom lane




Re: Cleaning up historical portability baggage

2022-08-06 Thread Tom Lane
Thomas Munro  writes:
> On Sun, Aug 7, 2022 at 10:42 AM Tom Lane  wrote:
>> * There is a race condition for recently-forked children: they might not
>> * have executed setsid() yet.  So we signal the child directly as well as
>> * the group.  We assume such a child will handle the signal before trying
>> * to spawn any grandchild processes.  We also assume that signaling the
>> * child twice will not cause any problems.

> Oof.  Fixed.

Hmm ... it seems like these other callers have the same race condition.
StatementTimeoutHandler and LockTimeoutHandler account for that
correctly by issuing two kill()s, so how is it OK for pg_signal_backend
and TerminateOtherDBBackends not to?

It would likely be a good idea for all these places to mention that
they're doing that to avoid a race condition, and cross-reference
signal_child for details.  Or maybe we should promote signal_child
into a more widely used function?
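The double-signal pattern being discussed is compact enough to sketch; this is a simplified illustration, not the actual postmaster code, and error handling is omitted:

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Sketch of the race-avoidance pattern: a freshly forked child may not
 * have called setsid() yet, so kill(-pid) alone could miss it.  Signal
 * the child directly as well as its (eventual) process group; signaling
 * a process twice is assumed harmless, and kill(-pid) simply fails with
 * ESRCH if no such group exists yet.
 */
static void
signal_child_sketch(pid_t pid, int signum)
{
	kill(pid, signum);			/* reaches the child even before setsid() */
	kill(-pid, signum);			/* reaches the whole group once it exists */
}
```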

regards, tom lane




Re: failing to build preproc.c on solaris with sun studio

2022-08-06 Thread Andres Freund
Hi,

On 2022-08-06 16:09:24 -0700, Noah Misch wrote:
> On Sat, Aug 06, 2022 at 02:07:24PM -0700, Andres Freund wrote:
> > I tried PG on the gcc compile farm solaris 11.31 host. When compiling with 
> > sun
> > studio I can build the backend etc, but preproc.c fails to compile:
> > 
> > ccache /opt/developerstudio12.6/bin/cc -m64 -Xa -g -v -O0 
> > -D_POSIX_PTHREAD_SEMANTICS -mt -D_REENTRANT -D_THREAD_SAFE -I../include 
> > -I../../../../src/interfaces/ecpg/include -I. -I. 
> > -I../../../../src/interfaces/ecpg/ecpglib 
> > -I../../../../src/interfaces/libpq -I../../../../src/include  
> > -D_POSIX_PTHREAD_SEMANTICS   -c -o preproc.o preproc.c
> > Assertion failed: hmap_size (phdl->fb.map) == 0, file 
> > ../src/line_num_internal.c, line 230, function twolist_proc_clear
> > Assertion failed: hmap_size (phdl->fb.map) == 0, file 
> > ../src/line_num_internal.c, line 230, function twolist_proc_clear
> > cc: Fatal error in /opt/developerstudio12.6/lib/compilers/bin/acomp
> > cc: Status 134
> > 
> > the assertion is just a consequence of running out of memory, I believe, 
> > acomp
> > is well over 20GB at that point.
> > 
> > However I see that wrasse doesn't seem to have that problem. Which leaves 
> > me a
> > bit confused, because I think that's the same machine and compiler binary.
> > 
> > Noah, did you encounter this before / do anything to avoid this?
> 
> Yes.  Drop --enable-debug, and override TMPDIR to some disk-backed location.

Thanks - that indeed helped...


> From the earliest days of wrasse, the compiler used too much RAM to build
> preproc.o with --enable-debug.  As of 2021-04, the compiler's "acomp" phase
> needed 10G in one process, and later phases needed 11.6G across two processes.
> Compilation wrote 3.7G into TMPDIR.  Since /tmp consumes RAM+swap, overriding
> TMPDIR relieved 3.7G of RAM pressure.  Even with those protections, wrasse
> intermittently reaches the 14G limit I impose (via "ulimit -v 14680064").  I
> had experimented with different optimization levels, but that didn't help.

Yikes. And it's not like newer compiler versions are likely to be forthcoming
(12.6 is newest and is from 2017...). Wonder if we should just require gcc on
solaris... There's a decent amount of stuff we could rip out in that case.

I was trying to build on solaris because I was seeing if we could get rid of
with_gnu_ld, motivated by making the meson build generate a working
Makefile.global for pgxs' benefit.

Greetings,

Andres Freund




Re: Cleaning up historical portability baggage

2022-08-06 Thread Thomas Munro
On Sat, Aug 6, 2022 at 9:08 AM Andres Freund  wrote:
> [stuff about strtoll, strtoull]

So what about strtof?  That's gotta be dead code too.  I gather we
still need commit 72880ac1's HAVE_BUGGY_STRTOF.  From a cursory glance
at MinGW's implementation, it still has the complained-about
behaviour, if I've understood the complaint, and if I'm looking at the
right C runtime[1].  But then our code says:

 * Test results on Mingw suggest that it has the same problem, though looking
 * at the code I can't figure out why.

... so which code was that referring to then?  I'm not up to speed on
how many C runtime libraries there are and how they are selected on
MSYS (I mean, the closest I've ever got to this system is flinging
patches at it on CI using Melih's patch, which, incidentally, I just
tested the attached with and it passed[2]).

[1] https://github.com/mirror/mingw-w64/blob/master/mingw-w64-crt/stdio/strtof.c
[2] https://github.com/macdice/postgres/runs/7708082971
From 6bab84477090951ad0553c16069406718770d6a4 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Sun, 7 Aug 2022 10:51:39 +1200
Subject: [PATCH] Simplify replacement code for strtof.

strtof() is in C99 and all targeted systems have it.  We can remove the
configure probe and some dead code, but we still need replacement code
for a couple of systems that have known buggy implementations selected
via platform template.

diff --git a/configure b/configure
index 0e73edb9ff..5ee7c2c1a2 100755
--- a/configure
+++ b/configure
@@ -16734,19 +16734,6 @@ esac
 
 fi
 
-ac_fn_c_check_func "$LINENO" "strtof" "ac_cv_func_strtof"
-if test "x$ac_cv_func_strtof" = xyes; then :
-  $as_echo "#define HAVE_STRTOF 1" >>confdefs.h
-
-else
-  case " $LIBOBJS " in
-  *" strtof.$ac_objext "* ) ;;
-  *) LIBOBJS="$LIBOBJS strtof.$ac_objext"
- ;;
-esac
-
-fi
-
 
 
 if test "$enable_thread_safety" = yes; then
@@ -16770,8 +16757,7 @@ if test "$PORTNAME" = "win32" -o "$PORTNAME" = "cygwin"; then
 	# Cygwin and (apparently, based on test results) Mingw both
 	# have a broken strtof(), so substitute its implementation.
 	# That's not a perfect fix, since it doesn't avoid double-rounding,
-	# but we have no better options. To get that, though, we have to
-	# force the file to be compiled despite HAVE_STRTOF.
+	# but we have no better options.
 	case " $LIBOBJS " in
   *" strtof.$ac_objext "* ) ;;
   *) LIBOBJS="$LIBOBJS strtof.$ac_objext"
diff --git a/configure.ac b/configure.ac
index efd3be91cd..be52e17ea7 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1874,7 +1874,6 @@ AC_REPLACE_FUNCS(m4_normalize([
 	strlcat
 	strlcpy
 	strnlen
-	strtof
 ]))
 
 if test "$enable_thread_safety" = yes; then
@@ -1885,8 +1884,7 @@ if test "$PORTNAME" = "win32" -o "$PORTNAME" = "cygwin"; then
 	# Cygwin and (apparently, based on test results) Mingw both
 	# have a broken strtof(), so substitute its implementation.
 	# That's not a perfect fix, since it doesn't avoid double-rounding,
-	# but we have no better options. To get that, though, we have to
-	# force the file to be compiled despite HAVE_STRTOF.
+	# but we have no better options.
 	AC_LIBOBJ([strtof])
 	AC_MSG_NOTICE([On $host_os we will use our strtof wrapper.])
 fi
diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index 2d9a1cdc8a..c243a906c9 100644
--- a/src/include/pg_config.h.in
+++ b/src/include/pg_config.h.in
@@ -460,9 +460,6 @@
 /* Define to 1 if you have the `strsignal' function. */
 #undef HAVE_STRSIGNAL
 
-/* Define to 1 if you have the `strtof' function. */
-#undef HAVE_STRTOF
-
 /* Define to 1 if the system has the type `struct addrinfo'. */
 #undef HAVE_STRUCT_ADDRINFO
 
diff --git a/src/include/port.h b/src/include/port.h
index ad76384fb1..cec41eae71 100644
--- a/src/include/port.h
+++ b/src/include/port.h
@@ -387,10 +387,6 @@ extern int	getpeereid(int sock, uid_t *uid, gid_t *gid);
 extern void explicit_bzero(void *buf, size_t len);
 #endif
 
-#ifndef HAVE_STRTOF
-extern float strtof(const char *nptr, char **endptr);
-#endif
-
 #ifdef HAVE_BUGGY_STRTOF
 extern float pg_strtof(const char *nptr, char **endptr);
 #define strtof(a,b) (pg_strtof((a),(b)))
diff --git a/src/port/strtof.c b/src/port/strtof.c
index 314fcc9851..21b3f8f712 100644
--- a/src/port/strtof.c
+++ b/src/port/strtof.c
@@ -16,43 +16,7 @@
 #include 
 #include 
 
-#ifndef HAVE_STRTOF
-/*
- * strtof() is part of C99; this version is only for the benefit of obsolete
- * platforms. As such, it is known to return incorrect values for edge cases,
- * which have to be allowed for in variant files for regression test results
- * for any such platform.
- */
-
-float
-strtof(const char *nptr, char **endptr)
-{
-	int			caller_errno = errno;
-	double		dresult;
-	float		fresult;
-
-	errno = 0;
-	dresult = strtod(nptr, endptr);
-	fresult = (float) dresult;
 
-	if (errno == 0)
-	{
-		/*
-		 * Value might be in-range for double but not float.
-		 */
-		if (dresult != 0 && fresult == 0)
-			caller_errno = ERANGE;	/* underflow */
-		if (!isinf(dresult) && 

Re: Cleaning up historical portability baggage

2022-08-06 Thread Thomas Munro
On Sun, Aug 7, 2022 at 10:42 AM Tom Lane  wrote:
> Thomas Munro  writes:
> > Did I understand correctly that the places that do kill(-pid) followed
> > by kill(pid) really only need the kill(-pid)?
>
> Uh ... did you read the comment right above signal_child?
>
>  * There is a race condition for recently-forked children: they might not
>  * have executed setsid() yet.  So we signal the child directly as well as
>  * the group.  We assume such a child will handle the signal before trying
>  * to spawn any grandchild processes.  We also assume that signaling the
>  * child twice will not cause any problems.

Oof.  Fixed.
From 7ec1aa38f30e32734eb77a6dc6448d332179dfa3 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Sun, 7 Aug 2022 09:37:34 +1200
Subject: [PATCH v2] Simplify conditional code for process groups.

Teach our replacement kill() function for Windows to ignore process
groups and send directly to a single pid instead.  Now we can drop a
bunch of conditional code at call sites, and the vestigial HAVE_SETSID
macro.

While here, rename src/port/kill.c to win32kill.c, following our
convention for Windows-only fallback code.

Suggested-by: Robert Haas 
Reviewed-by: Tom Lane 
Discussion: https://postgr.es/m/CA%2BTgmob_5AUNCzyFGJX6quYSnQnKCHW6DGGJa1noofJqSu%2Bweg%40mail.gmail.com
---
 configure | 12 ++--
 configure.ac  |  2 +-
 src/backend/postmaster/postmaster.c   |  2 --
 src/backend/storage/ipc/procarray.c   |  9 +
 src/backend/storage/ipc/signalfuncs.c |  6 +-
 src/backend/utils/init/miscinit.c |  2 +-
 src/backend/utils/init/postinit.c |  4 
 src/bin/pg_ctl/pg_ctl.c   |  2 +-
 src/include/port.h|  1 -
 src/include/port/win32_port.h |  2 +-
 src/port/{kill.c => win32kill.c}  | 16 +---
 src/tools/msvc/Mkvcbuild.pm   |  3 ++-
 12 files changed, 23 insertions(+), 38 deletions(-)
 rename src/port/{kill.c => win32kill.c} (88%)

diff --git a/configure b/configure
index 0e73edb9ff..6ea8051c9c 100755
--- a/configure
+++ b/configure
@@ -16888,12 +16888,6 @@ esac
  ;;
 esac
 
-  case " $LIBOBJS " in
-  *" kill.$ac_objext "* ) ;;
-  *) LIBOBJS="$LIBOBJS kill.$ac_objext"
- ;;
-esac
-
   case " $LIBOBJS " in
   *" open.$ac_objext "* ) ;;
   *) LIBOBJS="$LIBOBJS open.$ac_objext"
@@ -16930,6 +16924,12 @@ esac
  ;;
 esac
 
+  case " $LIBOBJS " in
+  *" win32kill.$ac_objext "* ) ;;
+  *) LIBOBJS="$LIBOBJS win32kill.$ac_objext"
+ ;;
+esac
+
   case " $LIBOBJS " in
   *" win32link.$ac_objext "* ) ;;
   *) LIBOBJS="$LIBOBJS win32link.$ac_objext"
diff --git a/configure.ac b/configure.ac
index efd3be91cd..127e4ce925 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1926,13 +1926,13 @@ if test "$PORTNAME" = "win32"; then
   AC_CHECK_FUNCS(_configthreadlocale)
   AC_LIBOBJ(dirmod)
   AC_LIBOBJ(getrusage)
-  AC_LIBOBJ(kill)
   AC_LIBOBJ(open)
   AC_LIBOBJ(system)
   AC_LIBOBJ(win32dlopen)
   AC_LIBOBJ(win32env)
   AC_LIBOBJ(win32error)
   AC_LIBOBJ(win32fdatasync)
+  AC_LIBOBJ(win32kill)
   AC_LIBOBJ(win32link)
   AC_LIBOBJ(win32ntdll)
   AC_LIBOBJ(win32pread)
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 81cb585891..f8bd6285b8 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -4043,7 +4043,6 @@ signal_child(pid_t pid, int signal)
 {
 	if (kill(pid, signal) < 0)
 		elog(DEBUG3, "kill(%ld,%d) failed: %m", (long) pid, signal);
-#ifdef HAVE_SETSID
 	switch (signal)
 	{
 		case SIGINT:
@@ -4057,7 +4056,6 @@ signal_child(pid_t pid, int signal)
 		default:
 			break;
 	}
-#endif
 }
 
 /*
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 0555b02a8d..950ecc764b 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -3865,15 +3865,8 @@ TerminateOtherDBBackends(Oid databaseId)
 
 			if (proc != NULL)
 			{
-/*
- * If we have setsid(), signal the backend's whole process
- * group
- */
-#ifdef HAVE_SETSID
+/* Signal the backend's whole process group. */
 (void) kill(-pid, SIGTERM);
-#else
-(void) kill(pid, SIGTERM);
-#endif
 			}
 		}
 	}
diff --git a/src/backend/storage/ipc/signalfuncs.c b/src/backend/storage/ipc/signalfuncs.c
index 6e310b14eb..13e691ec92 100644
--- a/src/backend/storage/ipc/signalfuncs.c
+++ b/src/backend/storage/ipc/signalfuncs.c
@@ -92,12 +92,8 @@ pg_signal_backend(int pid, int sig)
 	 * too unlikely to worry about.
 	 */
 
-	/* If we have setsid(), signal the backend's whole process group */
-#ifdef HAVE_SETSID
+	/* Signal the backend's whole process group */
 	if (kill(-pid, sig))
-#else
-	if (kill(pid, sig))
-#endif
 	{
 		/* Again, just a warning to allow loops */
 		ereport(WARNING,
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index bd973ba613..c65b596d1b 100644
--- a/src/backend/utils/init/miscinit.c
+++ 

Re: Cleaning up historical portability baggage

2022-08-06 Thread Tom Lane
Thomas Munro  writes:
> I also wonder if half the stuff in win32gettimeofday.c can be deleted.
> From some light googling, it looks like
> GetSystemTimePreciseAsFileTime() can just be called directly on
> Windows 8+ (and we now require 10+), and that kernel32.dll malarky was
> for older systems?

Yeah, Microsoft's man page for it just says to include sysinfoapi.h
(which we aren't) and then it should work on supported versions.

regards, tom lane




Re: Cleaning up historical portability baggage

2022-08-06 Thread Thomas Munro
On Sun, Aug 7, 2022 at 11:08 AM Tom Lane  wrote:
> Andres Freund  writes:
> > Thanks. Next in my quest for reducing autoconf vs meson pg_config.h
> > differences is GETTIMEOFDAY stuff.
>
> I just noticed that this could be simplified:
>
> #ifdef _MSC_VER
> struct timezone;
> /* Last parameter not used */
> extern int  gettimeofday(struct timeval *tp, struct timezone *tzp);
> #endif
>
> because what POSIX actually says is
>
> int gettimeofday(struct timeval *restrict tp, void *restrict tzp);
>
> and
>
> If tzp is not a null pointer, the behavior is unspecified.
>
> Indeed, we never call it with anything but a null tzp value,
> nor does our Windows fallback implementation do anything with tzp.
>
> So ISTM we should drop the bogus "struct timezone;" and declare
> this parameter as "void *tzp".

I also wonder if half the stuff in win32gettimeofday.c can be deleted.
From some light googling, it looks like
GetSystemTimePreciseAsFileTime() can just be called directly on
Windows 8+ (and we now require 10+), and that kernel32.dll malarky was
for older systems?




Re: Making autovacuum logs indicate if insert-based threshold was the triggering condition

2022-08-06 Thread Peter Geoghegan
On Sat, Aug 6, 2022 at 3:51 PM Justin Pryzby  wrote:
> > Well, autovacuum.c should have (and/or kind of already has) 3
> > different triggering conditions. These are mutually exclusive
> > conditions -- technically autovacuum.c always launches an autovacuum
> > against a table because exactly 1 of the 3 thresholds were crossed.
>
> The issue being that both thresholds can be crossed:
>
> >> 2022-08-06 16:47:47.674 CDT autovacuum worker[12707] DEBUG:  t: VAC: 9 (THRESHOLD 50), INS: 9 (THRESHOLD 1000), anl: 18 (threshold 50)

What are the chances that both thresholds will be crossed at *exactly*
(not approximately) the same time in a real world case, where the
table isn't tiny (tiny relative to the "autovacuum_naptime quantum")?
This is a very narrow case.

Besides, the same can already be said with how autovacuum.c crosses
the XID-based antiwraparound threshold. Yet we still arbitrarily
report that it's antiwraparound in the logs, which (at least right
now) is generally assumed to mostly be about advancing relfrozenxid.
(Or maybe it's the other way around; it doesn't matter.)

It might make sense to *always* show how close we were to hitting each
of the thresholds, including the ones that we didn't end up hitting
(we may come pretty close quite often, which seems like it might
matter). But showing multiple conditions together just because the
planets aligned (we hit multiple thresholds together) emphasizes the
low-level mechanism, which is pretty far removed from anything that
matters. You might as well pick either threshold at random once this
happens -- even an expert won't be able to tell the difference.

-- 
Peter Geoghegan




Re: failing to build preproc.c on solaris with sun studio

2022-08-06 Thread Noah Misch
On Sat, Aug 06, 2022 at 02:07:24PM -0700, Andres Freund wrote:
> I tried PG on the gcc compile farm solaris 11.31 host. When compiling with sun
> studio I can build the backend etc, but preproc.c fails to compile:
> 
> ccache /opt/developerstudio12.6/bin/cc -m64 -Xa -g -v -O0 -D_POSIX_PTHREAD_SEMANTICS -mt -D_REENTRANT -D_THREAD_SAFE -I../include -I../../../../src/interfaces/ecpg/include -I. -I. -I../../../../src/interfaces/ecpg/ecpglib -I../../../../src/interfaces/libpq -I../../../../src/include  -D_POSIX_PTHREAD_SEMANTICS   -c -o preproc.o preproc.c
> Assertion failed: hmap_size (phdl->fb.map) == 0, file ../src/line_num_internal.c, line 230, function twolist_proc_clear
> Assertion failed: hmap_size (phdl->fb.map) == 0, file ../src/line_num_internal.c, line 230, function twolist_proc_clear
> cc: Fatal error in /opt/developerstudio12.6/lib/compilers/bin/acomp
> cc: Status 134
> 
> the assertion is just a consequence of running out of memory, I believe, acomp
> is well over 20GB at that point.
> 
> However I see that wrasse doesn't seem to have that problem. Which leaves me a
> bit confused, because I think that's the same machine and compiler binary.
> 
> Noah, did you encounter this before / do anything to avoid this?

Yes.  Drop --enable-debug, and override TMPDIR to some disk-backed location.

From the earliest days of wrasse, the compiler used too much RAM to build
preproc.o with --enable-debug.  As of 2021-04, the compiler's "acomp" phase
needed 10G in one process, and later phases needed 11.6G across two processes.
Compilation wrote 3.7G into TMPDIR.  Since /tmp consumes RAM+swap, overriding
TMPDIR relieved 3.7G of RAM pressure.  Even with those protections, wrasse
intermittently reaches the 14G limit I impose (via "ulimit -v 14680064").  I
had experimented with different optimization levels, but that didn't help.
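For anyone hitting the same wall, the recipe above might look roughly like this. This is a hypothetical sketch, not wrasse's actual configuration: the TMPDIR path and the make target are illustrative.

```shell
# Workaround sketch: no --enable-debug, disk-backed TMPDIR (Solaris /tmp
# is backed by RAM+swap), and a 14G virtual-memory cap as on wrasse.
export TMPDIR=/var/tmp/pgbuild-tmp        # any disk-backed directory
mkdir -p "$TMPDIR"
ulimit -v 14680064                        # 14G limit, matching wrasse
./configure CC=/opt/developerstudio12.6/bin/cc   # note: no --enable-debug
gmake -C src/interfaces/ecpg/preproc preproc.o
```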




Re: Cleaning up historical portability baggage

2022-08-06 Thread Tom Lane
Andres Freund  writes:
> Thanks. Next in my quest for reducing autoconf vs meson pg_config.h
> differences is GETTIMEOFDAY stuff.

I just noticed that this could be simplified:

#ifdef _MSC_VER
struct timezone;
/* Last parameter not used */
extern int  gettimeofday(struct timeval *tp, struct timezone *tzp);
#endif

because what POSIX actually says is

int gettimeofday(struct timeval *restrict tp, void *restrict tzp);

and

If tzp is not a null pointer, the behavior is unspecified.

Indeed, we never call it with anything but a null tzp value,
nor does our Windows fallback implementation do anything with tzp.

So ISTM we should drop the bogus "struct timezone;" and declare
this parameter as "void *tzp".

regards, tom lane




Re: Making autovacuum logs indicate if insert-based threshold was the triggering condition

2022-08-06 Thread Justin Pryzby
On Sat, Aug 06, 2022 at 03:41:57PM -0700, Peter Geoghegan wrote:
> > > Note that a VACUUM that is an "automatic vacuum for inserted tuples" cannot
> > > [...] also be a "regular" autovacuum/VACUUM
> >
> > Why not ?

I think maybe you missed my intent in trimming the "anti-wraparound" part of
your text.

My point was concerning your statement that "autovacuum for inserted tuples ..
cannot also be a regular autovacuum" (meaning triggered by dead tuples).

> Well, autovacuum.c should have (and/or kind of already has) 3
> different triggering conditions. These are mutually exclusive
> conditions -- technically autovacuum.c always launches an autovacuum
> against a table because exactly 1 of the 3 thresholds were crossed.

The issue being that both thresholds can be crossed:

>> 2022-08-06 16:47:47.674 CDT autovacuum worker[12707] DEBUG:  t: VAC: 9 (THRESHOLD 50), INS: 9 (THRESHOLD 1000), anl: 18 (threshold 50)

-- 
Justin




Re: Making autovacuum logs indicate if insert-based threshold was the triggering condition

2022-08-06 Thread Peter Geoghegan
On Sat, Aug 6, 2022 at 2:50 PM Justin Pryzby  wrote:
> This sounded familiar, and it seems like I anticipated that it might be an
> issue.  Here, I was advocating for the new insert-based GUCs to default to -1,
> to have insert-based autovacuum fall back to the thresholds specified by the
> pre-existing GUCs (20% + 50), which would (in my proposal) remain the normal
> way to tune any type of vacuum.
>
> https://www.postgresql.org/message-id/20200317233218.gd26...@telsasoft.com
>
> I haven't heard of anyone who had trouble setting the necessary GUC, but I'm
> not surprised if most postgres installations are running versions before 13.

ISTM that having insert-based triggering conditions is definitely a
good idea, but what we have right now still has problems. It currently
won't work very well unless the user goes out of their way to tune
freezing to do the right thing. Typically we miss out on the
opportunity to freeze early, because without sophisticated
intervention from the user there is only a slim chance of *any*
freezing taking place outside of the inevitable antiwraparound
autovacuum.

> > Note that a VACUUM that is an "automatic vacuum for inserted tuples" cannot
> > [...] also be a "regular" autovacuum/VACUUM
>
> Why not ?

Well, autovacuum.c should have (and/or kind of already has) 3
different triggering conditions. These are mutually exclusive
conditions -- technically autovacuum.c always launches an autovacuum
against a table because exactly 1 of the 3 thresholds were crossed. My
patch makes sure that it always gives exactly one reason why
autovacuum.c decided to VACUUM, so by definition there is only one
relevant piece of information for vacuumlazy.c to report in the log.
That's fairly simple and high level, and presumably something that
users won't have much trouble understanding.

Right now antiwraparound autovacuum "implies aggressive", in that it
almost always makes vacuumlazy.c use aggressive mode, but this seems
totally arbitrary to me -- they don't have to be virtually synonymous.
I think that antiwraparound autovacuum could even be rebranded as "an
autovacuum that takes place because the table hasn't had one in a long
time". This is much less scary, and makes it clearer that autovacuum.c
shouldn't be expected to really understand what will turn out to be
important "at runtime". That's the time to make important decisions
about what work to do -- when we actually have accurate information.

My antiwraparound example is just that: an example. There is a broader
idea: we shouldn't be too confident that the exact triggering
condition autovacuum.c applied to launch an autovacuum worker turns
out to be the best reason to VACUUM, or even a good reason --
vacuumlazy.c should be able to cope with that. The user is kept in the
loop about both, by reporting the triggering condition and the details
of what really happened at runtime. Maybe vacuumlazy.c can be taught
to speed up and slow down based on the conditions it observes as it
scans the heap -- there are many possibilities.

This broader idea is pretty much what you were getting at with your
example, I think.

-- 
Peter Geoghegan




Re: Cleaning up historical portability baggage

2022-08-06 Thread Tom Lane
Thomas Munro  writes:
> Did I understand correctly that the places that do kill(-pid) followed
> by kill(pid) really only need the kill(-pid)?

Uh ... did you read the comment right above signal_child?

 * There is a race condition for recently-forked children: they might not
 * have executed setsid() yet.  So we signal the child directly as well as
 * the group.  We assume such a child will handle the signal before trying
 * to spawn any grandchild processes.  We also assume that signaling the
 * child twice will not cause any problems.

It might be that this is wrong and signaling -pid will work even if
the child hasn't yet done setsid(), but I doubt it: the kill(2) man
page is pretty clear that it'll fail if "the process group doesn't
exist".

Perhaps we could finesse that by signaling -pid first, and then
signaling pid if that fails, but offhand it seems like that has
the described race condition w.r.t. grandchild processes.

regards, tom lane




Re: Cleaning up historical portability baggage

2022-08-06 Thread Thomas Munro
On Sat, Aug 6, 2022 at 2:54 AM Robert Haas  wrote:
> On Fri, Aug 5, 2022 at 10:48 AM Tom Lane  wrote:
> > Hmm ... I agree with you that the end result could be nicer code,
> > but what's making it nicer is a pretty substantial amount of human
> > effort for each and every call site.  Is anybody stepping forward
> > to put in that amount of work?
> >
> > My proposal is to leave the call sites alone until someone feels
> > like doing that sort of detail work.
>
> My plan was to nerd-snipe Thomas Munro into doing it.[1]

Alright, well here's a patch for the setsid() stuff following Robert's
plan, which I think is a pretty good plan.

Did I understand correctly that the places that do kill(-pid) followed
by kill(pid) really only need the kill(-pid)?

I checked that user processes should never have pid 0 (that's a
special system idle process) or 1 (Windows pids are always even;
apparently they're pointers in kernel space or something like that),
since those wouldn't play nice with the coding I used here.

I note that Windows actually *does* have process groups (in one of the
CI threads, we learned that there were at least two concepts like
that).  Some of our fake signals turn into messages to pipes, and
others turn into process termination, and in theory we could probably
also take advantage of Windows' support for its native "control C" and
"control break" signals here.  It's possible that someone could do
something to make all but the pipe ones propagate to real Windows
process groups.  That might be good for things like nuking archiver
subprocesses and the like when taking out the backend, or something
like that, but I'm not planning to look into that myself.
From f3c224082736605f373b4c34e661b6a8d26846c0 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Sun, 7 Aug 2022 09:37:34 +1200
Subject: [PATCH] Simplify conditional code for process groups.

Teach our replacement kill() function for Windows to ignore process
groups and send directly to a single pid instead.  Now we can drop a
bunch of conditional code at call sites, and the vestigial HAVE_SETSID
macro.

While here, rename src/port/kill.c to win32kill.c, following our
convention for Windows-only fallback code.

Suggested-by: Robert Haas 
Discussion: https://postgr.es/m/CA%2BTgmob_5AUNCzyFGJX6quYSnQnKCHW6DGGJa1noofJqSu%2Bweg%40mail.gmail.com
---
 configure | 12 ++--
 configure.ac  |  2 +-
 src/backend/postmaster/postmaster.c   |  9 +++--
 src/backend/storage/ipc/procarray.c   |  9 +
 src/backend/storage/ipc/signalfuncs.c |  6 +-
 src/backend/utils/init/miscinit.c |  2 +-
 src/backend/utils/init/postinit.c |  6 --
 src/bin/pg_ctl/pg_ctl.c   |  2 +-
 src/include/port.h|  1 -
 src/port/{kill.c => win32kill.c}  | 16 +---
 src/tools/msvc/Mkvcbuild.pm   |  3 ++-
 11 files changed, 25 insertions(+), 43 deletions(-)
 rename src/port/{kill.c => win32kill.c} (88%)

diff --git a/configure b/configure
index 0e73edb9ff..6ea8051c9c 100755
--- a/configure
+++ b/configure
@@ -16888,12 +16888,6 @@ esac
  ;;
 esac
 
-  case " $LIBOBJS " in
-  *" kill.$ac_objext "* ) ;;
-  *) LIBOBJS="$LIBOBJS kill.$ac_objext"
- ;;
-esac
-
   case " $LIBOBJS " in
   *" open.$ac_objext "* ) ;;
   *) LIBOBJS="$LIBOBJS open.$ac_objext"
@@ -16930,6 +16924,12 @@ esac
  ;;
 esac
 
+  case " $LIBOBJS " in
+  *" win32kill.$ac_objext "* ) ;;
+  *) LIBOBJS="$LIBOBJS win32kill.$ac_objext"
+ ;;
+esac
+
   case " $LIBOBJS " in
   *" win32link.$ac_objext "* ) ;;
   *) LIBOBJS="$LIBOBJS win32link.$ac_objext"
diff --git a/configure.ac b/configure.ac
index efd3be91cd..127e4ce925 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1926,13 +1926,13 @@ if test "$PORTNAME" = "win32"; then
   AC_CHECK_FUNCS(_configthreadlocale)
   AC_LIBOBJ(dirmod)
   AC_LIBOBJ(getrusage)
-  AC_LIBOBJ(kill)
   AC_LIBOBJ(open)
   AC_LIBOBJ(system)
   AC_LIBOBJ(win32dlopen)
   AC_LIBOBJ(win32env)
   AC_LIBOBJ(win32error)
   AC_LIBOBJ(win32fdatasync)
+  AC_LIBOBJ(win32kill)
   AC_LIBOBJ(win32link)
   AC_LIBOBJ(win32ntdll)
   AC_LIBOBJ(win32pread)
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 81cb585891..c50e07bf2b 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -4041,9 +4041,6 @@ PostmasterStateMachine(void)
 static void
 signal_child(pid_t pid, int signal)
 {
-	if (kill(pid, signal) < 0)
-		elog(DEBUG3, "kill(%ld,%d) failed: %m", (long) pid, signal);
-#ifdef HAVE_SETSID
 	switch (signal)
 	{
 		case SIGINT:
@@ -4051,13 +4048,13 @@ signal_child(pid_t pid, int signal)
 		case SIGQUIT:
 		case SIGSTOP:
 		case SIGKILL:
-			if (kill(-pid, signal) < 0)
-elog(DEBUG3, "kill(%ld,%d) failed: %m", (long) (-pid), signal);
+			pid = -pid;
 			break;
 		default:
 			break;
 	}
-#endif
+	if (kill(pid, signal) < 0)
+		elog(DEBUG3, "kill(%ld,%d) failed: %m", (long) pid, signal);
 }
 
 /*
diff --git 

Re: Making autovacuum logs indicate if insert-based threshold was the triggering condition

2022-08-06 Thread Justin Pryzby
On Sat, Aug 06, 2022 at 01:03:57PM -0700, Peter Geoghegan wrote:
> thresholds. It may be far from obvious which triggering condition
> autovacuum.c must have applied to trigger any given autovacuum, since
> that information isn't currently passed down to vacuumlazy.c. This
> seems like a problem to me; how are users supposed to tune
> autovacuum's thresholds without even basic feedback about how the
> thresholds get applied?

+1

This sounded familiar, and it seems like I anticipated that it might be an
issue.  Here, I was advocating for the new insert-based GUCs to default to -1,
to have insert-based autovacuum fall back to the thresholds specified by the
pre-existing GUCs (20% + 50), which would (in my proposal) remain the normal
way to tune any type of vacuum.

https://www.postgresql.org/message-id/20200317233218.gd26...@telsasoft.com

I haven't heard of anyone who had trouble setting the necessary GUC, but I'm
not surprised if most postgres installations are running versions before 13.

> Note that a VACUUM that is an "automatic vacuum for inserted tuples" cannot
> [...] also be a "regular" autovacuum/VACUUM

Why not ?

$ ./tmp_install/usr/local/pgsql/bin/postgres -D src/test/regress/tmp_check/data -c log_min_messages=debug3 -c autovacuum_naptime=9s &
DROP TABLE t; CREATE TABLE t (i int); INSERT INTO t SELECT generate_series(1,9); DELETE FROM t; INSERT INTO t SELECT generate_series(1,9);

2022-08-06 16:47:47.674 CDT autovacuum worker[12707] DEBUG:  t: vac: 9 (threshold 50), ins: 9 (threshold 1000), anl: 18 (threshold 50)

-- 
Justin




Re: optimize lookups in snapshot [sub]xip arrays

2022-08-06 Thread Nathan Bossart
On Sat, Aug 06, 2022 at 11:13:26AM -0700, Nathan Bossart wrote:
> On Fri, Aug 05, 2022 at 03:04:34PM -0700, Andres Freund wrote:
>> But mainly I'd expect to find a difference if the SIMD code were optimized
>> further on the basis of not needing to return the offset. E.g. by
>> replacing _mm_packs_epi32 with _mm_or_si128, that's cheaper.
> 
> I haven't been able to find a significant difference between the two.  If
> anything, the _mm_packs_epi* approach actually seems to be slightly faster
> in some cases.  For something marginally more concrete, I compared the two
> in perf-top and saw the following for the relevant instructions:

Nevermind, I'm wrong.  When compiled with -O2, it uses more than just the
xmm0 and xmm1 registers, and the _mm_or_si128 approach consistently shows a
speedup of slightly more than 5%.  Patches attached.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From bdfab91f1f2fa647d1b8a888dd6f8ad61ab80523 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Wed, 3 Aug 2022 09:49:04 -0700
Subject: [PATCH v8 1/2] Introduce optimized routine for linear searches
 through an array of integers.

If SSE2 is available, this function uses it to speed up the search.  Otherwise,
it uses a simple 'for' loop.  This is a prerequisite for a follow-up commit
that will use this function to optimize [sub]xip lookups in
XidInMVCCSnapshot(), but it can be used anywhere that might benefit from such
an optimization.

It might be worthwhile to add an ARM-specific code path to this function in the
future.

Author: Nathan Bossart
Reviewed by: Andres Freund, John Naylor
Discussion: https://postgr.es/m/20220713170950.GA3116318%40nathanxps13
---
 src/include/port/pg_lfind.h | 70 +
 1 file changed, 70 insertions(+)
 create mode 100644 src/include/port/pg_lfind.h

diff --git a/src/include/port/pg_lfind.h b/src/include/port/pg_lfind.h
new file mode 100644
index 00..8a212cc06b
--- /dev/null
+++ b/src/include/port/pg_lfind.h
@@ -0,0 +1,70 @@
+/*-
+ *
+ * pg_lfind.h
+ *	  Optimized linear search routines.
+ *
+ * Copyright (c) 2022, PostgreSQL Global Development Group
+ *
+ *
+ * IDENTIFICATION
+ *	  src/port/pg_lfind.h
+ *
+ *-
+ */
+#ifndef PG_LFIND_H
+#define PG_LFIND_H
+
+#include "port/simd.h"
+
+/*
+ * pg_lfind32
+ *
+ * Returns true if there is an element in 'base' that equals 'key'.  Otherwise,
+ * returns false.
+ */
+static inline bool
+pg_lfind32(uint32 key, uint32 *base, uint32 nelem)
+{
+	uint32		i = 0;
+
+	/* If possible, use SSE2 intrinsics to speed up the search. */
+#ifdef USE_SSE2
+	__m128i		keys = _mm_set1_epi32(key);	/* load 4 copies of key */
+	uint32		iterations = nelem & ~0xF;	/* round down to multiple of 16 */
+
+	for (; i < iterations; i += 16)
+	{
+		/* load the next 16 values into __m128i variables */
+		__m128i vals1 = _mm_loadu_si128((__m128i *) [i]);
+		__m128i vals2 = _mm_loadu_si128((__m128i *) [i + 4]);
+		__m128i vals3 = _mm_loadu_si128((__m128i *) [i + 8]);
+		__m128i vals4 = _mm_loadu_si128((__m128i *) [i + 12]);
+
+		/* perform the comparisons */
+		__m128i result1 = _mm_cmpeq_epi32(keys, vals1);
+		__m128i result2 = _mm_cmpeq_epi32(keys, vals2);
+		__m128i result3 = _mm_cmpeq_epi32(keys, vals3);
+		__m128i result4 = _mm_cmpeq_epi32(keys, vals4);
+
+		/* shrink the results into a single variable */
+		__m128i tmp1 = _mm_or_si128(result1, result2);
+		__m128i tmp2 = _mm_or_si128(result3, result4);
+		__m128i result = _mm_or_si128(tmp1, tmp2);
+
+		/* see if there was a match */
+		if (_mm_movemask_epi8(result) != 0)
+			return true;
+	}
+#endif
+
+	/* Process the remaining elements the slow way. */
+	for (; i < nelem; i++)
+	{
+		if (key == base[i])
+			return true;
+	}
+
+	return false;
+}
+
+#endif			/* PG_LFIND_H */
-- 
2.25.1

>From 826cc190c9ad76558adc41f1cbee76fc47e78c15 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Wed, 3 Aug 2022 09:59:28 -0700
Subject: [PATCH v8 2/2] Optimize linear searches in XidInMVCCSnapshot().

This change makes use of the recently-introduced optimized linear search
routine to speed up searches through the [sub]xip arrays when possible, which
should improve performance significantly when the arrays are large.

Author: Nathan Bossart
Reviewed by: Andres Freund, John Naylor
Discussion: https://postgr.es/m/20220713170950.GA3116318%40nathanxps13
---
 src/backend/utils/time/snapmgr.c | 28 +++-
 1 file changed, 7 insertions(+), 21 deletions(-)

diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 5bc2a15160..9b504c9745 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -56,6 +56,7 @@
 #include "datatype/timestamp.h"
 #include "lib/pairingheap.h"
 #include "miscadmin.h"
+#include "port/pg_lfind.h"
 #include "storage/predicate.h"
 #include 

failing to build preproc.c on solaris with sun studio

2022-08-06 Thread Andres Freund
Hi,

I tried PG on the gcc compile farm solaris 11.31 host. When compiling with sun
studio I can build the backend etc, but preproc.c fails to compile:

ccache /opt/developerstudio12.6/bin/cc -m64 -Xa -g -v -O0 -D_POSIX_PTHREAD_SEMANTICS -mt -D_REENTRANT -D_THREAD_SAFE -I../include -I../../../../src/interfaces/ecpg/include -I. -I. -I../../../../src/interfaces/ecpg/ecpglib -I../../../../src/interfaces/libpq -I../../../../src/include  -D_POSIX_PTHREAD_SEMANTICS   -c -o preproc.o preproc.c
Assertion failed: hmap_size (phdl->fb.map) == 0, file ../src/line_num_internal.c, line 230, function twolist_proc_clear
Assertion failed: hmap_size (phdl->fb.map) == 0, file ../src/line_num_internal.c, line 230, function twolist_proc_clear
cc: Fatal error in /opt/developerstudio12.6/lib/compilers/bin/acomp
cc: Status 134

the assertion is just a consequence of running out of memory, I believe, acomp
is well over 20GB at that point.

However I see that wrasse doesn't seem to have that problem. Which leaves me a
bit confused, because I think that's the same machine and compiler binary.

Noah, did you encounter this before / do anything to avoid this?

Greetings,

Andres Freund




Making autovacuum logs indicate if insert-based threshold was the triggering condition

2022-08-06 Thread Peter Geoghegan
It's quite possible (and probably very common) for certain tables to
have autovacuum scheduling trigger autovacuums based on both the
"regular" bloat-orientated thresholds, and the newer insert-based
thresholds. It may be far from obvious which triggering condition
autovacuum.c must have applied to trigger any given autovacuum, since
that information isn't currently passed down to vacuumlazy.c. This
seems like a problem to me; how are users supposed to tune
autovacuum's thresholds without even basic feedback about how the
thresholds get applied?

Attached patch teaches autovacuum.c to pass the information down to
vacuumlazy.c, which includes the information in the autovacuum log.
The approach I've taken is very similar to the existing approach with
anti-wraparound autovacuum. It's pretty straightforward. Note that a
VACUUM that is an "automatic vacuum for inserted tuples" cannot also
be an antiwraparound autovacuum, nor can it also be a "regular"
autovacuum/VACUUM -- there are now 3 distinct "triggering conditions"
for autovacuum.

Although this patch doesn't touch antiwraparound autovacuums at all, I
will note in passing that I think that anti-wraparound autovacuums
should become just another triggering condition for autovacuum -- IMV
they shouldn't be special in *any* way. We'd still need to keep
antiwraparound's "cannot automatically cancel autovacuum worker"
behavior in some form, but that would become dynamic, a little like
the failsafe is today, and would trigger on its own timeline (probably
*much* later than we first trigger antiwraparound autovacuum). We'd
also need to decouple "aggressiveness" (the behaviors that we
associate with aggressive mode in vacuumlazy.c) from the condition of
the table/system when VACUUM first began -- those could all become
dynamic, too.

-- 
Peter Geoghegan


v1-0001-Show-when-autovacuum-is-insert-driven-in-log.patch
Description: Binary data


Re: optimize lookups in snapshot [sub]xip arrays

2022-08-06 Thread Nathan Bossart
On Fri, Aug 05, 2022 at 03:04:34PM -0700, Andres Freund wrote:
> But mainly I'd expect to find a difference if the SIMD code were optimized
> further on the basis of not needing to return the offset. E.g. by
> replacing _mm_packs_epi32 with _mm_or_si128, that's cheaper.

I haven't been able to find a significant difference between the two.  If
anything, the _mm_packs_epi* approach actually seems to be slightly faster
in some cases.  For something marginally more concrete, I compared the two
in perf-top and saw the following for the relevant instructions:

_mm_packs_epi*:
0.19 │   packssdw   %xmm1,%xmm0
0.62 │   packssdw   %xmm1,%xmm0
7.14 │   packsswb   %xmm1,%xmm0

_mm_or_si128:
1.52 │   por%xmm1,%xmm0
2.05 │   por%xmm1,%xmm0
5.66 │   por%xmm1,%xmm0

I also tried a combined approach where I replaced _mm_packs_epi16 with
_mm_or_si128:
1.16 │   packssdw   %xmm1,%xmm0
1.47 │   packssdw   %xmm1,%xmm0
8.17 │   por%xmm1,%xmm0

Of course, this simplistic analysis leaves out the impact of the
surrounding instructions, but it seems to support the idea that the
_mm_packs_epi* approach might have a slight edge.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




conchuela doesn't like gnu_printf anymore

2022-08-06 Thread Tom Lane
Buildfarm animal conchuela recently started spitting a lot of warnings
like this one:

 conchuela | 2022-08-06 12:35:46 | /home/pgbf/buildroot/HEAD/pgsql.build/../pgsql/src/include/port.h:208:70: warning: 'format' attribute argument not supported: gnu_printf [-Wignored-attributes]

I first thought we'd broken something, but upon digging through the
buildfarm history, the oldest build showing these warnings is

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=conchuela&dt=2022-07-18%2020%3A20%3A18

The new commits in that build don't look related, but what does look
related is that the choice of C++ compiler changed:

configure: using compiler=gcc 8.3 [DragonFly] Release/2019-02-22
configure: using CXX=ccache clang++14

vs

configure: using compiler=gcc 8.3 [DragonFly] Release/2019-02-22
configure: using CXX=g++

This is seemingly an intentional configuration change, because the
animal is reporting different config_env than before.  However,
we decide what to set PG_PRINTF_ATTRIBUTE to based on what CC
likes, and if CXX doesn't like it then you'll get these warnings.
(The warnings only appear in C++ compiles, else there'd REALLY
be a lot of them.)

Is it worth the trouble to try to set PG_PRINTF_ATTRIBUTE differently
in C and C++ builds?  I doubt it.  Probably the right fix for this
is to use matched C and C++ compilers, either both clang or both gcc.
I fear that inconsistency could lead to bigger problems than some
warnings.

regards, tom lane




Re: A cost issue in ORDER BY + LIMIT

2022-08-06 Thread Tom Lane
Paul Guo  writes:
> Postgres seems to always optimize ORDER BY + LIMIT as top-k sort.
> Recently I happened to notice
> that in this scenario the output tuple number of the sort node is not
> the same as the LIMIT tuple number.

No, it isn't, and your proposed patch is completely misguided.
The cost and rowcount estimates for a plan node are always written
on the assumption that the node is run to completion.

regards, tom lane




Re: `make check` doesn't pass on MacOS Catalina

2022-08-06 Thread Andrew Dunstan


On 2022-08-06 Sa 11:25, Tom Lane wrote:
> Andrew Dunstan  writes:
>> I came across this when I was working on setting up some Dockerfiles for
>> the buildfarm. Apparently LD_LIBRARY_PATH doesn't work on Alpine, at
>> least out of the box, as it uses a different linker, and "make check"
>> relies on it (or the moral equivalent) if "make install" hasn't been run.
> I did some quick googling on this point.  We seem not to be the only
> project having linking issues on Alpine, and yet it does support
> LD_LIBRARY_PATH according to some fairly authoritative-looking pages, eg
>
> https://www.musl-libc.org/doc/1.0.0/manual.html
>
> I suspect the situation is similar to macOS, ie there is some limitation
> somewhere on whether LD_LIBRARY_PATH gets passed through.  If memory
> serves, the problem on SIP-enabled Mac is that DYLD_LIBRARY_PATH is
> cleared upon invoking bash, so that we lose it anywhere that "make"
> invokes a shell to run a subprogram.  (Hmm ... I wonder whether ninja
> uses the shell ...)  I don't personally care at all about Alpine, but
> maybe somebody who does could dig a little harder and characterize
> the problem there better.
>
>   


We probably should care about Alpine, because it's a good distro to use
as the basis for Docker images, being fairly secure, very small, and
booting very fast.

I'll dig some more, and possibly set up a (docker based) buildfarm instance.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2022-08-06 Thread Tom Lane
Dilip Kumar  writes:
> On Fri, Aug 5, 2022 at 10:43 AM Dilip Kumar  wrote:
>> Yeah maybe it is not necessary to close as these unowned smgr will
>> automatically get closed on the transaction end.

I do not think this is a great idea for the per-relation smgrs created
during RelationCopyStorageUsingBuffer.  Yeah, they'll be mopped up at
transaction end, but that doesn't mean that creating possibly tens of
thousands of transient smgrs isn't going to cause performance issues.

I think RelationCopyStorageUsingBuffer needs to open and then close
the smgrs it uses, which means that ReadBufferWithoutRelcache is not the
appropriate API for it to use, either; need to go down another level.

regards, tom lane




Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2022-08-06 Thread Tom Lane
Dilip Kumar  writes:
> PFA patches for different problems discussed in the thread

> 0001 - Fix the problem of skipping the empty block and buffer lock on
> source buffer
> 0002 - Remove fake relcache entry (same as 0001-BugfixInWalLogCreateDB.patch)
> 0003 - Optimization to avoid extending block by block

I pushed 0001, because it seems fairly critical to get that in before
beta3.  The others can stand more leisurely discussion.

I note from
https://coverage.postgresql.org/src/backend/storage/buffer/bufmgr.c.gcov.html
that the block-skipping path is actually taken in our tests (this won't be
visible there for very much longer of course).  So we actually *are*
making a corrupt copy, and we haven't noticed.  This is perhaps not too
surprising, because the only test case that I can find is in
020_createdb.pl:

$node->issues_sql_like(
	[ 'createdb', '-T', 'foobar2', '-S', 'wal_log', 'foobar6' ],
	qr/statement: CREATE DATABASE foobar6 STRATEGY wal_log TEMPLATE foobar2/,
	'create database with WAL_LOG strategy');

which is, um, not exactly a robust test of whether anything happened
at all, let alone whether it was correct.  I'm not real sure that
this test would even notice if the CREATE reported failure.

regards, tom lane




Re: A cost issue in ORDER BY + LIMIT

2022-08-06 Thread Zhang Mingli
HI,

What if the rows of t1 are fewer than the limit number (e.g., t1 has 5 rows,
limit 10)?
Does it matter?


Regards,
Zhang Mingli
On Aug 6, 2022, 23:38 +0800, Paul Guo wrote:
>
> limit_tuples


A cost issue in ORDER BY + LIMIT

2022-08-06 Thread Paul Guo
Hello,

Postgres seems to always optimize ORDER BY + LIMIT as top-k sort.
Recently I happened to notice
that in this scenario the output tuple number of the sort node is not
the same as the LIMIT tuple number.

See below,

postgres=# explain analyze verbose select * from t1 order by a limit 10;
                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Limit  (cost=69446.17..69446.20 rows=10 width=4) (actual time=282.896..282.923 rows=10 loops=1)
   Output: a
   ->  Sort  (cost=69446.17..71925.83 rows=991862 width=4) (actual time=282.894..282.896 rows=10 loops=1)
         Output: a
         Sort Key: t1.a
         Sort Method: top-N heapsort  Memory: 25kB
         ->  Seq Scan on public.t1  (cost=0.00..44649.62 rows=991862 width=4) (actual time=0.026..195.438 rows=1001000 loops=1)
               Output: a
 Planning Time: 0.553 ms
 Execution Time: 282.961 ms
(10 rows)

We can see from the output that the LIMIT cost is also wrong due to
this, since it assumes the input_rows
from the sort node is 991862 (per gdb debugging).

This could be easily fixed by the patch below:

diff --git a/src/backend/optimizer/path/costsize.c
b/src/backend/optimizer/path/costsize.c
index fb28e6411a..800cf0b256 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -2429,7 +2429,11 @@ cost_sort(Path *path, PlannerInfo *root,
 
 	startup_cost += input_cost;
 
-	path->rows = tuples;
+	if (limit_tuples > 0 && limit_tuples < tuples)
+		path->rows = limit_tuples;
+	else
+		path->rows = tuples;
+
 	path->startup_cost = startup_cost;
 	path->total_cost = startup_cost + run_cost;
 }

With the patch the explain output looks like this.

postgres=# explain analyze verbose select * from t1 order by a limit 10;
                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Limit  (cost=69446.17..71925.83 rows=10 width=4) (actual time=256.204..256.207 rows=10 loops=1)
   Output: a
   ->  Sort  (cost=69446.17..71925.83 rows=10 width=4) (actual time=256.202..256.203 rows=10 loops=1)
         Output: a
         Sort Key: t1.a
         Sort Method: top-N heapsort  Memory: 25kB
         ->  Seq Scan on public.t1  (cost=0.00..44649.62 rows=991862 width=4) (actual time=1.014..169.509 rows=1001000 loops=1)
               Output: a
 Planning Time: 0.076 ms
 Execution Time: 256.232 ms
(10 rows)

Regards,
Paul




Re: `make check` doesn't pass on MacOS Catalina

2022-08-06 Thread Andres Freund
Hi,

On 2022-08-06 11:25:09 -0400, Tom Lane wrote:
> (Hmm ... I wonder whether ninja uses the shell ...)

It does, but even if it didn't, we'd use a shell somewhere below perl or
pg_regress :(.

The meson build should still work without disabling SIP; I did the necessary
hackery to set up the rpath equivalent relatively. So both the real install
target and the tmp_install/ should find libraries within themselves.

Greetings,

Andres Freund




Re: `make check` doesn't pass on MacOS Catalina

2022-08-06 Thread Tom Lane
Andrew Dunstan  writes:
> I came across this when I was working on setting up some Dockerfiles for
> the buildfarm. Apparently LD_LIBRARY_PATH doesn't work on Alpine, at
> least out of the box, as it uses a different linker, and "make check"
> relies on it (or the moral equivalent) if "make install" hasn't been run.

I did some quick googling on this point.  We seem not to be the only
project having linking issues on Alpine, and yet it does support
LD_LIBRARY_PATH according to some fairly authoritative-looking pages, eg

https://www.musl-libc.org/doc/1.0.0/manual.html

I suspect the situation is similar to macOS, ie there is some limitation
somewhere on whether LD_LIBRARY_PATH gets passed through.  If memory
serves, the problem on SIP-enabled Mac is that DYLD_LIBRARY_PATH is
cleared upon invoking bash, so that we lose it anywhere that "make"
invokes a shell to run a subprogram.  (Hmm ... I wonder whether ninja
uses the shell ...)  I don't personally care at all about Alpine, but
maybe somebody who does could dig a little harder and characterize
the problem there better.

regards, tom lane
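The dependence on environment passthrough is easy to demonstrate (the paths here are made up; on Linux the loader path normally survives into child shells, which is exactly what "make check" relies on when "make install" hasn't run):

```shell
# Illustrative only: /tmp/pg-demo/lib is a made-up path standing in for
# the tmp_install library directory that "make check" points at.
export LD_LIBRARY_PATH=/tmp/pg-demo/lib

# make spawns a shell, the shell spawns the test binaries, and the
# loader path must survive every hop for the uninstalled libpq to be found.
child_path=$(sh -c 'echo "${LD_LIBRARY_PATH:-<purged>}"')
echo "child sees: $child_path"

# On a SIP-enabled Mac the equivalent experiment with DYLD_LIBRARY_PATH
# prints "<purged>": the system strips DYLD_* before executing protected
# binaries such as /bin/sh, so the variable never reaches the tests.
```

If Alpine's musl loader behaves differently (or the variable is dropped somewhere in its toolchain), the same experiment there would show where the chain breaks.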




Re: Allocator sizeof operand mismatch (src/backend/regex/regcomp.c)

2022-08-06 Thread Zhang Mingli
On Aug 6, 2022, 22:47 +0800, Tom Lane wrote:
> Zhang Mingli  writes:
> > I think it’s OK; re_guts is converted when it is used:
> > (struct guts *) re->re_guts;
> > And there is a comment in regex.h:
> > char *re_guts; /* `char *' is more portable than `void *' */
>
> Boy, that comment is showing its age isn't it? If we were to do
> anything about this, I'd be more inclined to change re_guts to void*.
Got it , thanks.



Re: [PATCH] Expose port->authn_id to extensions and triggers

2022-08-06 Thread Joe Conway

On 8/6/22 02:26, Michael Paquier wrote:

On Fri, Aug 05, 2022 at 12:48:33PM +0200, Drouvot, Bertrand wrote:

On 8/2/22 11:57 PM, Jacob Champion wrote:

Thoughts from prior reviewers? Is SYSTEM_USER the way to go?


Reading through the other thread, there is a clear parallel between
both in concept to provide this information at SQL level, indeed.


I did not look at this thread in detail, but if the goal is "only" to expose
authn_id (as the subject describes) then it seems to me that SYSTEM_USER [1]
is the way to go.

[1]: https://commitfest.postgresql.org/39/3703/


However, I am not sure if the suggestion of auth_method:authn as
output generated by SYSTEM_USER would be correct according to the SQL
specification, either.  The spec is not really talkative about the
details of what an external module should do, which opens things up to a lot of
interpretation, something that both threads are dealing with.


As I pointed out here [ https://www.postgresql.org/message-id/28b4a9ef-5103-f117-99e1-99ae5a86a6e8%40joeconway.com ]
both the SQL Server and Oracle interpretations are similar to the one
provided by Bertrand's patch:


SQL Server:
  "If the current user is logged in to SQL Server by
   using Windows Authentication, SYSTEM_USER returns the
   Windows login identification name in the form:
   DOMAIN\user_login_name. However, if the current user
   is logged in to SQL Server by using SQL Server
   Authentication, SYSTEM_USER returns the SQL Server
   login identification name"

Oracle:
  "SYSTEM_USER
   Returns the name of the current data store user as
   identified by the operating system."

I am not sure how else we should interpret SYSTEM_USER -- if it isn't 
port->authn_id what else would you propose it should be?


--
Joe Conway
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Allocator sizeof operand mismatch (src/backend/regex/regcomp.c)

2022-08-06 Thread Tom Lane
Zhang Mingli  writes:
> I think it’s OK; re_guts is converted when it is used:
> (struct guts *) re->re_guts;
> And there is a comment in regex.h:
>   char *re_guts; /* `char *' is more portable than `void *' */

Boy, that comment is showing its age isn't it?  If we were to do
anything about this, I'd be more inclined to change re_guts to void*.
But, never having seen any compiler warnings about this code,
I don't feel a strong need to do something.

regards, tom lane




Re: `make check` doesn't pass on MacOS Catalina

2022-08-06 Thread Tom Lane
Andrew Dunstan  writes:
> On 2022-08-06 Sa 06:49, Gurjeet Singh wrote:
>> There are no comments as to why one should choose to use --delay-check
>> ($delay_check). This email, and the pointer to the paragraph buried in
>> the docs, shared by Tom, are the only two ways one can understand what
>> is causing this failure, and how to get around it.

> In general we want to run "make check" as soon as possible after running
> "make" on the core code. That's why I didn't simply delay it
> unconditionally.

In general --- that is, on non-broken platforms --- "make check"
*should* work without a prior "make install".  I am absolutely
not in favor of changing the buildfarm so that it fails to detect
the problem if we break that.  But for sure it'd make sense to add
some comments to the wiki and/or sample config file explaining
that you need to set this option on systems X,Y,Z.

On macOS you need to use it if you haven't disabled SIP.
I don't have the details about any other problem platforms.

regards, tom lane




Re: Allocator sizeof operand mismatch (src/backend/regex/regcomp.c)

2022-08-06 Thread Zhang Mingli

I think it’s OK; re_guts is converted when it is used:

(struct guts *) re->re_guts;

And there is a comment in regex.h:


char *re_guts; /* `char *' is more portable than `void *' */

Regards,
Zhang Mingli
On Aug 6, 2022, 20:13 +0800, Ranier Vilela wrote:
> Hi,
>
> About the error:
> Result of 'malloc' is converted to a pointer of type 'char', which is 
> incompatible with sizeof operand type 'struct guts'
>
> The patch attached tries to fix this.
>
> regards,
> Ranier Vilela


Re: `make check` doesn't pass on MacOS Catalina

2022-08-06 Thread Andrew Dunstan


On 2022-08-06 Sa 06:49, Gurjeet Singh wrote:
> On Tue, Apr 20, 2021 at 9:06 AM Andrew Dunstan  wrote:
>> On 4/20/21 11:02 AM, Tom Lane wrote:
>>> Aleksander Alekseev  writes:
 While trying to build PostgreSQL from source (master branch, 95c3a195) on a
 MacBook I discovered that `make check` fails:
>>> This is the usual symptom of not having disabled SIP :-(.
>>>
>>> If you don't want to do that, do "make install" before "make check".
>> FYI the buildfarm client has a '--delay-check' option that does exactly
>> this. It's useful on Alpine Linux as well as MacOS
> I was trying to set up a buildfarm animal, and this exact problem led
> to a few hours of debugging and hair-pulling. Can the default
> behaviour be changed in the buildfarm client to perform `make check` only
> after `make install`?
>
> Current buildfarm client code looks something like:
>
> make();
> make_check() unless $delay_check;
> ... other steps ...
> make_install();
> ... other steps-2...
> make_check() if $delay_check;
>
> There are no comments as to why one should choose to use --delay-check
> ($delay_check). This email, and the pointer to the paragraph buried in
> the docs, shared by Tom, are the only two ways one can understand what
> is causing this failure, and how to get around it.
>
> Naive question: What's stopping us from rewriting the code as follows.
> make();
> make_install();
> make_check();
> ... other steps ...
> ... other steps-2...
> # or move make_check() call here
>
> With a quick google search I could not find why --delay-check is
> necessary on Alpine Linux as well; can you please elaborate?
>

I came across this when I was working on setting up some Dockerfiles for
the buildfarm. Apparently LD_LIBRARY_PATH doesn't work on Alpine, at
least out of the box, as it uses a different linker, and "make check"
relies on it (or the moral equivalent) if "make install" hasn't been run.

In general we want to run "make check" as soon as possible after running
"make" on the core code. That's why I didn't simply delay it
unconditionally.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: [Code Comments]enum COPY_NEW_FE is removed

2022-08-06 Thread Zhang Mingli
Ok, thanks.

Michael Paquier wrote on Saturday, August 6, 2022, at 20:17:

> On Sat, Aug 06, 2022 at 07:20:25PM +0800, Zhang Mingli wrote:
> > Enum COPY_NEW_FE is removed in commit 3174d69fb9.
> >
> > Should use COPY_FRONTEND instead.
> >
> > Issue exists on 15 and master.
>
> This also exists in REL_14_STABLE.  I have fixed that on HEAD, as
> that's just a comment.
> --
> Michael
>


Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication

2022-08-06 Thread Amit Kapila
On Fri, Aug 5, 2022 at 7:25 PM Melih Mutlu  wrote:
>>
>> Why can't it be used to sync the other tables if any?
>
>
> It can be used. But I thought it would be better not to, for example in the 
> following case:
> Let's say a sync worker starts with a table in INIT state. The worker creates 
> a new replication slot to sync that table.
> When sync of the table is completed, it will move to the next one. This time 
> the new table may be in FINISHEDCOPY state, so the worker may need to use the 
> new table's existing replication slot.
> Before the worker will move to the next table again, there will be two 
> replication slots used by the worker. We might want to keep one and drop the 
> other.
> At this point, I thought it would be better to keep the replication slot 
> created by this worker in the first place. I think it's easier to track slots 
> this way since we know how to generate the rep slots name.
> Otherwise we would need to store the replication slot name somewhere too.
>

I think there is some basic flaw in slot reuse design. Currently, we
copy the table by starting a repeatable read transaction (BEGIN READ
ONLY ISOLATION LEVEL REPEATABLE READ) and create a slot that
establishes a snapshot which is first used for copy and then LSN
returned by it is used in the catchup phase after the copy is done.
The patch won't establish such a snapshot before a table copy as it
won't create a slot each time. If this understanding is correct, I
think we need to use ExportSnapshot/ImportSnapshot functionality to
achieve it or do something else to avoid the problem mentioned.

>
>>
>> This sounds reasonable. Let's do this unless we get some better idea.
>
>
>> There is no such restriction that origins should belong to only one
>> table. What makes you think like that?
>
>
> I did not reuse origins since I didn't think it would significantly improve 
> the performance as reusing replication slots does.
> So I just kept the origins as they were, even if it was possible to reuse 
> them. Does that make sense?
>

For small tables, it could make a visible performance difference, as it
involves database write operations each time to create and drop the
origin. But even if we don't want to reuse origins, you need to set the
origin_lsn appropriately. Currently (without this patch), after
creating the slot, we directly use the origin_lsn returned by
create_slot API whereas now it won't be the same case as the patch
doesn't create a slot every time.

-- 
With Regards,
Amit Kapila.




Re: Fix inconsistencies GUC categories

2022-08-06 Thread Michael Paquier
On Thu, Aug 04, 2022 at 08:09:51PM +0900, Shinya Kato wrote:
> I would like to unify the following with config.sgml as in a55a984.
> --
> Category is 'REPORTING AND LOGGING' and subcategory is 'PROCESS TITLE' at
> config.sgml.
> Category is 'REPORTING AND LOGGING' and subcategory is 'PROCESS TITLE' at
> pg_settings.

Yep.  I agree with these changes, even for
client_connection_check_interval.

> Category is 'PROCESS TITLE' and subcategory is none at
> postgresql.conf.sample.

Yep.  This change sounds right as well. 
--
Michael


signature.asc
Description: PGP signature


Re: [Code Comments]enum COPY_NEW_FE is removed

2022-08-06 Thread Michael Paquier
On Sat, Aug 06, 2022 at 07:20:25PM +0800, Zhang Mingli wrote:
> Enum COPY_NEW_FE is removed in commit 3174d69fb9.
> 
> Should use COPY_FRONTEND instead.
> 
> Issue exists on 15 and master.

This also exists in REL_14_STABLE.  I have fixed that on HEAD, as
that's just a comment.
--
Michael


signature.asc
Description: PGP signature


Allocator sizeof operand mismatch (src/backend/regex/regcomp.c)

2022-08-06 Thread Ranier Vilela
Hi,

About the error:
Result of 'malloc' is converted to a pointer of type 'char', which is
incompatible with sizeof operand type 'struct guts'

The patch attached tries to fix this.

regards,
Ranier Vilela
diff --git a/src/backend/regex/regcomp.c b/src/backend/regex/regcomp.c
index 473738040b..56e5ab1e84 100644
--- a/src/backend/regex/regcomp.c
+++ b/src/backend/regex/regcomp.c
@@ -389,7 +389,7 @@ pg_regcomp(regex_t *re,
 	re->re_fns = VS();
 
 	/* more complex setup, malloced things */
-	re->re_guts = VS(MALLOC(sizeof(struct guts)));
+	re->re_guts = (struct guts *) MALLOC(sizeof(struct guts));
 	if (re->re_guts == NULL)
 		return freev(v, REG_ESPACE);
 	g = (struct guts *) re->re_guts;


[Code Comments]enum COPY_NEW_FE is removed

2022-08-06 Thread Zhang Mingli
Hi,

Enum COPY_NEW_FE is removed in commit 3174d69fb9.

Should use COPY_FRONTEND instead.

Issue exists on 15 and master.

```
typedef struct CopyFromStateData

- StringInfo fe_msgbuf; /* used if copy_src == COPY_NEW_FE */
+ StringInfo fe_msgbuf; /* used if copy_src == COPY_FRONTEND */

```

Regards,
Zhang Mingli


v-0001-Fix-code-comments-COPY_FRONTEND.patch
Description: Binary data


Re: `make check` doesn't pass on MacOS Catalina

2022-08-06 Thread Gurjeet Singh
On Tue, Apr 20, 2021 at 9:06 AM Andrew Dunstan  wrote:
>
> On 4/20/21 11:02 AM, Tom Lane wrote:
> > Aleksander Alekseev  writes:
> >> While trying to build PostgreSQL from source (master branch, 95c3a195) on a
> >> MacBook I discovered that `make check` fails:
> > This is the usual symptom of not having disabled SIP :-(.
> >
> > If you don't want to do that, do "make install" before "make check".

> FYI the buildfarm client has a '--delay-check' option that does exactly
> this. It's useful on Alpine Linux as well as MacOS

I was trying to set up a buildfarm animal, and this exact problem led
to a few hours of debugging and hair-pulling. Can the default
behaviour be changed in the buildfarm client to perform `make check` only
after `make install`?

Current buildfarm client code looks something like:

make();
make_check() unless $delay_check;
... other steps ...
make_install();
... other steps-2...
make_check() if $delay_check;

There are no comments as to why one should choose to use --delay-check
($delay_check). This email, and the pointer to the paragraph buried in
the docs, shared by Tom, are the only two ways one can understand what
is causing this failure, and how to get around it.

Naive question: What's stopping us from rewriting the code as follows.
make();
make_install();
make_check();
... other steps ...
... other steps-2...
# or move make_check() call here

With a quick google search I could not find why --delay-check is
necessary on Alpine Linux as well; can you please elaborate?

Best regards,
Gurjeet
http://Gurje.et




Re: Allow file inclusion in pg_hba and pg_ident files

2022-08-06 Thread Julien Rouhaud
Hi,

On Fri, Aug 05, 2022 at 09:56:29AM +0900, Michael Paquier wrote:
> On Tue, Aug 02, 2022 at 07:32:54PM +0900, Michael Paquier wrote:
> > As a quick update from my side, I intend to look and apply 0001~0003
> > (not double-checked yet) shortly.
>
> And a couple of days later, these look fine so done as of 47ab1ac and
> 718fe0a.  0002 and 0003 have been merged together.

Thanks!




Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?

2022-08-06 Thread Michael Paquier
On Fri, Aug 05, 2022 at 03:55:26PM +0530, Bharath Rupireddy wrote:
> I noticed that dir_open_for_write() in walmethods.c uses write() for
> WAL file initialization (note that this code is used by pg_receivewal
> and pg_basebackup) as opposed to core using pg_pwritev_with_retry() in
> XLogFileInitInternal() to avoid partial writes. Do we need to fix
> this?

0d56acfb has moved pg_pwritev_with_retry to be backend-only in fd.c :/

> Thoughts?

Makes sense to me for the WAL segment pre-padding initialization, as
we still want to point to the beginning of the segment after we are
done with the pre-padding, and the code has an extra lseek().
--
Michael


signature.asc
Description: PGP signature


Re: [PATCH] Expose port->authn_id to extensions and triggers

2022-08-06 Thread Michael Paquier
On Fri, Aug 05, 2022 at 12:48:33PM +0200, Drouvot, Bertrand wrote:
> On 8/2/22 11:57 PM, Jacob Champion wrote:
>> Thoughts from prior reviewers? Is SYSTEM_USER the way to go?

Reading through the other thread, there is a clear parallel between
both in concept to provide this information at SQL level, indeed.

> I did not look at this thread in detail, but if the goal is "only" to expose
> authn_id (as the subject describes) then it seems to me that SYSTEM_USER [1]
> is the way to go.
> 
> [1]: https://commitfest.postgresql.org/39/3703/

However, I am not sure if the suggestion of auth_method:authn as
output generated by SYSTEM_USER would be correct according to the SQL
specification, either.  The spec is not really talkative about the
details of what an external module should do, which opens things up to a lot of
interpretation, something that both threads are dealing with.

Anyway, we are talking about two different problems on this thread:
1) Provide the authn/SYSTEM_USER through some SQL interface, which is
what 0001 is an attempt of, SYSTEM_USER is a different attempt.
2) Move authn out of Port into its own structure, named
ClientConnectionInfo, and pass it down to the parallel workers.

SYSTEM_USER overlaps with 0001, but I see no reason to not do 0002 in
all cases.  Even if we are not sure yet of how to expose that at SQL
level, we still want to pass down this information to the parallel
workers and we still want to not have that in Port.  An extension
could also do easily the job once 0002 is done, so I see a good
argument about doing it anyway.  The name ClientConnectionInfo from
Robert looks like the compromise we have for the new structure holding
the information about the client information passed down to the
workers.
--
Michael


signature.asc
Description: PGP signature