Re: Implementing Incremental View Maintenance

2020-09-04 Thread Thomas Munro
Hi Nagata-san,

On Mon, Aug 31, 2020 at 5:32 PM Yugo NAGATA  wrote:
> https://wiki.postgresql.org/wiki/Incremental_View_Maintenance

Thanks for writing this!

+   /*
+* Wait for concurrent transactions which update this materialized view at
+* READ COMMITED. This is needed to see changes committed in other
+* transactions. No wait and raise an error at REPEATABLE READ or
+* SERIALIZABLE to prevent update anomalies of matviews.
+* XXX: dead-lock is possible here.
+*/
+   if (!IsolationUsesXactSnapshot())
+   LockRelationOid(matviewOid, ExclusiveLock);
+   else if (!ConditionalLockRelationOid(matviewOid, ExclusiveLock))

Could you please say a bit more about your plans for concurrency control?

Simple hand-crafted "rollup" triggers typically conflict only when
modifying the same output rows due to update/insert conflicts, or
perhaps some explicit row level locking if they're doing something
complex (unfortunately, they also very often have concurrency
bugs...).  In some initial reading about MV maintenance I did today in
the hope of understanding some more context for this very impressive
but rather intimidating patch set, I gained the impression that
aggregate-row locking granularity is assumed as a baseline for eager
incremental aggregate maintenance.  I understand that our
MVCC/snapshot scheme introduces extra problems, but I'm wondering if
these problems can be solved using the usual update semantics (the
EvalPlanQual mechanism), and perhaps also some UPSERT logic.  Why is
it not sufficient to have locked all the base table rows that you have
modified, captured the before-and-after values generated by those
updates, and also locked all the IMV aggregate rows you will read, and
in the process acquired a view of the latest committed state of the
IMV aggregate rows you will modify (possibly having waited first)?  In
other words, what other data do you look at, while computing the
incremental update, that might suffer from anomalies because of
snapshots and concurrency?  For one thing, I am aware that unique
indexes for groups would probably be necessary; perhaps some subtle
problems of the sort usually solved with predicate locks lurk there?

(Newer papers describe locking schemes that avoid even aggregate-row
level conflicts, by taking advantage of the associativity and
commutativity of aggregates like SUM and COUNT.  You can allow N
writers to update the aggregate concurrently, and if any transaction
has to roll back it subtracts what it added, not necessarily restoring
the original value, so that nobody conflicts with anyone else, or
something like that...  Contemplating an MVCC, no-rollbacks version of
that sort of thing leads to ideas like, I dunno, update chains
containing differential update trees to be compacted later... egad!)
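
To make that compensate-on-abort idea concrete, here is a toy C sketch
(entirely an illustration, not from the papers or the patch set): because
SUM/COUNT deltas are associative and commutative, concurrent writers can add
to a shared counter without blocking each other, and an aborting writer
subtracts its own delta instead of restoring a remembered "before" value.

/*
 * Toy illustration only: not from the papers or the IVM patch set.
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static _Atomic long shared_sum;

static void
apply_delta(long delta)
{
    atomic_fetch_add(&shared_sum, delta);       /* order doesn't matter */
}

static void
finish(long delta, bool commit)
{
    if (!commit)
        atomic_fetch_add(&shared_sum, -delta);  /* compensate, don't restore */
}

int
main(void)
{
    apply_delta(10);            /* writer A adds its contribution */
    apply_delta(5);             /* writer B, concurrently */
    finish(10, false);          /* A aborts: subtract what it added */
    finish(5, true);            /* B commits: nothing to undo */
    printf("SUM = %ld\n", atomic_load(&shared_sum));   /* prints 5 */
    return 0;
}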




Re: Fix for configure error in 9.5/9.6 on macOS 11.0 Big Sur

2020-09-04 Thread Jesse Zhang
On Thu, Sep 3, 2020 at 10:36 AM Tom Lane wrote:
>
> Jesse Zhang writes:
> >> Peter Eisentraut writes:
> >>> Where did the -Werror come from?
>
> > If you noticed the full invocation of clang, you'd notice that Werror is
> > nowhere on the command line, even though the error message suggests
> > otherwise. I think this is a behavior from the new AppleClang,
>
> Hmph.  If you explicitly say -Wno-error, does the error drop back down
> to being a warning?

Yeah, something like that; this is what works for me:

clang -Wno-error=implicit-function-declaration c.c

Then it became a warning.

Interestingly, it seems that AppleClang incorrectly forces this warning
on us:

$ clang --verbose c.c

| Apple clang version 12.0.0 (clang-1200.0.31.1)
| Target: x86_64-apple-darwin20.0.0
| Thread model: posix
| InstalledDir: /Library/Developer/CommandLineTools/usr/bin
|  "/Library/Developer/CommandLineTools/usr/bin/clang" -cc1 -triple
x86_64-apple-macosx11.0.0 -Wdeprecated-objc-isa-usage
-Werror=deprecated-objc-isa-usage
-Werror=implicit-function-declaration -emit-obj -mrelax-all
-disable-free -disable-llvm-verifier -discard-value-names
-main-file-name c.c -mrelocation-model pic -pic-level 2 -mthread-model
posix -mframe-pointer=all -fno-strict-return -masm-verbose
-munwind-tables -target-sdk-version=11.0 -target-cpu penryn
-dwarf-column-info -debugger-tuning=lldb -target-linker-version 609 -v
-resource-dir /Library/Developer/CommandLineTools/usr/lib/clang/12.0.0
-isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk
-I/usr/local/include -internal-isystem
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/local/include
-internal-isystem
/Library/Developer/CommandLineTools/usr/lib/clang/12.0.0/include
-internal-externc-isystem
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include
-internal-externc-isystem
/Library/Developer/CommandLineTools/usr/include -Wno-reorder-init-list
-Wno-implicit-int-float-conversion -Wno-c99-designator
-Wno-final-dtor-non-final-class -Wno-extra-semi-stmt
-Wno-misleading-indentation -Wno-quoted-include-in-framework-header
-Wno-implicit-fallthrough -Wno-enum-enum-conversion
-Wno-enum-float-conversion -fdebug-compilation-dir /tmp -ferror-limit
19 -fmessage-length 193 -stack-protector 1 -fstack-check
-mdarwin-stkchk-strong-link -fblocks -fencode-extended-block-signature
-fregister-global-dtors-with-atexit -fgnuc-version=4.2.1
-fobjc-runtime=macosx-11.0.0 -fmax-type-align=16
-fdiagnostics-show-option -fcolor-diagnostics -o
/var/folders/ts/nxrsmhmd0xb5zdrlqb4jlkbrgn/T/c-e6802f.o -x c c.c
| clang -cc1 version 12.0.0 (clang-1200.0.31.1) default target
x86_64-apple-darwin20.0.0

Notice the -Werror=implicit-function-declaration up there? I spent a
few minutes digging in Apple's published fork of LLVM; they've been
forcing this error flag for quite a while, but this particular
warning-turned-error is guarded by a conditional along the lines of "is
this iOS-like" [1][2], so I cannot imagine how such a code path gets
activated (other than something like "goto fail;" from 2014).

>
> > I've heard reports of the same under the latest Xcode 12 on macOS
> > Catalina, but I don't have my hands on such an env.
>
> The latest thing available to the unwashed masses seems to be
> Xcode 11.7 with

Yes, you're right. Xcode 12 is still in beta.

>
> Smells like an Apple bug from here.  Surely they're not expecting
> that anyone will appreciate -Werror suddenly being the default.

I think you've convinced me that this is indeed an Apple bug. I'll
probably just get by with a -Wno-error=implicit-function-declaration in
my CFLAGS for now.


[1] 
https://github.com/apple/llvm-project/blob/swift-5.3-DEVELOPMENT-SNAPSHOT-2020-08-04-a/clang/lib/Driver/ToolChains/Darwin.cpp#L952-L962
[2] 
https://opensource.apple.com/source/clang/clang-800.0.42.1/src/tools/clang/lib/Driver/ToolChains.cpp




Re: More tests with USING INDEX replident and dropped indexes

2020-09-04 Thread Michael Paquier
On Wed, Sep 02, 2020 at 03:00:44PM +0900, Michael Paquier wrote:
> Yeah, that's true as well.  Still, I would like to see first if people
> are fine with changing this code path to be transactional.  This way,
> we will have zero history in the tree where there was a risk for an
> inconsistent window.

So, I have begun a thread about that part with a dedicated CF entry:
https://commitfest.postgresql.org/30/2725/

While considering this point further, I think that making this code path
transactional is the correct way to go, as a first step.  Also, as
this thread is about adding more tests, and that has been done
with fe7fd4e9, I am marking the CF entry as committed.  Let's discuss
the reset of relreplident for the parent relation when its replica
identity index is dropped on a new thread, once the transactional part
is fully discussed.
--
Michael


signature.asc
Description: PGP signature


Weird corner-case failure mode for VPATH builds

2020-09-04 Thread Tom Lane
I discovered a problem today while trying to do a VPATH build on
a machine I don't use that often:

$ make
...
In file included from /home/tgl/pgsql/src/include/postgres.h:47,
 from /home/tgl/pgsql/src/common/hashfn.c:24:
/home/tgl/pgsql/src/include/utils/elog.h:71:10: fatal error: utils/errcodes.h: No such file or directory
   71 | #include "utils/errcodes.h"
      |          ^~~~~~~~~~~~~~~~~~
compilation terminated.

The proximate problem is that src/include/utils/errcodes.h exists
in the VPATH build directory, but it's a symlink pointing to
src/backend/utils/errcodes.h in the build directory, which does not exist.

After a bit of looking around, I think that the cause of that is this
rule in src/backend/utils/Makefile:

$(top_builddir)/src/include/utils/header-stamp: fmgr-stamp errcodes.h
	prereqdir=`cd '$(dir $<)' >/dev/null && pwd` && \
	cd '$(dir $@)' && for file in fmgroids.h fmgrprotos.h errcodes.h; do \
	  rm -f $$file && $(LN_S) "$$prereqdir/$$file" . ; \
	done
	touch $@

The last build I did on this machine was a non-VPATH build a few
weeks ago.  "make distclean" from that left behind an errcodes.h
that's still up to date, so make figured it didn't have to do
anything, and the referent of the "errcodes.h" dependency here is
the errcodes.h in the source directory.  On the other hand,
pg_proc.dat changed recently, so fmgr-stamp and related files
were considered out of date and rebuilt --- in the build tree.
Thus, the "prereqdir" in the above rule ends up pointing at
src/backend/utils in the build tree, but while fmgroids.h and
fmgrprotos.h do exist there, errcodes.h does not.

There are a couple different ways we might think about fixing
this.  Maybe this specific rule is broken and needs to be fixed to
not assume that '$(dir $<)' is the same for all its dependencies.
Alternatively, maybe the problem is that the derived files
fmgr-stamp et al. should have been created in the source tree, even
during a VPATH build.  (I have a vague recollection that we do it
that way for some derived files, but maybe that's out of date.)

Of course, it might also be fair to argue that this scenario of
a source tree that has only some up-to-date derived files is
not supported/supportable.  It definitely doesn't seem worth
a huge amount of effort to fix; but on the other hand the rule
I cite above does seem to be assuming something it shouldn't.

Thoughts?

regards, tom lane




Re: WIP: BRIN multi-range indexes

2020-09-04 Thread Michael Paquier
On Fri, Aug 07, 2020 at 06:27:01PM +0200, Tomas Vondra wrote:
> Attached is an updated version of the patch series, implementing this.
> Adding the extra data types was fairly simple, because both bloom and
> minmax-multi indexes already used "struct as varlena" approach, so all
> that needed was a bunch of in/out functions and catalog records.
> 
> I've left the changes in separate patches for clarity, ultimately it'll
> get merged into the other parts.

This fails to apply per the CF bot, so please provide a rebase.
--
Michael


signature.asc
Description: PGP signature


Re: Bogus documentation for bogus geometric operators

2020-09-04 Thread Michael Paquier
On Fri, Aug 21, 2020 at 12:00:45PM +0100, Emre Hasegeli wrote:
> I prepared a patch to add <<| and |>> operators for points to
> deprecate the previous ones.

Emre, the CF bot complains that this does not apply anymore, so please
provide a rebase.  By the way, I am a bit confused to see a patch
that adds new operators on a thread whose subject is about
documentation.
--
Michael


signature.asc
Description: PGP signature


Re: Concurrent failure in create_am with buildfarm member conchuela

2020-09-04 Thread Michael Paquier
On Fri, Sep 04, 2020 at 03:38:32PM -0400, Alvaro Herrera wrote:
> Fixed in d54f99e4154.

Yes, thanks.  I bumped into the other thread this morning:
https://www.postgresql.org/message-id/839004.1599185...@sss.pgh.pa.us
The fix makes sense.
--
Michael


signature.asc
Description: PGP signature


Re: Improving connection scalability: GetSnapshotData()

2020-09-04 Thread Michael Paquier
On Fri, Sep 04, 2020 at 10:35:19AM -0700, Andres Freund wrote:
> I think it's best to close the entry. There's substantial further wins
> possible, in particular not acquiring ProcArrayLock in GetSnapshotData()
> when the cache is valid improves performance substantially. But it's
> non-trivial enough that it's probably best dealth with in a separate
> patch / CF entry.

Cool, thanks for updating the CF entry.
--
Michael


signature.asc
Description: PGP signature


Re: A micro-optimisation for walkdir()

2020-09-04 Thread Andres Freund
Hi,

On 2020-09-05 11:15:07 +1200, Thomas Munro wrote:
> On Sat, Sep 5, 2020 at 9:45 AM Andres Freund  wrote:
> > On 2020-09-02 17:51:27 +0200, Juan José Santamaría Flecha wrote:
> > > + attrib = GetFileAttributes(d->ret.d_name);
> >
> > Is this really an optimization? The benefit of Thomas' patch is that
> > that information sometimes already is there. But here you're doing a
> > separate lookup with GetFileAttributes()?
> 
> Well as discussed already, our stat() emulation on Windows does
> multiple syscalls, so it's a slight improvement.

But the patch is patching readdir(), not just walkdir(). Not all
readdir() / ReadDir() callers necessarily do a stat(), and many continue
to do a stat() after the patches. So for all of those except walkdir(),
some of which (like RemoveOldXlogFiles()) are sometimes noticeable
cost-wise, the patch will increase the cost on Windows. No?  That's
quite different from utilizing "free" information.


> However, it looks like we might be missing a further opportunity
> here...  Doesn't Windows already give us the flags we need in the
> dwFileAttributes member of the WIN32_FIND_DATA object that the
> Find{First,Next}File() functions populate?

That'd be better...

Greetings,

Andres Freund




Re: WIP: WAL prefetch (another approach)

2020-09-04 Thread Thomas Munro
On Wed, Sep 2, 2020 at 2:18 AM Tomas Vondra
 wrote:
> On Wed, Sep 02, 2020 at 02:05:10AM +1200, Thomas Munro wrote:
> >On Wed, Sep 2, 2020 at 1:14 AM Tomas Vondra
> > wrote:
> >> from the archive
> >
> >Ahh, so perhaps that's the key.
>
> Maybe. For the record, the commands look like this:
>
> archive_command = 'gzip -1 -c %p > /mnt/raid/wal-archive/%f.gz'
>
> restore_command = 'gunzip -c /mnt/raid/wal-archive/%f.gz > %p.tmp && mv 
> %p.tmp %p'

Yeah, sorry, I goofed here by not considering archive recovery
properly.  I have special handling for crash recovery from files in
pg_wal (XLRO_END, means read until you run out of files) and streaming
replication (XLRO_WALRCV_WRITTEN, means read only as far as the wal
receiver has advertised as written in shared memory), as a way to
control the ultimate limit on how far ahead to read when
maintenance_io_concurrency and max_recovery_prefetch_distance don't
limit you first.  But if you recover from a base backup with a WAL
archive, it uses the XLRO_END policy which can run out of files just
because a new file hasn't been restored yet, so it gives up
prefetching too soon, as you're seeing.  That doesn't cause any
damage, but it stops doing anything useful because the prefetcher
thinks its job is finished.

It'd be possible to fix this somehow in the two-XLogReader design, but
since I'm testing a new version that has a unified
XLogReader-with-read-ahead I'm not going to try to do that.  I've
added a basebackup-with-archive recovery to my arsenal of test
workloads to make sure I don't forget about archive recovery mode
again, but I think it's actually harder to get this wrong in the new
design.  In the meantime, if you are still interested in studying the
potential speed-up from WAL prefetching using the most recently shared
two-XLogReader patch, you'll need to unpack all your archived WAL
files into pg_wal manually beforehand.




Re: A micro-optimisation for walkdir()

2020-09-04 Thread Thomas Munro
On Sat, Sep 5, 2020 at 9:45 AM Andres Freund  wrote:
> On 2020-09-02 17:51:27 +0200, Juan José Santamaría Flecha wrote:
> > + attrib = GetFileAttributes(d->ret.d_name);
>
> Is this really an optimization? The benefit of Thomas' patch is that
> that information sometimes already is there. But here you're doing a
> separate lookup with GetFileAttributes()?

Well as discussed already, our stat() emulation on Windows does
multiple syscalls, so it's a slight improvement.  However, it looks
like we might be missing a further opportunity here...  Doesn't
Windows already give us the flags we need in the dwFileAttributes
member of the WIN32_FIND_DATA object that the Find{First,Next}File()
functions populate?
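
For what it's worth, here is a minimal Windows-only sketch of that idea (an
illustration, not the proposed dirent.c patch): the attribute bits arrive
with each directory entry from Find{First,Next}File(), so no separate
GetFileAttributes() call is needed; the DT_* values below are just the
conventional Linux constants, used here purely for illustration.

/*
 * Illustration only: not the proposed src/port/dirent.c change.
 */
#include <windows.h>
#include <stdio.h>

enum { DT_UNKNOWN = 0, DT_DIR = 4, DT_REG = 8, DT_LNK = 10 };

static int
attrib_to_d_type(DWORD attrib)
{
    /* Check reparse points first: a directory junction has both bits set. */
    if (attrib & FILE_ATTRIBUTE_REPARSE_POINT)
        return DT_LNK;
    if (attrib & FILE_ATTRIBUTE_DIRECTORY)
        return DT_DIR;
    return DT_REG;
}

int
main(void)
{
    WIN32_FIND_DATAA fd;
    HANDLE h = FindFirstFileA("C:\\Windows\\*", &fd);

    if (h == INVALID_HANDLE_VALUE)
        return 1;
    do
    {
        /* The attributes came along with the directory entry itself. */
        printf("%2d  %s\n", attrib_to_d_type(fd.dwFileAttributes), fd.cFileName);
    } while (FindNextFileA(h, &fd));
    FindClose(h);
    return 0;
}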




Re: A micro-optimisation for walkdir()

2020-09-04 Thread Alvaro Herrera
On 2020-Sep-04, Tom Lane wrote:

> I think that it's standard to test for such symbols by seeing
> if they're defined as macros ... not least because that's the *only*
> way to test their existence in C.

I guess since what we're doing is emulating standard readdir(), that
makes sense.

> Personally, what I'd do is lose the enum and just define the macros
> with simple integer constant values.

WFM.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: A micro-optimisation for walkdir()

2020-09-04 Thread Andres Freund
Hi,

On 2020-09-02 17:51:27 +0200, Juan José Santamaría Flecha wrote:
> Win32 could also benefit from this micro-optimisation if we expanded the
> dirent port to include d_type. Please find attached a patch that does
> so.
>   }
>   strcpy(d->ret.d_name, fd.cFileName);/* Both strings are MAX_PATH 
> long */
>   d->ret.d_namlen = strlen(d->ret.d_name);
> + /*
> +  * The only identifed types are: directory, regular file or symbolic 
> link.
> +  * Errors are treated as a file type that could not be determined.
> +  */
> + attrib = GetFileAttributes(d->ret.d_name);
> + if (attrib == INVALID_FILE_ATTRIBUTES)
> + d->ret.d_type = DT_UNKNOWN;
> + else if ((attrib & FILE_ATTRIBUTE_DIRECTORY) != 0)
> + d->ret.d_type = DT_DIR;
> + else if ((attrib & FILE_ATTRIBUTE_REPARSE_POINT) != 0)
> + d->ret.d_type = DT_LNK;
> + else
> + d->ret.d_type = DT_REG;
>  
>   return &d->ret;
>  }

Is this really an optimization? The benefit of Thomas' patch is that
that information sometimes already is there. But here you're doing a
separate lookup with GetFileAttributes()?

What am I missing?

Greetings,

Andres Freund




Re: Questionable ping logic in LogicalRepApplyLoop

2020-09-04 Thread Tom Lane
Alvaro Herrera  writes:
> ... oh look!  commit 3f60f690fac1 moved last_recv_timestamp without
> realizing that ping_sent had to get the same treatment.

Hah, I wondered if something like that had happened, but I didn't
get around to excavating in the git history yet.  Thanks for doing so.

Will push a fix later.

regards, tom lane




Re: A micro-optimisation for walkdir()

2020-09-04 Thread Tom Lane
Alvaro Herrera  writes:
> On 2020-Sep-04, Juan José Santamaría Flecha wrote:
>> It will fail to detect that the patch makes the optimisation available for
>> WIN32:
>> 
>> +#if defined(DT_UNKNOWN) && defined(DT_REG) && defined(DT_DIR) &&
>> defined(DT_LNK)

> Oh, I see.  I suggest that it'd be better to change this line instead.

I think that it's standard to test for such symbols by seeing
if they're defined as macros ... not least because that's the *only*
way to test their existence in C.

Personally, what I'd do is lose the enum and just define the macros
with simple integer constant values.
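
Concretely, that suggestion would look something like the sketch below (an
illustration, not the actual patch): ordinary macros, so the
"#if defined(DT_REG)" style feature test elsewhere keeps working.

/* Values follow the conventional Linux ones; purely illustrative. */
#define DT_UNKNOWN 0
#define DT_DIR     4
#define DT_REG     8
#define DT_LNK     10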

regards, tom lane




Re: PATCH: Batch/pipelining support for libpq

2020-09-04 Thread Alvaro Herrera
On 2020-Aug-31, Matthieu Garrigues wrote:

> It seems like this patch is nearly finished. I fixed all the remaining
> issues. I'm also asking a confirmation of the test scenarios you want
> to see in the next version of the patch.

Hmm, apparently nobody noticed that this patch is not registered in the
current commitfest.

Since it was submitted ahead of the deadline, I'm going to add it there.
(The patch has also undergone a lot of review already; it's certainly
not a newcomer.)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Range checks of pg_test_fsync --secs-per-test and pg_test_timing --duration

2020-09-04 Thread Peter Eisentraut

On 2020-08-06 08:27, Michael Paquier wrote:

As $subject says, pg_test_fsync and pg_test_timing don't really check
the range of option values specified.  It is possible for example to
make pg_test_fsync run an infinite amount of time, and pg_test_timing
does not handle overflows with --duration at all.

These are far from being critical issues, but let's fix them at least
on HEAD.  So, please see the attached, where I have also added some
basic TAP tests for both tools.


According to the POSIX standard, atoi() is not required to do any error 
checking, and if you want error checking, you should use strtol().


And if you do that, you might as well change the variables to unsigned 
and use strtoul(), and then drop the checks for <=0.  I would allow 0. 
It's not very useful, but it's not harmful and could be applicable in 
testing.
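
A minimal sketch of what that parsing could look like (the option name,
variable names, and messages are placeholders, not the actual pg_test_fsync
or pg_test_timing code):

/* Illustrative parsing only; names and messages are placeholders. */
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int
main(int argc, char **argv)
{
    const char *optval = (argc > 1) ? argv[1] : "5";
    char       *endptr;
    unsigned long secs_per_test;

    /* strtoul() silently accepts a leading '-', so reject that explicitly. */
    if (strchr(optval, '-') != NULL)
    {
        fprintf(stderr, "invalid value \"%s\": must be non-negative\n", optval);
        return 1;
    }
    errno = 0;
    secs_per_test = strtoul(optval, &endptr, 10);
    if (errno != 0 || endptr == optval || *endptr != '\0')
    {
        fprintf(stderr, "invalid value \"%s\": not a valid integer\n", optval);
        return 1;
    }
    /* 0 is allowed: not very useful, but harmless, per the comment above. */
    printf("running each test for %lu second(s)\n", secs_per_test);
    return 0;
}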


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH] Redudant initilization

2020-09-04 Thread Ranier Vilela
On Fri, Sep 4, 2020 at 2:40 PM, Tom Lane wrote:

> Bruce Momjian  writes:
> > I have to say, I am kind of stumped why compilers do not warn of such
> > cases, and why we haven't gotten reports about these cases before.
>
> I was just experimenting with clang's "scan-build" tool.  It finds
> all of the cases you just fixed, and several dozen more beside.
> Quite a few are things that, as a matter of style, we should *not*
> change, for instance
>
> rewriteHandler.c:2807:5: warning: Value stored to 'outer_reloids' is never read
> outer_reloids = list_delete_last(outer_reloids);
> ^   ~~~
>
There are some like this in the analyses that I did, even when it is
the last action of the function.


> Failing to update the list pointer here would just be asking for bugs.
> However, I see some that look like genuine oversights; will go fix.
>
Thanks for fixing this.


> (I'm not sure how much I trust scan-build overall.  It produces a
> whole bunch of complaints about null pointer dereferences, for instance.
> If those aren't 99% false positives, we'd be crashing constantly.
> It's also dog-slow.  But it might be something to try occasionally.)
>
I believe it would be very beneficial.

Attached is a patch I made in March 2020; due to problems, it was sent
but did not make it to the list.
Would you mind taking a look?
Certainly, if accepted, it would need to be rebased.

regards,
Ranier Vilela


variables_assigned_unused_value.patch
Description: Binary data


Re: BUG #16419: wrong parsing BC year in to_date() function

2020-09-04 Thread Peter Eisentraut

On 2020-09-04 21:45, David G. Johnston wrote:
On Thu, Sep 3, 2020 at 6:21 PM Bruce Momjian wrote:

On Wed, Jul 15, 2020 at 09:26:53AM -0700, David G. Johnston wrote:

> Whether to actually change the behavior of to_date is up for debate though I
> would presume it would not be back-patched.

OK, so, looking at this thread, we have to_date() treating -1 as -2 BC,
make_date() treating -1 as 1 BC, and we have Oracle, which to_date() is
supposed to match, making -1 as 1 BC.

Because we already have the to_date/make_date inconsistency, and the -1
to -2 BC mapping is confusing, and doesn't match Oracle, I think the
clean solution is to change PG 14 to treat -1 as 1 BC, and document the
incompatibility in the release notes.


I agree that someone else should write another patch to fix the behavior 
for v14.  Still suggest committing the proposed patch to master and all 
supported versions to document the existing behavior correctly.  The fix 
patch can work from that.


Adding support for negative years in make_timestamp seems pretty 
straightforward; see attached patch.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From ef9b0cee4aee60268d4c72e8452e74e7bbc18ed1 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Fri, 4 Sep 2020 23:02:09 +0200
Subject: [PATCH] Support negative years in make_timestamp()

make_date() already supported it; adding it was trivial.

Bug: #16419
Discussion: 
https://www.postgresql.org/message-id/flat/16419-d8d9db0a7553f...@postgresql.org
---
 src/backend/utils/adt/timestamp.c   | 14 +-
 src/test/regress/expected/date.out  |  2 ++
 src/test/regress/expected/timestamp.out | 11 ++-
 src/test/regress/sql/date.sql   |  1 +
 src/test/regress/sql/timestamp.sql  |  5 -
 5 files changed, 26 insertions(+), 7 deletions(-)

diff --git a/src/backend/utils/adt/timestamp.c 
b/src/backend/utils/adt/timestamp.c
index 5fe304cea7..4128e3a739 100644
--- a/src/backend/utils/adt/timestamp.c
+++ b/src/backend/utils/adt/timestamp.c
@@ -556,17 +556,21 @@ make_timestamp_internal(int year, int month, int day,
TimeOffset  date;
TimeOffset  time;
int dterr;
+   boolbc = false;
Timestamp   result;
 
tm.tm_year = year;
tm.tm_mon = month;
tm.tm_mday = day;
 
-   /*
-* Note: we'll reject zero or negative year values.  Perhaps negatives
-* should be allowed to represent BC years?
-*/
-   dterr = ValidateDate(DTK_DATE_M, false, false, false, &tm);
+   /* Handle negative years as BC */
+   if (tm.tm_year < 0)
+   {
+   bc = true;
+   tm.tm_year = -tm.tm_year;
+   }
+
+   dterr = ValidateDate(DTK_DATE_M, false, false, bc, &tm);
 
if (dterr != 0)
ereport(ERROR,
diff --git a/src/test/regress/expected/date.out 
b/src/test/regress/expected/date.out
index 4cdf1635f2..c03329fb5a 100644
--- a/src/test/regress/expected/date.out
+++ b/src/test/regress/expected/date.out
@@ -1439,6 +1439,8 @@ select make_time(8, 20, 0.0);
 (1 row)
 
 -- should fail
+select make_date(0, 7, 15);
+ERROR:  date field value out of range: 0-07-15
 select make_date(2013, 2, 30);
 ERROR:  date field value out of range: 2013-02-30
 select make_date(2013, 13, 1);
diff --git a/src/test/regress/expected/timestamp.out 
b/src/test/regress/expected/timestamp.out
index 5f97505a30..9655116090 100644
--- a/src/test/regress/expected/timestamp.out
+++ b/src/test/regress/expected/timestamp.out
@@ -1704,9 +1704,18 @@ SELECT '' AS to_char_12, to_char(d, 'FF1 FF2 FF3 FF4 FF5 
FF6  ff1 ff2 ff3 ff4 ff
 (4 rows)
 
 -- timestamp numeric fields constructor
-SELECT make_timestamp(2014,12,28,6,30,45.887);
+SELECT make_timestamp(2014, 12, 28, 6, 30, 45.887);
 make_timestamp
 --
  Sun Dec 28 06:30:45.887 2014
 (1 row)
 
+SELECT make_timestamp(-44, 3, 15, 12, 30, 15);
+   make_timestamp
+-
+ Fri Mar 15 12:30:15 0044 BC
+(1 row)
+
+-- should fail
+select make_timestamp(0, 7, 15, 12, 30, 15);
+ERROR:  date field value out of range: 0-07-15
diff --git a/src/test/regress/sql/date.sql b/src/test/regress/sql/date.sql
index 1c3adf70ce..77f94ed27b 100644
--- a/src/test/regress/sql/date.sql
+++ b/src/test/regress/sql/date.sql
@@ -335,6 +335,7 @@ CREATE TABLE DATE_TBL (f1 date);
 select make_date(-44, 3, 15);
 select make_time(8, 20, 0.0);
 -- should fail
+select make_date(0, 7, 15);
 select make_date(2013, 2, 30);
 select make_date(2013, 13, 1);
 select make_date(2013, 11, -1);
diff --git a/src/test/regress/sql/timestamp.sql 
b/src/test/regress/sql/timestamp.sql
index 7b58c3cfa5..727ee50084 100644
--- a/src/test/regress/sql/timestamp.sql
+++ b/src/test/regress/sql/timestamp.sql
@@ -240,4 +240,7 @@ CREATE TABLE 

Re: Any objections to implementing LogicalDecodeMessageCB for pgoutput?

2020-09-04 Thread Cary Huang
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:tested, passed

Hi

I have tried the patch and it functions as described. The attached TAP test 
case is comprehensive and passes. However, the patch does not apply cleanly on 
the current master; I had to check out a much earlier commit to be able to 
apply it correctly. The patch will need to be rebased onto the current master.

Thanks

Cary Huang
-
HighGo Software Inc. (Canada)
cary.hu...@highgo.ca
www.highgo.ca

Re: A micro-optimisation for walkdir()

2020-09-04 Thread Alvaro Herrera
On 2020-Sep-04, Juan José Santamaría Flecha wrote:

> It will fail to detect that the patch makes the optimisation available for
> WIN32:
> 
> +#if defined(DT_UNKNOWN) && defined(DT_REG) && defined(DT_DIR) &&
> defined(DT_LNK)

Oh, I see.  I suggest that it'd be better to change this line instead.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Questionable ping logic in LogicalRepApplyLoop

2020-09-04 Thread Alvaro Herrera
On 2020-Sep-04, Tom Lane wrote:

> While playing around with clang's scan-build I noticed this warning:
> 
> worker.c:2281:7: warning: Value stored to 'ping_sent' is never read
> ping_sent = true;
> ^   
> 
> At first I thought it was a harmless unnecessary update, but looking
> closer I wonder whether it isn't telling us there is a logic bug here.
> Specifically, I wonder why the "ping_sent" variable is local to the
> loop starting at line 2084, rather than having the same lifespan as
> "last_recv_timestamp".  Do we really want to forget that we sent a
> ping anytime we have to wait for more data?

Ah ... maybe this bug is the reason why the bug fixed by 470687b4a5bb
did not affect logical replication.

> In short, we could remove the ping_sent variable entirely without
> changing this code's behavior.  I'm not 100% sure what semantics
> it's trying to achieve, but I don't think it's achieving them.

I imagine that moving the variable one block-scope outwards (together
with last_recv_timestamp) is what was intended.

... oh look!  commit 3f60f690fac1 moved last_recv_timestamp without
realizing that ping_sent had to get the same treatment.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Keep elog(ERROR) and ereport(ERROR) calls in the cold path

2020-09-04 Thread Peter Eisentraut

On 2020-08-05 05:00, David Rowley wrote:

The 5GB scaled TPC-H test does show some performance gains from the v4
patch and shows an obvious regression from removing the unlikely()
calls too.

Based, mostly on the TPC-H results where performance did improve close
to 2%, I'm starting to think it would be a good idea just to go for
the v4 patch.  It means that future hot elog/ereport calls should make
it into the cold path.


Something based on the v4 patch makes sense.

I would add DEBUG1 back into the conditional, like

if (__builtin_constant_p(elevel) && ((elevel) >= ERROR || (elevel) <= DEBUG1) ? \


Also, for the __has_attribute handling, I'd prefer the style that Andres 
illustrated earlier, using:


#ifndef __has_attribute
#define __has_attribute(attribute) 0
#endif
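
Put together, a self-contained sketch of that combination might look like
this (assuming GCC or Clang; the level values, names, and macro shape are
placeholders, not the actual elog.h code): calls with a compile-time-constant
level that is ERROR-or-higher or DEBUG1-or-lower are routed through a
cold-attributed helper, so the compiler moves that branch out of the hot path.

/* Placeholder sketch, not the actual elog.h macro. */
#include <stdio.h>
#include <stdlib.h>

#ifndef __has_attribute
#define __has_attribute(attribute) 0
#endif

#if __has_attribute(cold)
#define pg_attribute_cold __attribute__((cold))
#else
#define pg_attribute_cold
#endif

#define DEBUG1 14               /* placeholder values */
#define ERROR  21

static pg_attribute_cold void
report_cold(int elevel, const char *msg)
{
    fprintf(stderr, "level %d: %s\n", elevel, msg);
    if (elevel >= ERROR)
        exit(1);
}

static void
report(int elevel, const char *msg)
{
    fprintf(stderr, "level %d: %s\n", elevel, msg);
}

#define my_elog(elevel, msg) \
    ((__builtin_constant_p(elevel) && \
      ((elevel) >= ERROR || (elevel) <= DEBUG1)) ? \
     report_cold(elevel, msg) : report(elevel, msg))

int
main(void)
{
    my_elog(DEBUG1, "this call sits in the cold path");
    return 0;
}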

--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: A micro-optimisation for walkdir()

2020-09-04 Thread Juan José Santamaría Flecha
On Fri, Sep 4, 2020 at 10:28 PM Alvaro Herrera 
wrote:

> On 2020-Sep-04, Juan José Santamaría Flecha wrote:
>
> > On Fri, Sep 4, 2020 at 9:37 PM Alvaro Herrera 
> > wrote:
> >
> > > On 2020-Sep-04, Thomas Munro wrote:
>
> > > >
> > > > +/* File types for 'd_type'. */
> > > > +enum
> > > > +  {
> > > > + DT_UNKNOWN = 0,
> > > > +# define DT_UNKNOWN  DT_UNKNOWN
> > >
> > > Uhm ... what do these #defines do?  They look a bit funny.
> > >
> > > Would it make sense to give this enum a name, and then use that name in
> > > struct dirent's definition, instead of unsigned char?
> >
> > They mimic POSIX dirent.h. I would rather stick to that.
>
> Ah ... they do?
>
> If you remove the #define lines, what happens to your patch?
>

It will fail to detect that the patch makes the optimisation available for
WIN32:

+#if defined(DT_UNKNOWN) && defined(DT_REG) && defined(DT_DIR) &&
defined(DT_LNK)

Regards,

Juan José Santamaría Flecha


Re: A micro-optimisation for walkdir()

2020-09-04 Thread Alvaro Herrera
On 2020-Sep-04, Juan José Santamaría Flecha wrote:

> On Fri, Sep 4, 2020 at 9:37 PM Alvaro Herrera 
> wrote:
> 
> > On 2020-Sep-04, Thomas Munro wrote:

> > >
> > > +/* File types for 'd_type'. */
> > > +enum
> > > +  {
> > > + DT_UNKNOWN = 0,
> > > +# define DT_UNKNOWN  DT_UNKNOWN
> >
> > Uhm ... what do these #defines do?  They look a bit funny.
> >
> > Would it make sense to give this enum a name, and then use that name in
> > struct dirent's definition, instead of unsigned char?
> 
> They mimic POSIX dirent.h. I would rather stick to that.

Ah ... they do?

If you remove the #define lines, what happens to your patch?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH] Comments related to "buffer descriptors" cache line size

2020-09-04 Thread Thomas Munro
On Sat, Sep 5, 2020 at 5:28 AM Bruce Momjian  wrote:
> On Fri, Sep  4, 2020 at 11:31:55PM +0800, Kelly Min wrote:
> > I found a comment error in buffer descriptors comment. The cache line size 
> > is
> > 64 bytes, not 64 bits
>
> Thanks, fix applied to all active branches.

I find it a bit strange that our PG_CACHE_LINE_SIZE macro is defined
to be 128.  Whether that exaggerated number (unless you're on POWER,
where it's true) makes sense depends on what your goals are, as this
usage of hard-coded 64 shows...
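
A toy illustration of the trade-off (my own sketch, not backend code):
padding shared structs to PG_CACHE_LINE_SIZE keeps neighbouring elements on
separate cache lines, so an exaggerated value still prevents false sharing
and merely wastes space, while a hard-coded 64 only covers half a line on
platforms where lines are 128 bytes.

/* Illustration only; not backend code. */
#include <stdio.h>

#define PG_CACHE_LINE_SIZE 128      /* the value being discussed */

typedef struct
{
    long counter;
    char pad[PG_CACHE_LINE_SIZE - sizeof(long)];
} PaddedCounter;

int
main(void)
{
    PaddedCounter counters[4];

    (void) counters;                /* only the layout matters here */
    printf("each padded counter occupies %zu bytes\n", sizeof(PaddedCounter));
    return 0;
}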




Re: BUG #16419: wrong parsing BC year in to_date() function

2020-09-04 Thread Bruce Momjian
On Fri, Sep  4, 2020 at 12:45:36PM -0700, David G. Johnston wrote:
> On Thu, Sep 3, 2020 at 6:21 PM Bruce Momjian  wrote:
> 
> On Wed, Jul 15, 2020 at 09:26:53AM -0700, David G. Johnston wrote:
> 
> > Whether to actually change the behavior of to_date is up for debate though I
> > would presume it would not be back-patched.
> 
> OK, so, looking at this thread, we have to_date() treating -1 as -2 BC,
> make_date() treating -1 as 1 BC, and we have Oracle, which to_date() is
> supposed to match, making -1 as 1 BC.
> 
> Because we already have the to_date/make_date inconsistency, and the -1
> to -2 BC mapping is confusing, and doesn't match Oracle, I think the
> clean solution is to change PG 14 to treat -1 as 1 BC, and document the
> incompatibility in the release notes.
> 
> 
> I agree that someone else should write another patch to fix the behavior for
> v14.  Still suggest committing the proposed patch to master and all supported
> versions to document the existing behavior correctly.  The fix patch can work
> from that.

I think we need to apply the patches to all branches at the same time. 
I am not sure we want to document a behavior we know will change in PG
14.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: [Patch] Add missing libraries to Libs.private of libpq.pc

2020-09-04 Thread Peter Eisentraut

On 2020-07-10 21:47, Peter Eisentraut wrote:

On 2020-04-08 11:38, Sandro Mani wrote:

The following patch, which we added to build mingw-postgresql on Fedora,
adds some missing libraries to Libs.private of libpq.pc, discovered when
attempting to statically link with libpq:

-lz: is required by -lcrypto


I think the correct fix for that would be to add libssl to libpq's
Requires.private.


For that, I propose the attached patch.


-liconv: is required by -lintl (though possibly depends on whether
gettext was compiled with iconv support)


I think the solution here would be to have gettext provide a pkg-config 
file.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 358e23268fbd42989a64d3326678c2c83326aa77 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Fri, 4 Sep 2020 22:03:49 +0200
Subject: [PATCH] Add libpq's openssl dependencies to pkg-config file

Add libssl and libcrypto to libpq.pc's Requires.private.  This allows
static linking to work if libssl or libcrypto themselves have
dependencies in their *.private fields, such as -lz in some cases.

Reported-by: Sandro Mani 
Discussion: 
https://www.postgresql.org/message-id/flat/837d1dcf-2fca-ee6e-0d7e-6bce1a1ba...@gmail.com
---
 src/interfaces/libpq/Makefile | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/interfaces/libpq/Makefile b/src/interfaces/libpq/Makefile
index d4919970f8..4ac5f4b340 100644
--- a/src/interfaces/libpq/Makefile
+++ b/src/interfaces/libpq/Makefile
@@ -89,6 +89,8 @@ SHLIB_PREREQS = submake-libpgport
 
 SHLIB_EXPORTS = exports.txt
 
+PKG_CONFIG_REQUIRES_PRIVATE = libssl libcrypto
+
 all: all-lib
 
 # Shared library stuff
-- 
2.28.0



Re: A micro-optimisation for walkdir()

2020-09-04 Thread Juan José Santamaría Flecha
On Fri, Sep 4, 2020 at 9:37 PM Alvaro Herrera 
wrote:

> On 2020-Sep-04, Thomas Munro wrote:
>
> > @@ -10,6 +10,7 @@ struct dirent
> >  {
> >   longd_ino;
> >   unsigned short d_reclen;
> > + unsigned char d_type;
> >   unsigned short d_namlen;
> >   chard_name[MAX_PATH];
> >  };
> > @@ -20,4 +21,26 @@ DIR   *opendir(const char *);
> >  struct dirent *readdir(DIR *);
> >  int  closedir(DIR *);
> >
> > +/* File types for 'd_type'. */
> > +enum
> > +  {
> > + DT_UNKNOWN = 0,
> > +# define DT_UNKNOWN  DT_UNKNOWN
>
> Uhm ... what do these #defines do?  They look a bit funny.
>
> Would it make sense to give this enum a name, and then use that name in
> struct dirent's definition, instead of unsigned char?
>

They mimic POSIX dirent.h. I would rather stick to that.

Regards,

Juan José Santamaría Flecha


Re: BUG #16419: wrong parsing BC year in to_date() function

2020-09-04 Thread David G. Johnston
On Thu, Sep 3, 2020 at 6:21 PM Bruce Momjian  wrote:

> On Wed, Jul 15, 2020 at 09:26:53AM -0700, David G. Johnston wrote:
>
> > Whether to actually change the behavior of to_date is up for debate
> though I
> > would presume it would not be back-patched.
>
> OK, so, looking at this thread, we have to_date() treating -1 as -2 BC,
> make_date() treating -1 as 1 BC, and we have Oracle, which to_date() is
> supposed to match, making -1 as 1 BC.
>
> Because we already have the to_date/make_date inconsistency, and the -1
> to -2 BC mapping is confusing, and doesn't match Oracle, I think the
> clean solution is to change PG 14 to treat -1 as 1 BC, and document the
> incompatibility in the release notes.
>

I agree that someone else should write another patch to fix the behavior
for v14.  Still suggest committing the proposed patch to master and all
supported versions to document the existing behavior correctly.  The fix
patch can work from that.

David J.


Questionable ping logic in LogicalRepApplyLoop

2020-09-04 Thread Tom Lane
While playing around with clang's scan-build I noticed this warning:

worker.c:2281:7: warning: Value stored to 'ping_sent' is never read
ping_sent = true;
^   

At first I thought it was a harmless unnecessary update, but looking
closer I wonder whether it isn't telling us there is a logic bug here.
Specifically, I wonder why the "ping_sent" variable is local to the
loop starting at line 2084, rather than having the same lifespan as
"last_recv_timestamp".  Do we really want to forget that we sent a
ping anytime we have to wait for more data?

In fact, looking closer, it appears to me that

(1) The "ping_sent = false" at line 2124 is also dead code, because
ping_sent could never be anything but false at this point;

(2) The "if (!ping_sent)" at line 2274 is also dead code, because
ping_sent is still never anything but false at that point.

In short, we could remove the ping_sent variable entirely without
changing this code's behavior.  I'm not 100% sure what semantics
it's trying to achieve, but I don't think it's achieving them.
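
As a structural illustration of that point, here is a toy sketch (entirely my
own, using a fake integer clock, not the actual worker.c code): with
ping_sent scoped alongside last_recv_timestamp and reset only when data
actually arrives, a wait cycle cannot forget that a ping is already
outstanding.

/* Toy sketch with a fake clock; not the actual worker.c logic. */
#include <stdbool.h>
#include <stdio.h>

int
main(void)
{
    int  now;
    int  last_recv_timestamp = 0;
    bool ping_sent = false;             /* same lifespan as last_recv_timestamp */
    const int wal_receiver_timeout = 60;

    for (now = 1; now <= 200; now++)    /* each iteration: one wait cycle */
    {
        bool received_data = (now == 100);   /* pretend data arrives at t=100 */

        if (received_data)
        {
            printf("t=%3d: data received, resetting timer and ping_sent\n", now);
            last_recv_timestamp = now;
            ping_sent = false;          /* reset only when data arrives */
        }
        else if (!ping_sent &&
                 now - last_recv_timestamp >= wal_receiver_timeout / 2)
        {
            printf("t=%3d: requesting keepalive reply\n", now);
            ping_sent = true;           /* remembered across wait cycles */
        }
    }
    return 0;
}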

regards, tom lane




Re: Concurrent failure in create_am with buildfarm member conchuela

2020-09-04 Thread Alvaro Herrera
Hello,

On 2020-Sep-04, Michael Paquier wrote:

> conchuela has just reported the following error:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=conchuela&dt=2020-09-03%2023%3A00%3A36

Fixed in d54f99e4154.

Thanks

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: A micro-optimisation for walkdir()

2020-09-04 Thread Alvaro Herrera
On 2020-Sep-04, Thomas Munro wrote:

> @@ -10,6 +10,7 @@ struct dirent
>  {
>   longd_ino;
>   unsigned short d_reclen;
> + unsigned char d_type;
>   unsigned short d_namlen;
>   chard_name[MAX_PATH];
>  };
> @@ -20,4 +21,26 @@ DIR   *opendir(const char *);
>  struct dirent *readdir(DIR *);
>  int  closedir(DIR *);
>  
> +/* File types for 'd_type'. */
> +enum
> +  {
> + DT_UNKNOWN = 0,
> +# define DT_UNKNOWN  DT_UNKNOWN

Uhm ... what do these #defines do?  They look a bit funny.

Would it make sense to give this enum a name, and then use that name in
struct dirent's definition, instead of unsigned char?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Disk-based hash aggregate's cost model

2020-09-04 Thread Jeff Davis
On Fri, 2020-09-04 at 21:01 +0200, Tomas Vondra wrote:
> Wouldn't it be enough to just use a slot with smaller tuple
> descriptor?
> All we'd need to do is creating the descriptor in ExecInitAgg after
> calling find_hash_columns, and using it for rslot/wslot, and then
> "mapping" the attributes in hashagg_spill_tuple (which already almost
> does that, to the extra cost should be 0) and when reading the
> spilled
> tuples.

That's a good point, it's probably not much code to make it work.

> So I'm not quite buying the argument that this would make
> measurable difference ...

I meant "projection of all input tuples" (i.e. CP_SMALL_TLIST) has a
cost. If we project only at spill time, it should be fine.

Regards,
Jeff Davis






Re: Improving connection scalability: GetSnapshotData()

2020-09-04 Thread Andres Freund
On 2020-09-04 11:53:04 -0700, Andres Freund wrote:
> There's a seperate benchmark that I found to be quite revealing that's
> far less dependent on scheduler behaviour. Run two pgbench instances:
> 
> 1) With a very simply script '\sleep 1s' or such, and many connections
>(e.g. 100,1000,5000). That's to simulate connections that are
>currently idle.
> 2) With a normal pgbench read only script, and low client counts.
> 
> Before the changes 2) shows a very sharp decline in performance when the
> count in 1) increases. Afterwards its pretty much linear.
> 
> I think this benchmark actually is much more real world oriented - due
> to latency and client side overheads it's very normal to have a large
> fraction of connections idle in read mostly OLTP workloads.
> 
> Here's the result on my workstation (2x Xeon Gold 5215 CPUs), testing
> 1f42d35a1d6144a23602b2c0bc7f97f3046cf890 against
> 07f32fcd23ac81898ed47f88beb569c631a2f223 which are the commits pre/post
> connection scalability changes.
> 
> I used fairly short pgbench runs (15s), and the numbers are the best of
> three runs. I also had emacs and mutt open - some noise to be
> expected. But I also gotta work ;)
> 
> | Idle Connections | Active Connections | TPS pre | TPS post |
> |-:|---:|:|-:|
> |0 |  1 |   33599 |33406 |
> |  100 |  1 |   31088 |33279 |
> | 1000 |  1 |   29377 |33434 |
> | 2500 |  1 |   27050 |33149 |
> | 5000 |  1 |   21895 |33903 |
> |1 |  1 |   16034 |33140 |
> |0 | 48 | 1042005 |  1125104 |
> |  100 | 48 |  986731 |  1103584 |
> | 1000 | 48 |  854230 |  1119043 |
> | 2500 | 48 |  716624 |  1119353 |
> | 5000 | 48 |  553657 |  1119476 |
> |1 | 48 |  369845 |  1115740 |

Attached in graph form.

Greetings,

Andres Freund


Re: pg_dump bug for extension owned tables

2020-09-04 Thread Fabrízio de Royes Mello
On Fri, Sep 4, 2020 at 3:00 PM Andrew Dunstan <
andrew.duns...@2ndquadrant.com> wrote:
>
>
> Thanks, Committed. Further investigation shows this was introduced in
> release 12, so that's how far back I went.
>

Thanks!

--
   Fabrízio de Royes Mello
   Timbira - http://www.timbira.com.br/
   PostgreSQL: Consulting, Development, 24x7 Support and Training


Re: Disk-based hash aggregate's cost model

2020-09-04 Thread Tomas Vondra

On Fri, Sep 04, 2020 at 11:31:36AM -0700, Jeff Davis wrote:

On Fri, 2020-09-04 at 14:56 +0200, Tomas Vondra wrote:

Those charts show that the CP_SMALL_TLIST resulted in smaller temp
files
(per EXPLAIN ANALYZE the difference is ~25%) and also lower query
durations (also in the ~25% range).


I was able to reproduce the problem, thank you.

Only two attributes are needed, so the CP_SMALL_TLIST projected schema
only needs a single-byte null bitmap.

But if just setting the attributes to NULL rather than projecting them,
the null bitmap size is based on all 16 attributes, bumping the bitmap
size to two bytes.

MAXALIGN(23 + 1) = 24
MAXALIGN(23 + 2) = 32

I think that explains it. It's not ideal, but projection has a cost as
well, so I don't think we necessarily need to do something here.

If we are motivated to improve this in v14, we could potentially have a
different schema for spilled tuples, and perform real projection at
spill time. But I don't know if that's worth the extra complexity.



Thanks for the investigation and explanation.

Wouldn't it be enough to just use a slot with a smaller tuple descriptor?
All we'd need to do is create the descriptor in ExecInitAgg after
calling find_hash_columns, use it for rslot/wslot, and then "map" the
attributes in hashagg_spill_tuple (which already almost does that, so
the extra cost should be 0) and when reading the spilled tuples. So I'm
not quite buying the argument that this would make a measurable
difference ...

That being said, I won't insist on fixing this in v13 - at least we know
what the issue is and we can fix it later. The costing seems like a more
serious open item.

OTOH I don't think this example is particularly extreme, and I wouldn't
be surprised if we see even worse examples in practice - tables tend to
be quite wide and aggregation of just a few columns seems likely.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: report expected contrecord size

2020-09-04 Thread Alvaro Herrera
On 2020-Sep-03, Tom Lane wrote:

> Alvaro Herrera  writes:
> > On 2020-Sep-03, Tom Lane wrote:
> >> Uh ... is it really possible for gotlen to be more than total_len?
> >> (I've not looked at the surrounding code here, but that seems weird.)
> 
> > Well, as I understand, total_len comes from one page, and gotlen comes
> > from the continuation record(s) in the next page(s) of WAL.  So if
> > things are messed up, it could happen.  (This *is* the code that
> > validates the record, so it can't make too many assumptions.)
> 
> Got it.  No further objections.

Many thanks for looking!  Pushed now.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Improving connection scalability: GetSnapshotData()

2020-09-04 Thread Andres Freund
Hi,

On 2020-09-04 18:24:12 +0300, Konstantin Knizhnik wrote:
> Reported results looks very impressive.
> But I tried to reproduce them and didn't observed similar behavior.
> So I am wondering what can be the difference and what I am doing wrong.

That is odd - I did reproduce it on quite a few systems by now.


> Configuration file has the following differences with default postgres config:
> 
> max_connections = 1   # (change requires restart)
> shared_buffers = 8GB  # min 128kB

I also used huge_pages=on / configured them on the OS level. Otherwise
TLB misses will be a significant factor.

Does it change if you initialize the test database using
PGOPTIONS='-c vacuum_freeze_min_age=0' pgbench -i -s 100
or run a manual VACUUM FREEZE; after initialization?


> I have tried two different systems.
> First one is IBM Power2 server with 384 cores and 8Tb of RAM.
> I run the same read-only pgbench test as you. I do not think that size of the 
> database is matter, so I used scale 100 -
> it seems to be enough to avoid frequent buffer conflicts.
> Then I run the same scripts as you:
>
>  for ((n=100; n < 1000; n+=100)); do echo $n; pgbench -M prepared -c $n -T 
> 100 -j $n -M prepared -S -n postgres ;  done
>  for ((n=1000; n <= 5000; n+=1000)); do echo $n; pgbench -M prepared -c $n -T 
> 100 -j $n -M prepared -S -n postgres ;  done
>
>
> I have compared current master with version of Postgres prior to your commits 
> with scalability improvements: a9a4a7ad56

Hm, it'd probably be good to compare commits closer to the changes, to
avoid other changes showing up.

Hm - did you verify if all the connections were actually established?
Particularly without the patch applied? With an unmodified pgbench, I
sometimes saw better numbers, but only because only half the connections
were able to be established, due to ProcArrayLock contention.

See 
https://www.postgresql.org/message-id/20200227180100.zyvjwzcpiokfsqm2%40alap3.anarazel.de

There also is the issue that pgbench numbers for inclusive/exclusive are
just about meaningless right now:
https://www.postgresql.org/message-id/20200227202636.qaf7o6qcajsudoor%40alap3.anarazel.de
(reminds me, need to get that fixed)


One more thing worth investigating is whether your results change
significantly when you start the server using
numactl --interleave=all .
Especially on larger systems the results otherwise can vary a lot from
run-to-run, because the placement of shared buffers matters a lot.


> So I have repeated experiments at Intel server.
> It has 160 cores Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz and 256Gb of RAM.
>
> The same database, the same script, results are the following:
>
> Clients   old/inc old/exl new/inc new/exl
> 1000  1105750 1163292 1206105 1212701
> 2000  1050933 1124688 1149706 1164942
> 3000  1063667 1195158 1118087 1144216
> 4000  1040065 1290432 1107348 1163906
> 5000  943813  1258643 1103790 1160251

> I have separately shown results including/excluding connection
> establishment, because in the new version there are almost no
> differences between them, but for the old version the gap between them
> is noticeable.
>
> Configuration file has the following differences with default postgres config:
>
> max_connections = 1   # (change requires restart)
> shared_buffers = 8GB  # min 128kB

>
> This results contradict with yours and makes me ask the following questions:

> 1. Why in your case performance is almost two times larger (2 millions vs 1)?
> The hardware in my case seems to be at least not worser than yours...
> May be there are some other improvements in the version you have tested which 
> are not yet committed to master?

No, no uncommitted changes, except for the pgbench stuff mentioned
above. However I found that the kernel version matters a fair bit, it's
pretty easy to run into kernel scalability issues in a workload that is
this heavy scheduler dependent.

Did you connect via tcp or unix socket? Was pgbench running on the same
machine? It was locally via unix socket for me (but it's also observable
via two machines, just with lower overall throughput).

Did you run a profile to see where the bottleneck is?


There's a seperate benchmark that I found to be quite revealing that's
far less dependent on scheduler behaviour. Run two pgbench instances:

1) With a very simply script '\sleep 1s' or such, and many connections
   (e.g. 100,1000,5000). That's to simulate connections that are
   currently idle.
2) With a normal pgbench read only script, and low client counts.

Before the changes 2) shows a very sharp decline in performance when the
count in 1) increases. Afterwards its pretty much linear.

I think this benchmark actually is much more real world oriented - due
to latency and client side overheads it's very normal to have a large
fraction of connections idle in read mostly OLTP workloads.

Re: Disk-based hash aggregate's cost model

2020-09-04 Thread Jeff Davis
On Fri, 2020-09-04 at 14:56 +0200, Tomas Vondra wrote:
> Those charts show that the CP_SMALL_TLIST resulted in smaller temp
> files
> (per EXPLAIN ANALYZE the difference is ~25%) and also lower query
> durations (also in the ~25% range).

I was able to reproduce the problem, thank you.

Only two attributes are needed, so the CP_SMALL_TLIST projected schema
only needs a single-byte null bitmap.

But if just setting the attributes to NULL rather than projecting them,
the null bitmap size is based on all 16 attributes, bumping the bitmap
size to two bytes.

MAXALIGN(23 + 1) = 24
MAXALIGN(23 + 2) = 32

I think that explains it. It's not ideal, but projection has a cost as
well, so I don't think we necessarily need to do something here.

If we are motivated to improve this in v14, we could potentially have a
different schema for spilled tuples, and perform real projection at
spill time. But I don't know if that's worth the extra complexity.

Regards,
Jeff Davis






Re: Global snapshots

2020-09-04 Thread Alexey Kondratov

Hi,

On 2020-07-27 09:44, Andrey V. Lepikhov wrote:

On 7/27/20 11:22 AM, tsunakawa.ta...@fujitsu.com wrote:


US8356007B2 - Distributed transaction management for database systems 
with multiversioning - Google Patents

https://patents.google.com/patent/US8356007


If it is, can we circumvent this patent?



Thank you for the research (and previous links too).
I haven't seen this patent before. This should be carefully studied.


I had a look at the patch set, although it is quite outdated, especially 
0003.


Two thoughts about 0003:

First, IIUC, atomicity of the distributed transaction in postgres_fdw 
is achieved by the use of 2PC. I think that this postgres_fdw 2PC 
support should be separated from global snapshots. It could be useful 
to have such atomic distributed transactions even without proper 
visibility, which is what the global snapshot guarantees, especially 
taking into account the doubts about Clock-SI and the general questions 
above in the thread about the criteria for choosing an algorithm.


Thus, I propose to split 0003 into two parts and add a separate GUC 
'postgres_fdw.use_twophase', which could be turned on independently from 
'postgres_fdw.use_global_snapshots'. Of course if the latter is enabled, 
then 2PC should be forcedly turned on as well.


Second, there are some problems with error handling in 0003 (thanks 
to Arseny Sher for the review).


+error:
+   if (!res)
+   {
+   sql = psprintf("ABORT PREPARED '%s'", 
fdwTransState->gid);
+   BroadcastCmd(sql);
+   elog(ERROR, "Failed to PREPARE transaction on remote 
node");
+   }

It seems that we should never reach this point, just because 
BroadcastStmt will throw an ERROR if it fails to prepare transaction on 
the foreign server:


+   if (PQresultStatus(result) != expectedStatus ||
+   (handler && !handler(result, arg)))
+   {
+elog(WARNING, "Failed command %s: status=%d, expected status=%d", 
sql, PQresultStatus(result), expectedStatus);

+   pgfdw_report_error(ERROR, result, entry->conn, 
true, sql);
+   allOk = false;
+   }

Moreover, it doesn't make much sense to try to abort prepared xacts, 
since if we failed to prepare one somewhere, then some foreign servers 
may already have become unavailable, and this doesn't give us a 100% 
guarantee of clean-up.


+   /* COMMIT open transaction of we were doing 2PC */
+   if (fdwTransState->two_phase_commit &&
+   (event == XACT_EVENT_PARALLEL_COMMIT || event == 
XACT_EVENT_COMMIT))
+   {
+   BroadcastCmd(psprintf("COMMIT PREPARED '%s'", 
fdwTransState->gid));
+   }

At this point, the host (local) transaction is already committed and 
there is no way to abort it gracefully. However, BroadcastCmd may raise 
an ERROR that will cause a PANIC, since this is a non-recoverable state:


PANIC:  cannot abort transaction 487, it was already committed

Attached is a patch that implements plain 2PC in postgres_fdw and adds 
a GUC 'postgres_fdw.use_twophase'. It also solves the error handling 
issues above and tries to add proper comments everywhere. I think that 
0003 should be rebased on top of it, or it could be the first patch in 
the set, since it may be used independently. What do you think?



Regards
--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

From debdffade7abcdbf29031bda6c8359a89776ad36 Mon Sep 17 00:00:00 2001
From: Alexey Kondratov 
Date: Fri, 7 Aug 2020 16:50:57 +0300
Subject: [PATCH] Add postgres_fdw.use_twophase GUC to use 2PC for transactions
 involving several servers.

---
 contrib/postgres_fdw/connection.c   | 234 +---
 contrib/postgres_fdw/postgres_fdw.c |  17 ++
 contrib/postgres_fdw/postgres_fdw.h |   2 +
 3 files changed, 228 insertions(+), 25 deletions(-)

diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index 08daf26fdf0..d18fdd1f94e 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -66,6 +66,20 @@ typedef struct ConnCacheEntry
  */
 static HTAB *ConnectionHash = NULL;
 
+/*
+ * FdwTransactionState
+ *
+ * Holds number of open remote transactions and shared state
+ * needed for all connection entries.
+ */
+typedef struct FdwTransactionState
+{
+	char	   *gid;
+	int			nparticipants;
+	bool		two_phase_commit;
+} FdwTransactionState;
+static FdwTransactionState *fdwTransState;
+
 /* for assigning cursor numbers and prepared statement numbers */
 static unsigned int cursor_number = 0;
 static unsigned int prep_stmt_number = 0;
@@ -73,6 +87,9 @@ static unsigned int prep_stmt_number = 0;
 /* tracks whether any work is needed in callback functions */
 static bool xact_got_connection = false;
 

Re: pg_dump bug for extension owned tables

2020-09-04 Thread Andrew Dunstan
On Thu, Aug 6, 2020 at 4:12 PM Fabrízio de Royes Mello
 wrote:
>
>
> On Mon, Jul 13, 2020 at 6:18 PM Fabrízio de Royes Mello 
>  wrote:
>>
>>
>> On Mon, Jul 13, 2020 at 5:05 PM Andrew Dunstan 
>>  wrote:
>> >
>> >
>> > On 7/13/20 2:46 PM, Fabrízio de Royes Mello wrote:
>> > >
>> > >
>> > > >
>> > > > yeah, that's the fix I came up with too. The only thing I added was
>> > > > "Assert(tbinfo->attgenerated);" at about line 2097.
>> > > >
>> > >
>> > > Cool.
>> > >
>> > > >
>> > > > Will wait for your TAP tests.
>> > > >
>> > >
>> > > Actually I've sent it already but it seems to have gone to the
>> > > moderation queue.
>> > >
>> > > Anyway attached with your assertion and TAP tests.
>> > >
>> > >
>> >
>> >
>> >
>> > Thanks, that all seems fine. The TAP test changes are a bit of a pain in
>> > the neck before release 11, so I think I'll just do those back that far,
>> > but the main fix for all live branches.
>> >
>>
>> Sounds good to me.
>>
>
> Just added to the next commitfest [1] to make sure we'll not lose it.
>
> Regards,
>
> [1] https://commitfest.postgresql.org/29/2671/
>


Thanks, Committed. Further investigation shows this was introduced in
release 12, so that's how far back I went.

cheers

andrew


-- 
Andrew Dunstan  https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH] Redudant initilization

2020-09-04 Thread Tom Lane
Bruce Momjian  writes:
> I have to say, I am kind of stumped why compilers do not warn of such
> cases, and why we haven't gotten reports about these cases before.

I was just experimenting with clang's "scan-build" tool.  It finds
all of the cases you just fixed, and several dozen more beside.
Quite a few are things that, as a matter of style, we should *not*
change, for instance

rewriteHandler.c:2807:5: warning: Value stored to 'outer_reloids' is never read
outer_reloids = list_delete_last(outer_reloids);
^   ~~~

Failing to update the list pointer here would just be asking for bugs.
However, I see some that look like genuine oversights; will go fix.

(I'm not sure how much I trust scan-build overall.  It produces a
whole bunch of complaints about null pointer dereferences, for instance.
If those aren't 99% false positives, we'd be crashing constantly.
It's also dog-slow.  But it might be something to try occasionally.)

regards, tom lane




Re: Improving connection scalability: GetSnapshotData()

2020-09-04 Thread Andres Freund
Hi,

On 2020-09-03 17:18:29 +0900, Michael Paquier wrote:
> On Sun, Aug 16, 2020 at 02:26:57PM -0700, Andres Freund wrote:
> > So we get some builfarm results while thinking about this.
> 
> Andres, there is an entry in the CF for this thread:
> https://commitfest.postgresql.org/29/2500/
> 
> A lot of work has been committed with 623a9ba, 73487a6, 5788e25, etc.
> Now that PGXACT is done, how much work is remaining here?

I think it's best to close the entry. There are substantial further wins
possible; in particular, not acquiring ProcArrayLock in GetSnapshotData()
when the cache is valid improves performance substantially. But it's
non-trivial enough that it's probably best dealt with in a separate
patch / CF entry.

Closed.

Greetings,

Andres Freund




Re: Change a constraint's index - ALTER TABLE ... ALTER CONSTRAINT ... USING INDEX ...

2020-09-04 Thread Alvaro Herrera
On 2020-Sep-04, Laurenz Albe wrote:

> On Fri, 2020-09-04 at 10:41 -0400, Alvaro Herrera wrote:
> > > The value I see in this is:
> > > - replacing a primary key index
> > > - replacing the index behind a constraint targeted by a foreign key
> > 
> > But why is this better than using REINDEX CONCURRENTLY?
> 
> It is not better, but it can be used to replace a constraint index
> with an index with a different INCLUDE clause, which is something
> that cannot easily be done otherwise.

I can see that there is value in having an index that serves both a
uniqueness constraint and coverage purposes.  But this seems a pretty
roundabout way to get that -- I think you should have to do "CREATE
UNIQUE INDEX ... INCLUDE ..." instead.  That way, the fact that this
is a Postgres extension remains clear.

55432 14devel 24138=# create table foo (a int not null, b int not null, c int);
CREATE TABLE
Duration: 1.775 ms
55432 14devel 24138=# create unique index on foo (a, b) include (c);
CREATE INDEX
Duration: 1.481 ms
55432 14devel 24138=# create table bar (a int not null, b int not null, foreign key (a, b) references foo (a, b));
CREATE TABLE
Duration: 2.559 ms

Now you have a normal index that you can reindex in the normal way, if you need
it.

-- 
Álvaro Herrera  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH] Comments related to "buffer descriptors" cache line size

2020-09-04 Thread Bruce Momjian
On Fri, Sep  4, 2020 at 11:31:55PM +0800, Kelly Min wrote:
> hello, hackers.
> 
> 
> I found a comment error in buffer descriptors comment. The cache line size is
> 64 bytes, not 64 bits

Thanks, fix applied to all active branches.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: LogwrtResult contended spinlock

2020-09-04 Thread Andres Freund
Hi,

On 2020-09-04 10:05:45 -0700, Andres Freund wrote:
> On 2020-09-03 14:34:52 -0400, Alvaro Herrera wrote:
> > Looking at patterns like this
> > 
> > if (XLogCtl->LogwrtRqst.Write < EndPos)
> > XLogCtl->LogwrtRqst.Write = EndPos;
> > 
> > It seems possible to implement with
> > 
> > do {
> > XLogRecPtr  currwrite;
> > 
> > currwrite = pg_atomic_read_u64(LogwrtRqst.Write);
> > if (currwrite > EndPos)
> > break;  // already done by somebody else
> > if (pg_atomic_compare_exchange_u64(LogwrtRqst.Write,
> >currwrite, EndPos))
> > break;  // successfully updated
> > } while (true);
> > 
> > This assumes that LogwrtRqst.Write never goes backwards, so it doesn't
> > seem good material for a general routine.
> > 
> > This *seems* correct to me, though this is muddy territory to me.  Also,
> > are there better ways to go about this?
> 
> Hm, I was thinking that we'd first go for reading it without a spinlock,
> but continuing to write it as we currently do.
> 
> But yea, I can't see an issue with what you propose here. I personally
> find do {} while () weird and avoid it when not explicitly useful, but
> that's extremely minor, obviously.

Re general routine: On second thought, it might actually be worth having
it. Even just for LSNs - there are plenty of places where it's useful to
ensure a variable is at least a certain value.  I think I would be in
favor of a general helper function.
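
For concreteness, such a helper might look roughly like this (a sketch
only; the function name is illustrative, and it builds on the existing
pg_atomic_read_u64/pg_atomic_compare_exchange_u64 primitives):

#include "port/atomics.h"

/* Advance *ptr to at least "target", never moving it backwards. */
static inline void
pg_atomic_monotonic_advance_u64(volatile pg_atomic_uint64 *ptr, uint64 target)
{
	uint64		currval = pg_atomic_read_u64(ptr);

	while (currval < target)
	{
		/* on failure, currval is refreshed with the latest stored value */
		if (pg_atomic_compare_exchange_u64(ptr, &currval, target))
			break;
	}
}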

Greetings,

Andres Freund




Re: LogwrtResult contended spinlock

2020-09-04 Thread Andres Freund
Hi,

On 2020-09-03 14:34:52 -0400, Alvaro Herrera wrote:
> Looking at patterns like this
> 
>   if (XLogCtl->LogwrtRqst.Write < EndPos)
>   XLogCtl->LogwrtRqst.Write = EndPos;
> 
> It seems possible to implement with
> 
> do {
>   XLogRecPtr  currwrite;
> 
> currwrite = pg_atomic_read_u64(LogwrtRqst.Write);
>   if (currwrite > EndPos)
> break;  // already done by somebody else
> if (pg_atomic_compare_exchange_u64(LogwrtRqst.Write,
>  currwrite, EndPos))
> break;  // successfully updated
> } while (true);
> 
> This assumes that LogwrtRqst.Write never goes backwards, so it doesn't
> seem good material for a general routine.
> 
> This *seems* correct to me, though this is muddy territory to me.  Also,
> are there better ways to go about this?

Hm, I was thinking that we'd first go for reading it without a spinlock,
but continuing to write it as we currently do.

But yea, I can't see an issue with what you propose here. I personally
find do {} while () weird and avoid it when not explicitly useful, but
that's extremely minor, obviously.

Greetings,

Andres Freund




Re: [PATCH] Redudant initilization

2020-09-04 Thread Ranier Vilela
On Fri, Sep 4, 2020 at 11:01, Bruce Momjian wrote:

> On Fri, Sep  4, 2020 at 09:39:45AM -0300, Ranier Vilela wrote:
> > On Thu, Sep 3, 2020 at 23:57, Bruce Momjian wrote:
> >
> > On Wed, Apr  1, 2020 at 08:57:18AM -0300, Ranier Vilela wrote:
> > > Hi,
> > > New patch with yours suggestions.
> >
> > Patch applied to head, thanks.
> >
> > Thank you Bruce.
>
> I have to say, I am kind of stumped why compilers do not warn of such
> cases, and why we haven't gotten reports about these cases before.
>
I believe it is because, syntactically, there is no error.

I would like to thank Kyotaro Horiguchi,
my thanks for your review.

regards,
Ranier Vilela


Re: Rare deadlock failure in create_am test

2020-09-04 Thread Tom Lane
Alvaro Herrera  writes:
> On 2020-Sep-03, Tom Lane wrote:
>> I'm inclined to think that the best fix is to put
>> 
>> begin;
>> lock table [fast_emp4000];
>> ...
>> commit;
>> 
>> around the DROP CASCADE.

> Yeah, sounds good.

Done.

regards, tom lane




[PATCH] Comments related to "buffer descriptors" cache line size

2020-09-04 Thread Kelly Min
hello, hackers.

I found a comment error in buffer descriptors comment. The cache line size
is 64 bytes, not 64 bits


buffer-descriptor.patch
Description: Binary data


Re: On login trigger: take three

2020-09-04 Thread Konstantin Knizhnik
Sorry, the attached version of the patch was missing changes in one file. 
Here is the correct patch.



--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/doc/src/sgml/ref/create_trigger.sgml b/doc/src/sgml/ref/create_trigger.sgml
index 289dd1d..b3c1fc2 100644
--- a/doc/src/sgml/ref/create_trigger.sgml
+++ b/doc/src/sgml/ref/create_trigger.sgml
@@ -27,7 +27,7 @@ PostgreSQL documentation
  
 
 CREATE [ CONSTRAINT ] TRIGGER name { BEFORE | AFTER | INSTEAD OF } { event [ OR ... ] }
-ON table_name
+ON { table_name | database_name }
 [ FROM referenced_table_name ]
 [ NOT DEFERRABLE | [ DEFERRABLE ] [ INITIALLY IMMEDIATE | INITIALLY DEFERRED ] ]
 [ REFERENCING { { OLD | NEW } TABLE [ AS ] transition_relation_name } [ ... ] ]
@@ -41,6 +41,7 @@ CREATE [ CONSTRAINT ] TRIGGER name
 UPDATE [ OF column_name [, ... ] ]
 DELETE
 TRUNCATE
+CONNECTION
 
  
 
diff --git a/doc/src/sgml/trigger.sgml b/doc/src/sgml/trigger.sgml
index 4a0e746..22caf4c 100644
--- a/doc/src/sgml/trigger.sgml
+++ b/doc/src/sgml/trigger.sgml
@@ -71,6 +71,23 @@

 

+ You can also define on connection trigger:
+ 
+   CREATE EVENT TRIGGER mytrigger
+   AFTER CONNECTION ON mydatabase
+   EXECUTE {PROCEDURE | FUNCTION} myproc();
+ 
+ On connection triggers can only be defined for self database.
+ It can be only AFTER trigger,
+ and may not contain WHERE clause or list of columns.
+ Any number of triggers with unique names can be defined for the database.
+ On connection triggers can be dropped by specifying name of the database:
+ 
+   DROP TRIGGER mytrigger ON mydatabase;
+ 
+   
+
+   
 The trigger function must be defined before the trigger itself can be
 created.  The trigger function must be declared as a
 function taking no arguments and returning type trigger.
diff --git a/src/backend/catalog/objectaddress.c b/src/backend/catalog/objectaddress.c
index 6dfe1be..da57951 100644
--- a/src/backend/catalog/objectaddress.c
+++ b/src/backend/catalog/objectaddress.c
@@ -1433,11 +1433,23 @@ get_object_address_relobject(ObjectType objtype, List *object,
 
 	/* Extract relation name and open relation. */
 	relname = list_truncate(list_copy(object), nnames - 1);
-	relation = table_openrv_extended(makeRangeVarFromNameList(relname),
-	 AccessShareLock,
-	 missing_ok);
 
-	reloid = relation ? RelationGetRelid(relation) : InvalidOid;
+	address.objectId = InvalidOid;
+	if (objtype == OBJECT_TRIGGER && list_length(relname) == 1)
+	{
+		reloid = get_database_oid(strVal(linitial(relname)), true);
+		if (OidIsValid(reloid))
+			address.objectId = get_trigger_oid(reloid, depname, true);
+	}
+
+	if (!OidIsValid(address.objectId))
+	{
+		relation = table_openrv_extended(makeRangeVarFromNameList(relname),
+		 AccessShareLock,
+		 missing_ok);
+
+		reloid = relation ? RelationGetRelid(relation) : InvalidOid;
+	}
 
 	switch (objtype)
 	{
@@ -1449,8 +1461,11 @@ get_object_address_relobject(ObjectType objtype, List *object,
 			break;
 		case OBJECT_TRIGGER:
 			address.classId = TriggerRelationId;
-			address.objectId = relation ?
-get_trigger_oid(reloid, depname, missing_ok) : InvalidOid;
+			if (!OidIsValid(address.objectId))
+			{
+address.objectId = relation ?
+	get_trigger_oid(reloid, depname, missing_ok) : InvalidOid;
+			}
 			address.objectSubId = 0;
 			break;
 		case OBJECT_TABCONSTRAINT:
diff --git a/src/backend/commands/dropcmds.c b/src/backend/commands/dropcmds.c
index 81f0380..f4f4825 100644
--- a/src/backend/commands/dropcmds.c
+++ b/src/backend/commands/dropcmds.c
@@ -104,10 +104,13 @@ RemoveObjects(DropStmt *stmt)
 
 		/* Check permissions. */
 		namespaceId = get_object_namespace();
-		if (!OidIsValid(namespaceId) ||
-			!pg_namespace_ownercheck(namespaceId, GetUserId()))
+		if ((relation != NULL || stmt->removeType != OBJECT_TRIGGER) &&
+			(!OidIsValid(namespaceId) ||
+			 !pg_namespace_ownercheck(namespaceId, GetUserId(
+		{
 			check_object_ownership(GetUserId(), stmt->removeType, address,
    object, relation);
+		}
 
 		/*
 		 * Make note if a temporary namespace has been accessed in this
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 672fccf..25bc132 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -167,7 +167,7 @@ CreateTrigger(CreateTrigStmt *stmt, const char *queryString,
 	char	   *qual;
 	Datum		values[Natts_pg_trigger];
 	bool		nulls[Natts_pg_trigger];
-	Relation	rel;
+	Relation	rel = NULL;
 	AclResult	aclresult;
 	Relation	tgrel;
 	SysScanDesc tgscan;
@@ -179,6 +179,7 @@ CreateTrigger(CreateTrigStmt *stmt, const char *queryString,
 	char		internaltrigname[NAMEDATALEN];
 	char	   *trigname;
 	Oid			constrrelid = InvalidOid;
+	Oid targetid = InvalidOid;
 	ObjectAddress myself,
 referenced;
 	char	   *oldtablename = NULL;
@@ -188,119 

Re: Add session statistics to pg_stat_database

2020-09-04 Thread Laurenz Albe
On Tue, 2020-08-11 at 13:53 +0200, I wrote:
> On Thu, 2020-07-23 at 18:16 +0500, Ahsan Hadi wrote:
> 
> > On Wed, Jul 8, 2020 at 4:17 PM Laurenz Albe  
> > wrote:
> > > Here is a patch that adds the following to pg_stat_database:
> > > - number of connections
> >
> > Is it expected behaviour to not count idle connections? The connection is 
> > included after it is aborted but not while it was idle.
> 
> Currently, the patch counts connections when they close.
> 
> I could change the behavior that they are counted immediately.

I have changed the code so that connections are counted immediately.

Attached is a new version.

Yours,
Laurenz Albe
From 6d9bfbd682a9f4723f030fdc461f731175f55f44 Mon Sep 17 00:00:00 2001
From: Laurenz Albe 
Date: Fri, 4 Sep 2020 17:30:24 +0200
Subject: [PATCH] Add session statistics to pg_stat_database

If "track_counts" is active, track the following per database:
- number of connections
- number of sessions that were not disconnected regularly
- total time spent in database sessions
- total time spent executing queries
- total idle in transaction time

This is useful to check if connection pooling is working.
It also helps to estimate the size of the connection pool
required to keep the database busy, which depends on the
percentage of the transaction time that is spent idling.
---
 doc/src/sgml/monitoring.sgml |  46 +
 src/backend/catalog/system_views.sql |   5 +
 src/backend/postmaster/pgstat.c  | 146 ++-
 src/backend/tcop/postgres.c  |   5 +
 src/backend/utils/adt/pgstatfuncs.c  |  78 ++
 src/include/catalog/pg_proc.dat  |  20 
 src/include/pgstat.h |  29 +-
 src/test/regress/expected/rules.out  |   5 +
 8 files changed, 332 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 673a0e73e4..aa5e22d213 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -3514,6 +3514,52 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
   
  
 
+ 
+  
+   session_time double precision
+  
+  
+   Time spent in database sessions in this database, in milliseconds.
+  
+ 
+
+ 
+  
+   active_time double precision
+  
+  
+   Time spent executing SQL statements in this database, in milliseconds.
+  
+ 
+
+ 
+  
+   idle_in_transaction_time double precision
+  
+  
+   Time spent idling while in a transaction in this database, in milliseconds.
+  
+ 
+
+ 
+  
+   connections bigint
+  
+  
+   Number of connections established to this database.
+  
+ 
+
+ 
+  
+   aborted_sessions bigint
+  
+  
+   Number of database sessions to this database that did not end
+   with a regular client disconnection.
+  
+ 
+
  
   
stats_reset timestamp with time zone
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index ed4f3f142d..d8b28c7600 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -912,6 +912,11 @@ CREATE VIEW pg_stat_database AS
 pg_stat_get_db_checksum_last_failure(D.oid) AS checksum_last_failure,
 pg_stat_get_db_blk_read_time(D.oid) AS blk_read_time,
 pg_stat_get_db_blk_write_time(D.oid) AS blk_write_time,
+pg_stat_get_db_session_time(D.oid) AS session_time,
+pg_stat_get_db_active_time(D.oid) AS active_time,
+pg_stat_get_db_idle_in_transaction_time(D.oid) AS idle_in_transaction_time,
+pg_stat_get_db_connections(D.oid) AS connections,
+pg_stat_get_db_aborted_sessions(D.oid) AS aborted_sessions,
 pg_stat_get_db_stat_reset_time(D.oid) AS stats_reset
 FROM (
 SELECT 0 AS oid, NULL::name AS datname
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 5f4b168fd1..12a7543554 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -247,6 +247,12 @@ static int	pgStatXactCommit = 0;
 static int	pgStatXactRollback = 0;
 PgStat_Counter pgStatBlockReadTime = 0;
 PgStat_Counter pgStatBlockWriteTime = 0;
+static TimestampTz pgStatActiveStart = DT_NOBEGIN;
+static PgStat_Counter pgStatActiveTime = 0;
+static TimestampTz pgStatTransactionIdleStart = DT_NOBEGIN;
+static PgStat_Counter pgStatTransactionIdleTime = 0;
+static bool pgStatSessionReported = false;
+bool pgStatSessionDisconnected = false;
 
 /* Record that's written to 2PC state file when pgstat state is persisted */
 typedef struct TwoPhasePgStatRecord
@@ -326,6 +332,7 @@ static void pgstat_send_tabstat(PgStat_MsgTabstat *tsmsg);
 static void pgstat_send_funcstats(void);
 static void pgstat_send_slru(void);
 static HTAB *pgstat_collect_oids(Oid catalogid, AttrNumber anum_oid);
+static void 

Re: Change a constraint's index - ALTER TABLE ... ALTER CONSTRAINT ... USING INDEX ...

2020-09-04 Thread Laurenz Albe
On Fri, 2020-09-04 at 10:41 -0400, Alvaro Herrera wrote:
> > The value I see in this is:
> > - replacing a primary key index
> > - replacing the index behind a constraint targeted by a foreign key
> 
> But why is this better than using REINDEX CONCURRENTLY?

It is not better, but it can be used to replace a constraint index
with an index with a different INCLUDE clause, which is something
that cannot easily be done otherwise.

For exclusion constraints it is pretty useless, and for unique
constraints it can be worked around with CREATE UNIQUE INDEX CONCURRENTLY.

Admittedly, the use case is pretty narrow, and I am not sure if it is
worth adding code and SQL syntax for that.

Yours,
Laurenz Albe





Re: Improving connection scalability: GetSnapshotData()

2020-09-04 Thread Konstantin Knizhnik



On 03.09.2020 11:18, Michael Paquier wrote:

On Sun, Aug 16, 2020 at 02:26:57PM -0700, Andres Freund wrote:

So we get some builfarm results while thinking about this.

Andres, there is an entry in the CF for this thread:
https://commitfest.postgresql.org/29/2500/

A lot of work has been committed with 623a9ba, 73487a6, 5788e25, etc.
Now that PGXACT is done, how much work is remaining here?
--
Michael


Andres,
First of all a lot of thanks for this work.
Improving Postgres connection scalability is very important.

Reported results looks very impressive.
But I tried to reproduce them and didn't observed similar behavior.
So I am wondering what can be the difference and what I am doing wrong.

I have tried two different systems.
The first one is an IBM Power2 server with 384 cores and 8Tb of RAM.
I ran the same read-only pgbench test as you. I do not think that the size of the 
database matters, so I used scale 100 -
it seems to be enough to avoid frequent buffer conflicts.
Then I ran the same scripts as you:

 for ((n=100; n < 1000; n+=100)); do echo $n; pgbench -M prepared -c $n -T 100 
-j $n -M prepared -S -n postgres ;  done
 for ((n=1000; n <= 5000; n+=1000)); do echo $n; pgbench -M prepared -c $n -T 
100 -j $n -M prepared -S -n postgres ;  done


I have compared current master with the version of Postgres prior to your commits 
with the scalability improvements: a9a4a7ad56

For all numbers of connections the older version shows slightly better results; 
for example, for 500 clients: 475k TPS vs. 450k TPS for current master.

This is a quite exotic server and I do not currently have access to it.
So I have repeated the experiments on an Intel server.
It has 160 cores of Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz and 256Gb of RAM.

The same database, the same script, results are the following:

Clients   old/inc   old/exl   new/inc   new/exl
1000      1105750   1163292   1206105   1212701
2000      1050933   1124688   1149706   1164942
3000      1063667   1195158   1118087   1144216
4000      1040065   1290432   1107348   1163906
5000       943813   1258643   1103790   1160251

I have separately shown results including/excluding connection establishment,
because in the new version there is almost no difference between them,
but for the old version the gap between them is noticeable.

The configuration file has the following differences from the default postgres config:

max_connections = 10000      # (change requires restart)
shared_buffers = 8GB         # min 128kB


These results contradict yours and make me ask the following questions:

1. Why is performance in your case almost two times higher (2 million vs 1)?
The hardware in my case seems to be at least no worse than yours...
Maybe there are some other improvements in the version you have tested which 
are not yet committed to master?

2. You wrote: This is on a machine with 2
Intel(R) Xeon(R) Platinum 8168, but virtualized (2 sockets of 18 cores/36 
threads)

According to the Intel specification, the Intel® Xeon® Platinum 8168 Processor has 24 
cores:
https://ark.intel.com/content/www/us/en/ark/products/120504/intel-xeon-platinum-8168-processor-33m-cache-2-70-ghz.html

And in your graph we can see an almost linear increase in speed up to 40 
connections.

But the most suspicious word for me is "virtualized". What is the actual hardware, 
and how is it virtualized?

Do you have any idea why in my case the master version (with your commits) behaves 
almost the same as the non-patched version?
Below is yet another table showing scalability from 10 to 100 connections and 
combining your results (first two columns) and my results (last two columns):


Clients   old master   pgxact-split-cache   current master   revision 9a4a7ad56
10         367883        375682               358984            347067
20         748000        810964               668631            630304
30         999231       1288276               920255            848244
40         991672       1573310              1100745            970717
50        1017561       1715762              1193928           1008755
60         993943       1789698              1255629            917788
70         971379       1819477              1277634            873022
80         966276       1842248              1266523            830197
90         901175       1847823              1255260            736550
100        803175       1865795              1241143            736756


Maybe it is because of the more complex architecture of my server?

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [PATCH]Fix ja.po error

2020-09-04 Thread Alvaro Herrera
On 2020-Aug-19, Lu, Chenyang wrote:

> Ping: sorry, did Alvaro and Peter forget this email? (Maybe you didn't see it.)
> I found that the patch for ja.po has not been applied to the
> Translation Repository.

Apologies.  I have pushed this to all branches of the translation repo
now.

The bogus 'msgstr' was not identical in Postgres 11 and back -- I think
the only difference was one extra whitespace.  I suppose that's not
important, so I used the translation as provided with no change.

-- 
Álvaro Herrera  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Allow continuations in "pg_hba.conf" files

2020-09-04 Thread Tom Lane
Fabien COELHO  writes:
> I notice that buf.data is not freed. I guess that the server memory 
> management will recover it.

Yeah, it's in the transient context holding all of the results of
reading the file.  I considered pfree'ing it at the end of the
function, but I concluded there's no point.  The space will be
recycled when the context is destroyed, and since we're not (IIRC)
going to allocate anything more in that context, nothing would be
gained by freeing it earlier --- it'd just stay as unused memory
within the context.

regards, tom lane




Re: Ideas about a better API for postgres_fdw remote estimates

2020-09-04 Thread Tomas Vondra

On Thu, Sep 03, 2020 at 10:14:41AM +0500, Andrey V. Lepikhov wrote:

On 8/31/20 6:19 PM, Ashutosh Bapat wrote:

On Mon, Aug 31, 2020 at 3:36 PM Andrey V. Lepikhov
 wrote:


Thanks for this helpful feedback.

I think the patch has some other problems like it works only for
regular tables on foreign server but a foreign table can be pointing
to any relation like a materialized view, partitioned table or a
foreign table on the foreign server all of which have statistics
associated with them. I didn't look closely but it does not consider
that the foreign table may not have all the columns from the relation
on the foreign server or may have different names. But I think those
problems are kind of secondary. We have to agree on the design first.


In accordance with the discussion I made some changes in the patch:
1. The extract statistic routine moved into the core.
2. Serialized stat contains 'version' field to indicate format of 
statistic received.
3. ANALYZE and VACUUM ANALYZE uses this approach only in the case of 
implicit analysis of the relation.


I am currently keeping the limitation of using the approach for regular 
relations only, because I haven't studied the specifics of other 
types of relations.

But I don't know any reason to keep this limit in the future.

The attached patch is very raw. I publish it for further substantive 
discussion.




Thanks for working on this. I briefly looked at the patch today, and I
have some comments/feedback:

1) I wonder why deparseGetStatSql looks so different from e.g.
deparseAnalyzeSizeSql - no deparseStringLiteral on relname, no cast to
pg_catalog.regclass, function name not qualified with pg_catalog, ...
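
For illustration, following deparseAnalyzeSizeSql's conventions it could look
roughly like this (a sketch only; the remote function name
pg_export_relation_stats() is hypothetical, while deparseRelation() and
deparseStringLiteral() are the existing postgres_fdw deparse helpers):

static void
deparseGetStatSql(StringInfo buf, Relation rel)
{
	StringInfoData relname;

	/* We'll need the remote relation name as a literal. */
	initStringInfo(&relname);
	deparseRelation(&relname, rel);

	appendStringInfoString(buf, "SELECT pg_catalog.pg_export_relation_stats(");
	deparseStringLiteral(buf, relname.data);
	appendStringInfoString(buf, "::pg_catalog.regclass)");
}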


2) I'm a bit annoyed by the amount of code added to analyze.c only to
support output/input in JSON format. I'm no expert, but I don't recall
explain needing this much new stuff (OTOH it just produces json, it does
not need to read it). Maybe we also need to process wider range of data
types here. But the code is almost perfectly undocumented :-(


3) Why do we need to change vacuum_rel this way?


4) I wonder if we actually want/need to simply output pg_statistic data
verbatim like this. Is postgres_fdw actually going to benefit from it? I
kinda doubt that, and my assumption was that we'd return only a small
subset of the data, needed by get_remote_estimate.

This has a couple of issues. Firstly, it requires the knowledge of what
the stakind constants in pg_statistic mean and how to interpret it - but
OK, it's true that does not change very often (or at all). Secondly, it
entirely ignores extended statistics - OK, we might extract those too,
but it's going to be much more complex. And finally it entirely ignores
costing on the remote node. Surely we can't just apply local
random_page_cost or whatever, because those may be entirely different.
And we don't know if the remote is going to use index etc.

So is extracting data from pg_statistic the right approach?


5) I doubt it's enough to support relnames - we also need to estimate
joins, so this needs to support plain queries I think. At least that's
what Tom envisioned in his postgres_fdw_support(query text) proposal.


6) I see you've included a version number in the data - why not to just
check 



regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Change a constraint's index - ALTER TABLE ... ALTER CONSTRAINT ... USING INDEX ...

2020-09-04 Thread Alvaro Herrera
On 2020-Sep-04, Laurenz Albe wrote:

> The value I see in this is:
> - replacing a primary key index
> - replacing the index behind a constraint targeted by a foreign key

But why is this better than using REINDEX CONCURRENTLY?

-- 
Álvaro Herrera  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Division in dynahash.c due to HASH_FFACTOR

2020-09-04 Thread Alvaro Herrera
On 2020-Sep-04, Jakub Wartak wrote:

> After removing HASH_FFACTOR PostgreSQL still compiles...  Would
> removing it break some external API/extensions ?

FWIW, HASH_FFACTOR has *never* been used in Postgres core code.

https://postgr.es/m/20160418180711.55ac82c0@fujitsu already reported
that this flag was unused a few years ago.  And a search through
codesearch.debian.net shows that no software packaged by Debian uses
that flag either.

*If* any third-party extension is using HASH_FFACTOR, it can easily be
unbroken by removing the flag or #defining it to zero; the removal would
only be a problem if anyone showed that there is a performance loss by
using the default fillfactor for some dynahash table instead of their
custom value.
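
For instance, a compatibility shim along these lines (illustrative only,
not a committed change) would keep any such code compiling after the flag
is removed from dynahash.c:

/* HASH_FFACTOR is gone; accept and ignore it for old callers. */
#define HASH_FFACTOR	0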

I think if we know that there's a 4% performance increase by removing
the division that comes with an ability to use a custom fillfactor that
nobody uses or has ever used, it makes sense to remove that ability.

... however, if we're really tense about improving some hash table's
performance, it might be worth trying to use simplehash.h instead of
dynahash.c for it.

-- 
Álvaro Herrera  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Rare deadlock failure in create_am test

2020-09-04 Thread Alvaro Herrera
On 2020-Sep-03, Tom Lane wrote:

> So it's not hard to understand the problem: DROP of an index AM, cascading
> to an index, will need to acquire lock on the index and then lock on the
> index's table.  Any other operation on the table, like say autovacuum,
> is going to acquire locks in the other direction.

Oh, of course.

> I'm inclined to think that the best fix is to put
> 
> begin;
> lock table [fast_emp4000];
> ...
> commit;
> 
> around the DROP CASCADE.

Yeah, sounds good.

-- 
Álvaro Herrera  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Report error position in partition bound check

2020-09-04 Thread Ashutosh Bapat
On Fri, 10 Jul 2020 at 23:31, Alexandra Wang 
wrote:

>
>
> Thank you Daniel. Here's the rebased patch. I also squashed the two
> patches into one so it's easier to review.
>
> Thanks for rebasing the patch. It still applies cleanly. Here are some comments:
@@ -3320,7 +3338,9 @@ make_one_partition_rbound(PartitionKey key, int
index, List *datums, bool lower)
  * partition_rbound_cmp
  *
  * Return for two range bounds whether the 1st one (specified in datums1,

I think it's better to reword it as: "For two range bounds decide whether
...

- * kind1, and lower1) is <, =, or > the bound specified in *b2.
+ * kind1, and lower1) is <, =, or > the bound specified in *b2. 0 is returned
+ * if equal and the 1-based index of the first mismatching bound if unequal;
+ * multiplied by -1 if the 1st bound is smaller.

This sentence makes sense after the above correction. I liked this change;
it requires very small changes in other parts.


 /*
@@ -3495,7 +3518,7 @@ static int
 partition_range_bsearch(int partnatts, FmgrInfo *partsupfunc,
                         Oid *partcollation,
                         PartitionBoundInfo boundinfo,
-                        PartitionRangeBound *probe, bool *is_equal)
+                        PartitionRangeBound *probe, bool *is_equal, int32 *cmpval)

Please update the prologue explaining the new argument.

After this change, the patch will be ready for a committer.
-- 
Best Wishes,
Ashutosh


Re: Allow continuations in "pg_hba.conf" files

2020-09-04 Thread Fabien COELHO



Hello Tom,


Accordingly, I borrowed some code from that thread and present
the attached revision.  I also added some test coverage, since
that was lacking before, and wordsmithed docs and comments slightly.


Hearing no comments, pushed that way.


Thanks for the fixes and improvements!

I notice that buf.data is not freed. I guess that the server memory 
management will recover it.


--
Fabien.




Re: [PATCH] Redudant initilization

2020-09-04 Thread Bruce Momjian
On Fri, Sep  4, 2020 at 09:39:45AM -0300, Ranier Vilela wrote:
> On Thu, Sep 3, 2020 at 23:57, Bruce Momjian wrote:
> 
> On Wed, Apr  1, 2020 at 08:57:18AM -0300, Ranier Vilela wrote:
> > Hi,
> > New patch with yours suggestions.
> 
> Patch applied to head, thanks.
> 
> Thank you Bruce.

I have to say, I am kind of stumped why compilers do not warn of such
cases, and why we haven't gotten reports about these cases before.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: Rare deadlock failure in create_am test

2020-09-04 Thread Tom Lane
Thomas Munro  writes:
> On Fri, Sep 4, 2020 at 2:13 PM Tom Lane  wrote:
>> I'm inclined to think that the best fix is to put
>> 
>> begin;
>> lock table tenk2;
>> ...
>> commit;

> Makes sense, except s/tenk2/fast_emp4000/, no?

Sorry, thinko ...

regards, tom lane




Re: Ideas about a better API for postgres_fdw remote estimates

2020-09-04 Thread Ashutosh Bapat
On Thu, 3 Sep 2020 at 10:44, Andrey V. Lepikhov 
wrote:

> On 8/31/20 6:19 PM, Ashutosh Bapat wrote:
> > On Mon, Aug 31, 2020 at 3:36 PM Andrey V. Lepikhov
> >  wrote:
> >>
> >> Thanks for this helpful feedback.
> > I think the patch has some other problems like it works only for
> > regular tables on foreign server but a foreign table can be pointing
> > to any relation like a materialized view, partitioned table or a
> > foreign table on the foreign server all of which have statistics
> > associated with them. I didn't look closely but it does not consider
> > that the foreign table may not have all the columns from the relation
> > on the foreign server or may have different names. But I think those
> > problems are kind of secondary. We have to agree on the design first.
> >
> In accordance with discussion i made some changes in the patch:
> 1. The extract statistic routine moved into the core.
>

The bulk of the patch implements the statistics conversion to and from JSON
format. I am still not sure whether we need all of that code here. Can we
re-use the pg_stats view? That view already converts some of the OIDs to names.
I agree about anyarray, but if that's a problem here it's also a problem for the
pg_stats view, isn't it? If we can reduce the stats handling code to a minimum,
or use it for some other purpose as well, e.g. a pg_stats enhancement, the code
changes required will be far smaller compared to the value that this patch
provides.

-- 
Best Wishes,
Ashutosh


Re: Disk-based hash aggregate's cost model

2020-09-04 Thread Tomas Vondra

On Thu, Sep 03, 2020 at 05:53:43PM -0700, Jeff Davis wrote:

On Tue, 2020-09-01 at 23:19 +0200, Tomas Vondra wrote:

FWIW any thoughts about the different in temp size compared to
CP_SMALL_TLIST?


Are you referring to results from a while ago? In this thread I don't
see what you're referring to.

I tried in a simple case on REL_13_STABLE, with and without the
CP_SMALL_TLIST change, and I saw only a tiny difference. Do you have a
current case that shows a larger difference?



I'm referring to the last charts in the message from July 27, comparing
behavior with CP_SMALL_TLIST fix vs. master (which reverted/replaced the
CP_SMALL_TLIST bit).

Those charts show that the CP_SMALL_TLIST resulted in smaller temp files
(per EXPLAIN ANALYZE the difference is ~25%) and also lower query
durations (also in the ~25% range).

I can repeat those tests, if needed.

[1] 
https://www.postgresql.org/message-id/20200724012248.y77rpqc73agrsvb3@development


The only thing I can think of that might change is the size of the null
bitmap or how fields are aligned.



Maybe. Not sure.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Get memory contexts of an arbitrary backend process

2020-09-04 Thread Tomas Vondra

On Fri, Sep 04, 2020 at 11:47:30AM +0900, Kasahara Tatsuhito wrote:

On Fri, Sep 4, 2020 at 2:40 AM Tom Lane  wrote:

Kasahara Tatsuhito  writes:
> Yes, but it's not only for future expansion, but also for the
> usability and the stability of this feature.
> For example, if you want to read one dumped file multiple times and analyze 
it,
> you will want the ability to just read the dump.

If we design it to make that possible, how are we going to prevent disk
space leaks from never-cleaned-up dump files?

In my view, along with features such as a view that allows us to see a
list of dumped files,
it would be better to have a function that simply deletes the dump
files associated with a specific PID,
or deletes all dump files.
Some files may be dumped with unexpected delays, so I think a
cleanup feature will be necessary.
(Also, as with pgsql_tmp files, it might be better to delete dump files
when PostgreSQL starts.)

Or should we try to delete the dump file as soon as we can read it?



IMO making the cleanup a responsibility of the users (e.g. by exposing
the list of dumped files through a view and expecting users to delete
them in some way) is rather fragile.

I don't quite see what's the point of designing it this way. It was
suggested this improves stability and usability of this feature, but
surely making it unnecessarily complex contradicts both points?

IMHO if the user needs to process the dump repeatedly, what's preventing
him/her from storing it in a file, or something like that? At that point
it's clear it's up to them to remove the file. So I suggest keeping the
feature as simple as possible - hand the dump over and delete.
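
For illustration, the reading side could then be as simple as this sketch
(the helper name and messages are mine; AllocateFile/FreeFile are the usual
backend file helpers):

#include "postgres.h"

#include <unistd.h>

#include "storage/fd.h"

static void
read_and_discard_dump(const char *path)
{
	FILE	   *fp = AllocateFile(path, "r");
	char		line[1024];

	if (fp == NULL)
		ereport(ERROR,
				(errcode_for_file_access(),
				 errmsg("could not open memory context dump \"%s\": %m", path)));

	while (fgets(line, sizeof(line), fp) != NULL)
	{
		/* hand each dumped line over to the caller, e.g. as a SRF row */
	}

	FreeFile(fp);

	/* delete as soon as it has been read, so nothing can leak */
	if (unlink(path) < 0)
		ereport(WARNING,
				(errcode_for_file_access(),
				 errmsg("could not remove file \"%s\": %m", path)));
}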


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH] Redudant initilization

2020-09-04 Thread Ranier Vilela
On Thu, Sep 3, 2020 at 23:57, Bruce Momjian wrote:

> On Wed, Apr  1, 2020 at 08:57:18AM -0300, Ranier Vilela wrote:
> > Hi,
> > New patch with yours suggestions.
>
> Patch applied to head, thanks.
>
Thank you Bruce.

regards,
Ranier Vilela


Re: Fix for configure error in 9.5/9.6 on macOS 11.0 Big Sur

2020-09-04 Thread Peter Eisentraut

On 2020-09-04 07:52, Tom Lane wrote:

Peter Eisentraut  writes:

I suppose backpatching the patch that fixed this would be appropriate.


[ confused ... ]  Back-patching what patch?


Commit 1c0cf52b39ca3a9a79661129cff918dc000a55eb was mentioned at the 
beginning of the thread.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Division in dynahash.c due to HASH_FFACTOR

2020-09-04 Thread Tomas Vondra

On Fri, Sep 04, 2020 at 07:01:41AM +, Jakub Wartak wrote:


Greetings hackers,

I have mixed feelings about whether this is a welcome contribution, as
the potential gain is relatively small in my tests, but still I would
like to point out that the HASH_FFACTOR functionality in dynahash.c
could be removed or optimized (the default fill factor is always 1;
there's not a single place that uses a custom fill factor other than
DEF_FFACTOR=1 inside the PostgreSQL repository). Because the
functionality is present, there is a division for every buffer access
[BufTableLookup()] and every smgropen() call (every call to
hash_search() is affected, provided it's not
ShmemInitHash/HASH_PARTITION). This division is especially visible via
perf on the single-process StartupXLOG WAL recovery process on a standby
under heavy-duty 100% CPU conditions, as the top entry is
inside hash_search:

   0x00888751 <+449>:   idiv   r8
   0x00888754 <+452>:   cmp    rax,QWORD PTR [r15+0x338] <<-- in perf 
annotate shows as 30-40%, even on default -O2, probably CPU pipelining for idiv above

I've made a PoC test to skip that division assuming ffactor would be gone:
  if (!IS_PARTITIONED(hctl) && !hashp->frozen &&
-     hctl->freeList[0].nentries / (long) (hctl->max_bucket + 1) >= hctl->ffactor &&
+     hctl->freeList[0].nentries >= (long) (hctl->max_bucket + 1) &&

For a stream of WAL 3.7GB I'm getting consistent improvement of ~4%, (yes I 
know it's small, that's why I'm having mixed feelings):
gcc -O3: 104->100s
gcc -O2: 108->104s
pgbench -S -c 16 -j 4 -T 30 -M prepared: stays more or less the same (-s 100), 
so no positive impact there



Hmm, so it's the division that makes the difference? In that case,
wouldn't it make sense to just compute a threshold every time the hash
table is resized, and then do just the comparison. That is, compute

   nentries_threshold = (long) (hctl->max_bucket + 1) * hctl->ffactor;

and then do just

   hctl->freeList[0].nentries >= nentries_threshold

Of course, this assumes the threshold is calculated only rarely, maybe
that's not the case.
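
To make that concrete, a hedged sketch of the three places that would change
(the nentries_threshold field name is mine; the surrounding condition is the
existing one in hash_search):

   /* in HASHHDR, next to max_bucket and ffactor */
   long		nentries_threshold;		/* (max_bucket + 1) * ffactor */

   /* recomputed in hash_create() and whenever expand_table() bumps max_bucket */
   hctl->nentries_threshold = (long) (hctl->max_bucket + 1) * hctl->ffactor;

   /* hot path in hash_search(): comparison only, no division */
   if (!IS_PARTITIONED(hctl) && !hashp->frozen &&
       hctl->freeList[0].nentries >= hctl->nentries_threshold &&
       !has_seq_scans(hashp))
       (void) expand_table(hashp);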

Also, what if you lower the fill factor, e.g. to 0.8? Does that improve
performance or not? I can't find any discussion about this in the
archives, but the dynamic hashing paper [1] seems to suggest it does
make a difference (see the tables at the end, but I haven't re-read the
paper recently so maybe it's irrelevant). Anyway, it'd be foolish to
remove the ffactor parameter to save on division only to lose the
ability to lower the factor and save more than that ...


As for the 4% - it's not much, but it's also not negligible. I'm sure
we've done micro-optimizations for smaller gains. The big question is
whether this is statistically significant, or whether it's just due to
random effects. It could easily be caused by changes in layout of the
binary, for example - that can happen quite easily. See [2] and [3].

The talk actually mentions a tool meant to eliminate this bias by
randomization, but I've never tried using it on PostgreSQL so I'm not
sure how compatible it is :-(


[1] https://www.csd.uoc.gr/~hy460/pdf/Dynamic%20Hash%20Tables.pdf
[2] 
https://www.cis.upenn.edu/~cis501/previous/fall2016/papers/producing-wrong-data.pdf
[3] https://www.youtube.com/watch?v=r-TLSBdHe1A


After removing HASH_FFACTOR, PostgreSQL still compiles...  Would
removing it break some external API/extensions?  I saw several
optimizations for the "idiv", e.g. see
https://github.com/ridiculousfish/libdivide.  Or maybe there is some
other idea to expose bottlenecks of BufTableLookup()?  I also saw the
codepath PinBuffer()->GetPrivateRefCountEntry()->dynahash that could
be called pretty often.  I have no idea what kind of pgbench stress test
could be used to demonstrate the gain (or lack of it).

-Jakub Wartak.



I don't think this would break a lot of stuff, but I'm kinda dubious
it's a measurable improvement for common workloads ...


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Change a constraint's index - ALTER TABLE ... ALTER CONSTRAINT ... USING INDEX ...

2020-09-04 Thread Laurenz Albe
On Mon, 2020-08-10 at 09:29 +0300, Anna Akenteva wrote:
> On 2020-07-07 01:08, Tom Lane wrote:
> 
> > Alvaro Herrera  writes:
> > > On 2020-Jul-05, Anna Akenteva wrote:
> > > > -- Swapping primary key's index for an equivalent index,
> > > > -- but with INCLUDE-d attributes.
> > > > CREATE UNIQUE INDEX new_idx ON target_tbl (id) INCLUDE (info);
> > > > ALTER TABLE target_tbl ALTER CONSTRAINT target_tbl_pkey USING INDEX
> > > > new_idx;
> > > > ALTER TABLE referencing_tbl ALTER CONSTRAINT 
> > > > referencing_tbl_id_ref_fkey
> > > > USING INDEX new_idx;
> > > How is this state represented by pg_dump?
> > Even if it's possible to represent, I think we should flat out reject
> > this "feature".  Primary keys that aren't primary keys don't seem like
> > a good idea.  For one thing, it won't be possible to describe the
> > constraint accurately in the information_schema.
> 
> 
> Do you think it could still be a good idea if we only swap the 
> relfilenodes of indexes, as it was suggested in [1]? The original use 
> case was getting rid of index bloat, which is now solved by REINDEX 
> CONCURRENTLY, but this feature still has its own use case of adding 
> INCLUDE-d columns to constraint indexes.

How can you just swap the filenodes if "indnatts" and "indkey" are
different, since one index has an INCLUDE clause?

I think that the original proposal is better, except that foreign key
dependencies should be changed along with the primary or unique index,
so that everything is consistent once the command is done.

Then the ALTER CONSTRAINT form that replaces the index referenced
by a foreign key becomes unnecessary and should be removed.

The value I see in this is:
- replacing a primary key index
- replacing the index behind a constraint targeted by a foreign key

Some code comments:

+   
+ALTER CONSTRAINT constraint_name [USING INDEX 

Re: Creating foreign key on partitioned table is too slow

2020-09-04 Thread Ashutosh Bapat
On Fri, Sep 4, 2020 at 12:05 AM Tom Lane  wrote:
>
> Amit Langote  writes:
> >> Fwiw, I am fine with applying the memory-leak fix in all branches down
> >> to v12 if we are satisfied with the implementation.
>
> > I have revised the above patch slightly to introduce a variable for
> > the condition whether to use a temporary memory context.
>
> This CF entry has been marked "ready for committer", which I find
> inappropriate since there doesn't seem to be any consensus about
> whether we need it.
>
> I tried running the original test case under HEAD.  I do not see
> any visible memory leak, which I think indicates that 5b9312378 or
> some other fix has taken care of the leak since the original report.
> However, after waiting awhile and noting that the ADD FOREIGN KEY
> wasn't finishing, I poked into its progress with a debugger and
> observed that each iteration of RI_Initial_Check() was taking about
> 15 seconds.  I presume we have to do that for each partition,
> which leads to the estimate that it'll take 34 hours to finish this
> ... and that's with no data in the partitions, god help me if
> there'd been a lot.
>
> Some quick "perf" work says that most of the time seems to be
> getting spent in the planner, in get_eclass_for_sort_expr().
> So this is likely just a variant of performance issues we've
> seen before.  (I do wonder why we're not able to prune the
> join to just the matching PK partition, though.)
>

Consider this example
postgres=# create table t1 (a int, b int, CHECK (a between 100 and 150));
CREATE TABLE
postgres=# create table part(a int, b int) partition by range(a);
CREATE TABLE
postgres=# create table part_p1 partition of part for values from (0) to (50);
CREATE TABLE
postgres=# create table part_p2 partition of part for values from (50) to (100);
CREATE TABLE
postgres=# create table part_p3 partition of part for values from
(100) to (150);
CREATE TABLE
postgres=# create table part_p4 partition of part for values from
(150) to (200);
CREATE TABLE
postgres=# explain (costs off) select * from t1 r1, part r2 where r1.a = r2.a;
  QUERY PLAN
--
 Hash Join
   Hash Cond: (r2.a = r1.a)
   ->  Append
 ->  Seq Scan on part_p1 r2_1
 ->  Seq Scan on part_p2 r2_2
 ->  Seq Scan on part_p3 r2_3
 ->  Seq Scan on part_p4 r2_4
   ->  Hash
 ->  Seq Scan on t1 r1
(9 rows)

Given that t1.a cannot have any value less than 100 or greater than
150, any row in t1 won't have a joining partner in part_p1 or
part_p2. So those two partitions can be pruned. But I think we don't
consider the constraints on a table when joining two tables to render a
join empty or even prune partitions. That would be a good optimization
which would improve this case as well.

But further to that, I think that when we add a constraint on the
partitioned table which translates to constraints on individual partitions,
we should check the entire partitioned relation rather than individual
partitions. If we do that, we won't need to plan a query for every
partition. If the foreign key happens to be the partition key, e.g. in a star
schema, this will use partitionwise join to further improve query
performance. Somewhere in the future, we will be able to repartition the
foreign key table by the foreign key and perform a partitionwise join.

-- 
Best Wishes,
Ashutosh Bapat




Re: autovac issue with large number of tables

2020-09-04 Thread Kasahara Tatsuhito
Hi,

On Wed, Sep 2, 2020 at 2:10 AM Kasahara Tatsuhito
 wrote:
> > I wonder if we could have table_recheck_autovac do two probes of the stats
> > data.  First probe the existing stats data, and if it shows the table to
> > be already vacuumed, return immediately.  If not, *then* force a stats
> > re-read, and check a second time.
> Does the above mean that the second and subsequent table_recheck_autovac()
> will be improved to first check using the previous refreshed statistics?
> I think that certainly works.
>
> If that's correct, I'll try to create a patch for the PoC

I still don't know how to reproduce Jim's troubles, but I was able to reproduce
what was probably a very similar problem.

This problem seems to be more likely to occur in cases where you have
a large number of tables,
i.e., a large amount of stats, and many small tables need VACUUM at
the same time.

So I followed Tom's advice and created a patch for the PoC.
This patch enables a flag in the table_recheck_autovac function so that the
existing stats are used for the next check if, on a check made right after the
stats were refreshed, VACUUM (or ANALYZE) turned out to have already been done
by another worker. If tables continue to require VACUUM even after such a
refresh, then a refresh will be done instead of using the existing statistics.
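
In other words, the two-probe idea from Tom that this PoC builds on looks
roughly like this (a sketch only; recheck_relation_needs_vacanalyze() is a
hypothetical helper standing in for the real relation_needs_vacanalyze() call
and its bookkeeping):

static autovac_table *
table_recheck_autovac(Oid relid, HTAB *table_toast_map,
					  TupleDesc pg_class_desc,
					  int effective_multixact_freeze_max_age)
{
	/* First probe: use the stats snapshot we already have (possibly stale). */
	if (!recheck_relation_needs_vacanalyze(relid, true /* cached stats */))
		return NULL;			/* another worker has evidently handled it */

	/* Second probe: force a fresh stats read and check again. */
	autovac_refresh_stats();
	if (!recheck_relation_needs_vacanalyze(relid, false /* fresh stats */))
		return NULL;

	/* ... at this point, build and return the autovac_table work item
	 * exactly as the current code does ... */
	return NULL;				/* placeholder for the sketch */
}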

I did a simple test with HEAD and HEAD + this PoC patch.
The tests were conducted in two cases.
(I changed a few configuration settings; see the attached scripts.)

1. Normal VACUUM case
  - SET autovacuum = off
  - CREATE tables with 100 rows
  - DELETE 90 rows for each tables
  - SET autovacuum = on and restart PostgreSQL
  - Measure the time it takes for all tables to be VACUUMed

2. Anti wrap round VACUUM case
  - CREATE blank tables
  - SELECT all of these tables (for generate stats)
  - SET autovacuum_freeze_max_age to low values and restart PostgreSQL
  - Consumes a lot of XIDs by using txid_current()
  - Measure the time it takes for all tables to be VACUUMed

For each test case, the following results were obtained by changing
autovacuum_max_workers to 1, 2, 3 (default), 5 and 10.
Also changing the number of tables to 1000, 5000, 10000 and 20000.

Due to the poor VM environment (2 vCPU/4 GB), the results are a little unstable,
but I think they are enough to show a trend.

===
[1.Normal VACUUM case]
 tables:1000
  autovacuum_max_workers 1:   (HEAD) 20 sec VS (with patch)  20 sec
  autovacuum_max_workers 2:   (HEAD) 18 sec VS (with patch)  16 sec
  autovacuum_max_workers 3:   (HEAD) 18 sec VS (with patch)  16 sec
  autovacuum_max_workers 5:   (HEAD) 19 sec VS (with patch)  17 sec
  autovacuum_max_workers 10:  (HEAD) 19 sec VS (with patch)  17 sec

 tables:5000
  autovacuum_max_workers 1:   (HEAD) 77 sec VS (with patch)  78 sec
  autovacuum_max_workers 2:   (HEAD) 61 sec VS (with patch)  43 sec
  autovacuum_max_workers 3:   (HEAD) 38 sec VS (with patch)  38 sec
  autovacuum_max_workers 5:   (HEAD) 45 sec VS (with patch)  37 sec
  autovacuum_max_workers 10:  (HEAD) 43 sec VS (with patch)  35 sec

 tables:10000
  autovacuum_max_workers 1:   (HEAD) 152 sec VS (with patch)  153 sec
  autovacuum_max_workers 2:   (HEAD) 119 sec VS (with patch)   98 sec
  autovacuum_max_workers 3:   (HEAD)  87 sec VS (with patch)   78 sec
  autovacuum_max_workers 5:   (HEAD) 100 sec VS (with patch)   66 sec
  autovacuum_max_workers 10:  (HEAD)  97 sec VS (with patch)   56 sec

 tables:20000
  autovacuum_max_workers 1:   (HEAD) 338 sec VS (with patch)  339 sec
  autovacuum_max_workers 2:   (HEAD) 231 sec VS (with patch)  229 sec
  autovacuum_max_workers 3:   (HEAD) 220 sec VS (with patch)  191 sec
  autovacuum_max_workers 5:   (HEAD) 234 sec VS (with patch)  147 sec
  autovacuum_max_workers 10:  (HEAD) 320 sec VS (with patch)  113 sec

[2.Anti wrap round VACUUM case]
 tables:1000
  autovacuum_max_workers 1:   (HEAD) 19 sec VS (with patch) 18 sec
  autovacuum_max_workers 2:   (HEAD) 14 sec VS (with patch) 15 sec
  autovacuum_max_workers 3:   (HEAD) 14 sec VS (with patch) 14 sec
  autovacuum_max_workers 5:   (HEAD) 14 sec VS (with patch) 16 sec
  autovacuum_max_workers 10:  (HEAD) 16 sec VS (with patch) 14 sec

 tables:5000
  autovacuum_max_workers 1:   (HEAD) 69 sec VS (with patch) 69 sec
  autovacuum_max_workers 2:   (HEAD) 66 sec VS (with patch) 47 sec
  autovacuum_max_workers 3:   (HEAD) 59 sec VS (with patch) 37 sec
  autovacuum_max_workers 5:   (HEAD) 39 sec VS (with patch) 28 sec
  autovacuum_max_workers 10:  (HEAD) 39 sec VS (with patch) 29 sec

 tables:10000
  autovacuum_max_workers 1:   (HEAD) 139 sec VS (with patch) 138 sec
  autovacuum_max_workers 2:   (HEAD) 130 sec VS (with patch)  86 sec
  autovacuum_max_workers 3:   (HEAD) 120 sec VS (with patch)  68 sec
  autovacuum_max_workers 5:   (HEAD)  96 sec VS (with patch)  41 sec
  autovacuum_max_workers 10:  (HEAD)  90 sec VS (with patch)  39 sec

 tables:20000
  autovacuum_max_workers 1:   (HEAD) 313 sec VS (with patch) 331 sec
  

Re: Rare deadlock failure in create_am test

2020-09-04 Thread Thomas Munro
On Fri, Sep 4, 2020 at 2:13 PM Tom Lane  wrote:
> (There might be more such failures, but I only looked back six months,
> and these two are all I found in that window.)

Right:

 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=conchuela=2020-09-03%2023:00:36
 | HEAD  | DragonflyBSD
 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lapwing=2020-04-08%2004:38:34
   | HEAD  | Debian
 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tern=2020-03-30%2013:54:05
  | HEAD  | AIX
 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=grison=2020-03-24%2022:00:23
| HEAD  | Raspbian
 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=culicidae=2020-03-03%2016:10:06
 | HEAD  | Debian
 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=longfin=2019-12-11%2023:59:06
   | HEAD  | macOS
 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=longfin=2019-12-11%2022:09:07
   | HEAD  | macOS
 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren=2019-12-10%2003:46:12
 | REL9_6_STABLE |
 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mandrill=2019-07-10%2023:19:12
  | REL9_6_STABLE | AIX
 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dangomushi=2018-06-23%2018:00:15
| HEAD  | Arch

> I'm inclined to think that the best fix is to put
>
> begin;
> lock table tenk2;
> ...
> commit;

Makes sense, except s/tenk2/fast_emp4000/, no?




Re: INSERT ON CONFLICT and RETURNING

2020-09-04 Thread Konstantin Knizhnik




On 03.09.2020 19:56, Geoff Winkless wrote:

> On Mon, 31 Aug 2020 at 14:53, Konstantin Knizhnik wrote:
>
>> If we are doing such query:
>>
>> INSERT INTO jsonb_schemas (schema) VALUES (obj_schema)
>> ON CONFLICT (schema) DO UPDATE SET schema=jsonb_schemas.schema RETURNING id
>>
>> Then as far as I understand no extra lookup is used to return ID:
>
> The conflict resolution checks the unique index on (schema) and
> decides whether or not a conflict will exist. For DO NOTHING it
> doesn't have to get the actual row from the table; however in order
> for it to return the ID it would have to go and get the existing row
> from the table. That's the "extra lookup", as you term it. The only
> difference from doing it with RETURNING id versus WITH... COALESCE()
> as you described is the simpler syntax.

Sorry, but there is no extra lookup in this case.
By "lookup" I mean an index search.
What we do in the ON CONFLICT SELECT case is just fetch the tuple from
the buffer, so we are not even loading any data from disk.

But in the case of

   with ins as (insert into jsonb_schemas (schema) values (obj_schema)
on conflict(schema) do nothing returning id)
   select coalesce((select id from ins),(select id from jsonb_schemas
where schema=obj_schema));


we actually execute an extra subquery: select id from jsonb_schemas where
schema=obj_schema:


explain with ins as (insert into jsonb_schemas (schema) values ('some') 
on conflict(schema) do nothing returning id) select coalesce((select id 
from ins),(select id from jsonb_schemas where schema='some'));

                                        QUERY PLAN
--------------------------------------------------------------------------------------------------
 Result  (cost=8.21..8.21 rows=1 width=4)
   CTE ins
     ->  Insert on jsonb_schemas  (cost=0.00..0.01 rows=1 width=36)
           Conflict Resolution: NOTHING
           Conflict Arbiter Indexes: jsonb_schemas_pkey
           ->  Result  (cost=0.00..0.01 rows=1 width=36)
   InitPlan 2 (returns $2)
     ->  CTE Scan on ins  (cost=0.00..0.02 rows=1 width=4)
   InitPlan 3 (returns $3)
     ->  Index Scan using jsonb_schemas_pkey on jsonb_schemas jsonb_schemas_1  (cost=0.15..8.17 rows=1 width=4)
           Index Cond: (schema = '\x736f6d65'::bytea)

Is it critical?
On my system the average execution time of this query is 104 usec, and with
the ON CONFLICT SELECT fix it is 82 usec.
The difference is not so large, because we insert a speculative tuple in any
case.

But it is incorrect to say that "it's not inherently any less efficient."


> I'm not saying the simpler syntax isn't nice, mind you. I was just
> pointing out that it's not inherently any less efficient.
>
> Geoff




--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: POC: rational number type (fractions)

2020-09-04 Thread Heikki Linnakangas
The discussion diverged somewhat to PGXN and extensions in general, but 
the consensus seems to be that this should (continue to) be an extension 
rather than a core feature. I agree that as an extension this is pretty 
cool. I'll mark this as rejected in the commitfest app.


Looking at the patch, it looks well-structured and commented.

- Heikki




Division in dynahash.c due to HASH_FFACTOR

2020-09-04 Thread Jakub Wartak

Greetings, hackers,

I have mixed feelings about whether this is a welcome contribution, as the
potential gain is relatively small in my tests, but I would still like to point
out that the HASH_FFACTOR functionality in dynahash.c could be removed or
optimized (the default fill factor is always 1; there is not a single place
inside the PostgreSQL repository that uses a custom fill factor other than
DEF_FFACTOR=1). Because the functionality is present, there seems to be a
division for every buffer access [BufTableLookup()] and every smgropen() call
(every call to hash_search() is affected, provided the table is not created via
ShmemInitHash/HASH_PARTITION). This division is especially visible via perf on
the single-process StartupXLOG WAL recovery process on a standby under
heavy-duty 100% CPU conditions, as the top entry is inside hash_search:
   0x00888751 <+449>:   idiv   r8
   0x00888754 <+452>:   cmp    rax,QWORD PTR [r15+0x338]   <<-- perf annotate
shows this at 30-40%, even with the default -O2, probably due to CPU pipeline
stalls on the idiv above

I've made a PoC test to skip that division assuming ffactor would be gone:
   if (!IS_PARTITIONED(hctl) && !hashp->frozen &&
-                       hctl->freeList[0].nentries / (long) (hctl->max_bucket + 1) >= hctl->ffactor &&
+                       hctl->freeList[0].nentries >= (long) (hctl->max_bucket + 1) &&
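
To spell out the effect of the change: with a run-time fill factor the check
costs an integer division on every call, while with the factor pinned to 1 it
reduces to a plain comparison. A standalone sketch (illustrative only, not the
actual dynahash code; the function names are made up):

/* Illustrative sketch only -- not the actual dynahash code. */
#include <stdbool.h>

/*
 * Generic form: ffactor is only known at run time, so the compiler
 * has to emit an integer division (idiv) on every call.
 */
static inline bool
needs_expansion(long nentries, long nbuckets, long ffactor)
{
	return nentries / nbuckets >= ffactor;
}

/*
 * With the fill factor fixed at 1, the same test reduces to a plain
 * comparison, and the division (and its pipeline stall) goes away.
 */
static inline bool
needs_expansion_ffactor_one(long nentries, long nbuckets)
{
	return nentries >= nbuckets;
}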

For a 3.7GB stream of WAL I'm getting a consistent improvement of ~4% (yes, I
know it's small; that's why I have mixed feelings):
gcc -O3: 104s -> 100s
gcc -O2: 108s -> 104s
pgbench -S -c 16 -j 4 -T 30 -M prepared: stays more or less the same (-s 100),
so no positive impact there

After removing HASH_FFACTOR, PostgreSQL still compiles... Would removing it
break some external API/extensions? I saw several optimizations for the "idiv",
e.g. it could be replaced using https://github.com/ridiculousfish/libdivide.
Or maybe there is some other idea for exposing the bottlenecks of
BufTableLookup()? I also saw the codepath
PinBuffer() -> GetPrivateRefCountEntry() -> dynahash, which could be called
pretty often. I have no idea what kind of pgbench stress test could be used to
demonstrate the gain (or lack of it).

-Jakub Wartak.



Re: [PATCH] Detect escape of ErrorContextCallback stack pointers (and from PG_TRY() )

2020-09-04 Thread Craig Ringer
On Fri, 4 Sep 2020 at 14:13, Craig Ringer  wrote:

>
> I actually had a pretty good look around for static analysis options to
> see if I could find anything that might help us out before I landed up with
> this approach.
>

Apparently not good enough.

https://clang.llvm.org/docs/analyzer/checkers.html#core-stackaddressescape-c

Using a test program:

return_stack_escape.c:14:3: warning: Address of stack memory associated
with local variable 'g' is still referred to by the global variable
'guard_ptr' upon returning to the caller.  This will be a dangling reference
return do_fail;
^~
1 warning generated.

so ... that's interesting. I'll need to do some checking and verify that
it's effective on the actual problem I originally had, but if so, I shall
proceed with kicking myself now.
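
For anyone who wants to try the checker, a minimal program of the kind that
triggers core.StackAddressEscape might look like the following (an illustrative
reconstruction matching the diagnostic above, not the actual
return_stack_escape.c; run it through clang --analyze or scan-build):

/*
 * Illustration only -- not the actual test case referenced above.
 * A pointer to a stack-allocated struct is published through a global
 * (much like error_context_stack) and never popped before returning,
 * which core.StackAddressEscape reports as a dangling reference.
 */
struct guard
{
	struct guard *previous;
	int			do_fail;
};

static struct guard *guard_ptr;

static int
leak_stack_address(void)
{
	struct guard g;

	g.previous = guard_ptr;
	g.do_fail = 0;
	guard_ptr = &g;				/* push onto the global "stack"... */

	return g.do_fail;			/* ...but forget to pop before returning */
}

int
main(void)
{
	return leak_stack_address();
}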

Handily, the same thing can be used to detect PG_TRY() escapes.
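
For example, the classic escape in backend code is a return out of a PG_TRY()
block, which elog.h forbids because it leaves PG_exception_stack pointing into
a dead stack frame. A hypothetical sketch of that bug class (not taken from the
patch set; do_work_that_may_error is an assumed helper):

#include "postgres.h"

/* Hypothetical example of the bug class -- not from the patch set. */
extern int	do_work_that_may_error(void);	/* assumed to exist elsewhere */

static int
broken_try_block(void)
{
	int			result = 0;

	PG_TRY();
	{
		result = do_work_that_may_error();
		return result;			/* BUG: skips PG_END_TRY(), leaving
								 * PG_exception_stack pointing at this
								 * function's soon-to-be-dead sigjmp_buf */
	}
	PG_CATCH();
	{
		PG_RE_THROW();
	}
	PG_END_TRY();

	return result;
}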

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


Re: Compatible defaults for LEAD/LAG

2020-09-04 Thread Pavel Stehule
On Mon, 31 Aug 2020 at 7:05, Pavel Stehule wrote:

>
>
> On Sun, 30 Aug 2020 at 23:59, Tom Lane wrote:
>
>> Pavel Stehule  writes:
>> > This is nice example of usage of anycompatible type (that is consistent
>> > with other things in Postgres), but standard says something else.
>> > It can be easily solved with https://commitfest.postgresql.org/28/2081/
>> -
>> > but Tom doesn't like this patch.
>> > I am more inclined to think so this feature should be implemented
>> > differently - there is no strong reason to go against standard or
>> against
>> > the implementations of other databases (and increase the costs of
>> porting).
>> > Now the implementation is limited, but allowed behaviour is 100% ANSI.
>>
>> I don't particularly buy this argument.  The case at hand is what to do
>> if we have, say,
>>
>> select lag(integer_column, 1, 1.2) over ...
>>
>> The proposed patch would result in the output being of type numeric,
>> and any rows using the default would show "1.2".  The spec says that
>> the right thing is to return integer, and we should round the default
>> to "1" to make that work.  But
>>
>> (1) I doubt that anybody actually writes such things;
>>
>> (2) For anyone who does write it, the spec's behavior fails to meet
>> the principle of least surprise.  It is not normally the case that
>> any information-losing cast would be applied silently within an
>> expression.
>>
>
> postgres=# create table foo(a int);
> CREATE TABLE
> postgres=# insert into foo values(1.1);
> INSERT 0 1
>
> postgres=# create table foo(a int default 1.1);
> CREATE TABLE
> postgres=# insert into foo values(default);
> INSERT 0 1
> postgres=# select * from foo;
> ┌───┐
> │ a │
> ╞═══╡
> │ 1 │
> └───┘
> (1 row)
>
> It is maybe strange, but it is not an unusual pattern in SQL. I think it
> can be seen as analogous to a DEFAULT clause
>
> DECLARE a int DEFAULT 1.2;
>
> The default value doesn't change the type of the variable. This is maybe too
> simplistic an example; there can be others that are a little more realistic:
>
> CREATE OR REPLACE FUNCTION foo(a numeric, b numeric, ...
> DECLARE x int DEFAULT a;
> BEGIN
>   ...
>
> I am worried about performance - if the default value can change the type,
> then some other things can stop working - like using an index.
>
> For *this* case we are not talking about operations between two
> independent operands or function arguments. We are talking about
> a value and a *default* for that value.
>
>
>> So this deviation from spec doesn't bother me; we have much bigger ones.
>>
>
> OK, if it is acceptable to other people, I can accept it too - I understand
> that it is a corner case and that there is some consistency with other
> Postgres features.
>
> Maybe this difference should be mentioned in the documentation.
>

I thought more about this problem, and I think the ANSI specification is
semantically fully correct - it is consistent with how default values are
applied elsewhere in the SQL environment.

In this case the optional argument is not an arbitrary expression; it is the
default value for some expression. In comparable cases we usually force an
explicit cast.

Unfortunately we do not have good tools for a generic implementation of this
situation. In other cases the functions have special support in the parser
(xmltable, for example).

I see a few possibilities for finishing and closing this issue:

1. Use the anycompatible type and add a note to the documentation that the
expression of the optional argument can change the result type, and that this
is Postgres-specific - other databases and ANSI SQL disallow it.
This is the simplest solution, and it is good enough for this case, which is
not especially important.

2. Choose the correct type somewhere inside the parser - just for these two
functions.

3. Introduce and implement some generic solution for this case - it could be
implemented at the C level via some helper function, or at the syntax level:

   CREATE OR REPLACE FUNCTION lag(a anyelement, offset int, default defval
a%type) ...

where "defval argname%type" means the caller's expression is treated as
"CAST(defval AS typeof(argname))".

@3 could be a very interesting and useful feature, but it needs agreement and
harder work.
@2 is a 100% correct solution without hard work (but I am not sure whether
there can be agreement on this implementation).
@1 is good enough for this issue without harder work, and we can probably
reach agreement on it easily.

Regards

Pavel


Re: [PATCH] Detect escape of ErrorContextCallback stack pointers (and from PG_TRY() )

2020-09-04 Thread Craig Ringer
On Thu, 3 Sep 2020 at 22:28, Tom Lane  wrote:

> Craig Ringer  writes:
> > The attached patch series adds support for detecting coding errors where
> a
> > stack-allocated ErrorContextCallback is not popped from the
> > error_context_stack before the variable leaves scope.
>
> So my immediate thoughts about this are
>
> (1) It's mighty invasive for what it accomplishes.  AFAIK we have
> had few of this class of bug, and so I'm not excited about touching
> every callback use in the tree to add defenses.  It's also not great
> that future code additions won't be protected unless they remember
> to add these magic annotations.  The PG_TRY application is better since
> it's wrapped into the existing macro.
>

I agree with you there. I'd actually like to change how we set up
errcontext callbacks anyway, as I think they're ugly, verbose and
error-prone.

See patch 6 for a RFC on that.

With regards to PG_TRY() you may note that patch 5 adds a similar check to
detect escapes from PG_TRY() / PG_CATCH() bodies.

> (2) I don't like that this is adding runtime overhead to try to detect
> such bugs.
>

Right. Only in cassert builds, though.

Frankly I'd be happy enough even having it as something I can turn on when
I wanted it. I've had a heck of a time tracking down errcontext escapes in
the past. I wrote it for an extension, then figured I'd submit it to core
and see if anyone wanted it.

Even if we don't adopt it in PostgreSQL, now it's out there to help out
others debugging similar issues.

> (3) I think it's a complete failure that such a bug will only be
> detected if the faulty code path is actually taken.  I think it's
> quite likely that any such bugs that are lurking are in "can't
> happen", or at least hard-to-hit, corner cases --- if they were in
> routinely tested paths, we'd have noticed them by now.  So it'd be far
> more helpful if we had a static-analysis way to detect such issues.
>
> Thinking about (3), I wonder whether there's a way to instruct Coverity
> to detect such errors.
>

I think calling it a "complete failure" is ridiculous. By the same
rationale, we shouldn't bother with Assert(...) at all. But I agree that
it's definitely far from as good as having a statically verifiable check
would be, and this *should* be something we can statically verify.

I actually had a pretty good look around for static analysis options to see
if I could find anything that might help us out before I landed up with
this approach.

The sticking point is the API we use. By using auto stack variables (and
doing so in pure C where they cannot have destructors) and using direct
assignments to globals, we cannot use the majority of static analysis
annotations since they tend to be aimed at function-based APIs. And for
some reason most static analysis tools appear to be poor at discovering
escapes of pointers to stack variables leaving scope, at least unless you
use annotated functions that record ownership transfers. Which we can't
because ... direct assignment.
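
For reference, the open-coded idiom in question looks roughly like this
(abridged sketch; my_error_callback and do_work_with_context are placeholders,
while ErrorContextCallback and error_context_stack are the real names from
elog.h):

#include "postgres.h"

/* Placeholder callback for illustration. */
static void
my_error_callback(void *arg)
{
	errcontext("while doing work on %s", (const char *) arg);
}

static void
do_work_with_context(void *cb_arg)
{
	ErrorContextCallback errcallback;

	/* push: bare assignments to the global error_context_stack */
	errcallback.callback = my_error_callback;
	errcallback.arg = cb_arg;
	errcallback.previous = error_context_stack;
	error_context_stack = &errcallback;

	/* ... work that may ereport() and invoke the callback ... */

	/* pop: easy to miss on an early-return path, which is the bug class */
	error_context_stack = errcallback.previous;
}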

It's one of the reasons I want to wrap the errcontext API per patch 6 in
the above series. The current API is extremely verbose, hard to validate
and check, and difficult to apply static analysis to.

If we had error context setup/teardown macros we could implement them using
static inline functions and annotate them as appropriate for the target
toolchain and available static analysis tools.
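
A sketch of one possible shape for that, purely hypothetical (the names and
signatures here are illustrative, not necessarily what patch 6 proposes):

#include "postgres.h"

/* Hypothetical wrappers -- not necessarily what patch 6 proposes. */
static inline void
errcontext_stack_push(ErrorContextCallback *cb,
					  void (*callback) (void *arg), void *arg)
{
	cb->callback = callback;
	cb->arg = arg;
	cb->previous = error_context_stack;
	error_context_stack = cb;
}

static inline void
errcontext_stack_pop(ErrorContextCallback *cb)
{
	Assert(error_context_stack == cb);	/* catches unbalanced push/pop */
	error_context_stack = cb->previous;
}

Because push and pop become real functions taking the callback struct as an
explicit argument, they give static analyzers (and any toolchain-specific
annotations) something concrete to attach to, unlike bare assignments to a
global.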

So what do you think of patch 6?

I'll try tweaking the updated API to add annotatable functions and see if
that helps static analysis tools detect issues, then report back.

I personally think it'd be well worth adopting a wrapped API and
simultaneously renaming ErrorContextCallback to ErrorContextCallbackData to
ensure that code must be updated to compile with the changes. It'd be
simple to provide a backwards compatibility header that extension authors
can copy into their trees so they can use the new API when building against
old postgres, greatly reducing the impact on extension authors. (We do that
sort of thing constantly in pglogical and BDR).

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise