Re: segfault tied to "IS JSON predicate" commit

2023-04-19 Thread Peter Geoghegan
On Sat, Apr 15, 2023 at 5:15 PM Peter Geoghegan  wrote:
> ISTM that b6a0d469ca has created an unmet need for a "--suite
> setup-running", which is analogous to "--suite setup" but works with
> "--setup running". That way there'd at least be a
> "postgresql:setup-running / install_test_files" test that could be
> used here, like so:
>
> $ meson test --setup running --suite setup-running --suite
> test_rls_hooks-running
>
> But...maybe it would be better to find a way to install the stuff from
> "postgresql:setup / install_test_files" in a less kludgy, more
> standard kind of way? I see that the commit message from b6a0d469ca
> says "there is no way to set up up the build system to install extra
> things only when told". Is that *really* the case?

I see that CI deals with this problem using this kludge on FreeBSD,
which tests "--setup running":

  meson test $MTEST_ARGS --quiet --suite setup
  export 
LD_LIBRARY_PATH="$(pwd)/build/tmp_install/usr/local/pgsql/lib/:$LD_LIBRARY_PATH"

That's why CI never failed due to commit b6a0d469ca.

This doesn't seem like something that should become standard operating
procedure. Not that it is right now, mind you. This isn't documented
anywhere, even though "--setup running" is documented (albeit lightly)
in the sgml docs.

-- 
Peter Geoghegan




Re: segfault tied to "IS JSON predicate" commit

2023-04-15 Thread Peter Geoghegan
On Sat, Apr 15, 2023 at 4:11 PM Peter Geoghegan  wrote:
> $ meson test --setup tmp_install --list | grep install
> postgresql:setup / tmp_install
> postgresql:setup / install_test_files
>
> But not the running setup:
>
> $ meson test --setup running --list | grep install | wc -l
> 0

There is a concrete problem here: commit b6a0d469ca ("meson: Prevent
installation of test files during main install") overlooked "--setup
running". It did not add a way for the setup to run "postgresql:setup
/ install_test_files" (or perhaps something very similar).

The segfault must have been caused by unwitting use of a leftover
ancient test_rls_hooks.so from before commit b6a0d469ca. My old stale
.so must have continued to work for a little while, before it broke.
Now that I've fully deleted my install directory, I can see a clear
problem, which is much less mysterious than the segfault. Namely, the
following doesn't still work:

$ meson test --setup running --suite test_rls_hooks-running

This time it's not a segfault, though -- it's due to the .so being
unavailable. Adding "--suite setup" fixes nothing, since I'm using
"--setup running"; while the "--suite running" tests will actually run
and install the .so, they won't install it into the installation
directory I'm actually using (only into a tmp_install directory).
While I was wrong to implicate commit 6ee30209 (the IS JSON commit) at
first, there is a bug here. A bug in b6a0d469ca.

ISTM that b6a0d469ca has created an unmet need for a "--suite
setup-running", which is analogous to "--suite setup" but works with
"--setup running". That way there'd at least be a
"postgresql:setup-running / install_test_files" test that could be
used here, like so:

$ meson test --setup running --suite setup-running --suite
test_rls_hooks-running

But...maybe it would be better to find a way to install the stuff from
"postgresql:setup / install_test_files" in a less kludgy, more
standard kind of way? I see that the commit message from b6a0d469ca
says "there is no way to set up up the build system to install extra
things only when told". Is that *really* the case?

-- 
Peter Geoghegan




Re: segfault tied to "IS JSON predicate" commit

2023-04-15 Thread Peter Geoghegan
On Sat, Apr 15, 2023 at 2:46 PM Justin Pryzby  wrote:
> I think what happened is that you (and I) are in the habbit of running
> "meson test tmp_install" to compile new binaries and install them into
> ./tmp_install, and then run a server out from there.

That's not my habit; this is running against a server that was
installed into a dedicated install directory. Though I agree that an
issue with the environment seems likely.

> But nowadays
> there's also "meson test install_test_files".

That only applies with "--setup tmp_install", which is the default
test setup, and the one that you must be using implicitly. But I'm
using "--setup running" for this.

More concretely, the tmp_install setup has the tests you say are requirements:

$ meson test --setup tmp_install --list | grep install
postgresql:setup / tmp_install
postgresql:setup / install_test_files

But not the running setup:

$ meson test --setup running --list | grep install | wc -l
0

I'm aware of the requirement around specifying "--suite tmp_install
..." right before "... --suite what_i_really_want_to_test" is
specified. However, I can't see how it could be my fault for
forgetting that, since it's structurally impossible to specify
"--suite tmp_install" when using "--setup running". I was using the
setup that gives you behavior that's approximately equivalent to "make
installcheck" (namely "--setup running"), so naturally this would have
been impossible.

Let's review:

* There are two possible --setup modes. I didn't use the default
(which is "--setup tmp_install") here. Rather, I used "--setup
running", which is kinda like "make installcheck".

* There is a test suite named "setup", though it's only available with
"--setup tmp_install", the default setup. (This is not to be confused
with the meson-specific notion of a --setup.)

* The "setup" suite happens to contain an individual test called
"tmp_install"  (as well as one called "install_test_files")

* I cannot possibly have forgotten this, since asking for it with
"--setup running" just doesn't work.

Let's demonstrate what I mean. The following does not and cannot
work, so I cannot have forgotten to do it in any practical sense:

$ meson test --setup running postgresql:setup / tmp_install
ninja: no work to do.
No suitable tests defined.

Such an incantation can only be expected to work with --setup tmp_install, the
default. So this version does work:

$ meson test --setup tmp_install postgresql:setup / tmp_install
**SNIP**
1/1 postgresql:setup / tmp_installOK   0.72s
**SNIP**

Not confusing at all!

--
Peter Geoghegan




Re: segfault tied to "IS JSON predicate" commit

2023-04-15 Thread Justin Pryzby
On Thu, Apr 13, 2023 at 09:14:01PM -0700, Peter Geoghegan wrote:
> I find that if I run the following test against a standard debug build
> on HEAD, my local installation reliably segfaults:
> 
> $ meson test --setup running --suite test_rls_hooks-running
> 
> Attached is a "bt full" run from gdb against a core dump. The query
> "EXPLAIN (costs off) SELECT * FROM rls_test_permissive;" runs when the
> backend segfaults.
> 
> The top frame of the back trace is suggestive of a use-after-free:
> 
> #0  copyObjectImpl (from=0x7f7f7f7f7f7f7f7e) at copyfuncs.c:187
> 187 switch (nodeTag(from))
> ...
> 
> "git bisect" suggests that the problem began at commit 6ee30209,
> "SQL/JSON: support the IS JSON predicate".
> 
> It's a bit surprising that the bug reproduces when I run a standard
> test, and yet we appear to have a bug that's about 2 weeks old.  There
> may be something unusual about my system that will turn out to be
> relevant -- though there is nothing particularly exotic about this
> machine. My repro doesn't rely on concurrent execution, or timing, or
> anything like that -- it's quite reliable.

I was able to reproduce this yesterday but not today.

I think what happened is that you (and I) are in the habbit of running
"meson test tmp_install" to compile new binaries and install them into
./tmp_install, and then run a server out from there.  But nowadays
there's also "meson test install_test_files".  I'm not sure what
combination of things are out of sync, but I suspect you forgot one of
0) compile *and* install the new binaries; or 1) restart the running
postmaster; or, 2) install the new shared library ("test files").

I saw the crash again when I did this:

time ninja
time meson test tmp_install install_test_files regress/regress # does not 
recompile, BTW
./tmp_install/usr/local/pgsql/bin/postgres -D 
./testrun/regress/regress/tmp_check/data -p 5678 -c autovacuum=no&
git checkout HEAD~222
time meson test tmp_install install_test_files
time PGPORT=5678 meson test --setup running test_rls_hooks-running/regress

In this case, I'm not sure if there's anything to blame meson for; the
issue is running server, which surely has different structures since
last month.

-- 
Justin




segfault tied to "IS JSON predicate" commit

2023-04-13 Thread Peter Geoghegan
I find that if I run the following test against a standard debug build
on HEAD, my local installation reliably segfaults:

$ meson test --setup running --suite test_rls_hooks-running

Attached is a "bt full" run from gdb against a core dump. The query
"EXPLAIN (costs off) SELECT * FROM rls_test_permissive;" runs when the
backend segfaults.

The top frame of the back trace is suggestive of a use-after-free:

#0  copyObjectImpl (from=0x7f7f7f7f7f7f7f7e) at copyfuncs.c:187
187 switch (nodeTag(from))
...

"git bisect" suggests that the problem began at commit 6ee30209,
"SQL/JSON: support the IS JSON predicate".

It's a bit surprising that the bug reproduces when I run a standard
test, and yet we appear to have a bug that's about 2 weeks old.  There
may be something unusual about my system that will turn out to be
relevant -- though there is nothing particularly exotic about this
machine. My repro doesn't rely on concurrent execution, or timing, or
anything like that -- it's quite reliable.

-- 
Peter Geoghegan
2023-04-13 20:56:32.345 PDT [72281][postmaster] LOG:  redirecting log output to 
logging collector process
2023-04-13 20:56:32.345 PDT [72281][postmaster] HINT:  Future log output will 
appear in directory "log".
   PID: 72308 (postgres)
   UID: 1000 (pg)
   GID: 1000 (pg)
Signal: 11 (SEGV)
 Timestamp: Thu 2023-04-13 20:56:42 PDT (1s ago)
  Command Line: $'postgres: pg regression_test_rls_hooks [local] EXPLAIN'
Executable: /mnt/nvme/postgresql/patch/install/bin/postgres
 Control Group: /system.slice/ssh.service
  Unit: ssh.service
 Slice: system.slice
   Boot ID: 73f38b1da37d4e2eba4b9c317530325b
Machine ID: 68e529f9438c4d718bc772fcb4e3753d
  Hostname: horse
   Storage: 
/var/lib/systemd/coredump/core.postgres.1000.73f38b1da37d4e2eba4b9c317530325b.72308.168144460200.zst
 (present)
  Size on Disk: 1.6M
   Message: Process 72308 (postgres) of user 1000 dumped core.

Stack trace of thread 72308:
#0  0x5608fc99b3c0 copyObjectImpl (postgres + 0x5ad3c0)
#1  0x5608fc99d6f0 _copyJoinExpr (postgres + 0x5af6f0)
#2  0x5608fc99b68b copyObjectImpl (postgres + 0x5ad68b)
#3  0x5608fc99dba2 _copyA_Expr (postgres + 0x5afba2)
#4  0x5608fc99b6e6 copyObjectImpl (postgres + 0x5ad6e6)
#5  0x7fdcc1533371 test_rls_hooks_permissive 
(test_rls_hooks.so + 0x1371)
#6  0x5608fc7cb32b get_policies_for_relation (postgres + 
0x3dd32b)
#7  0x5608fc7cb79f get_row_security_policies (postgres + 
0x3dd79f)
#8  0x5608fc7c4727 fireRIRrules (postgres + 0x3d6727)
#9  0x5608fc7c78f3 QueryRewrite (postgres + 0x3d98f3)
#10 0x5608fc5e9399 ExplainQuery (postgres + 0x1fb399)
#11 0x5608fc8193fd standard_ProcessUtility (postgres + 
0x42b3fd)
#12 0x5608fc8198a2 ProcessUtility (postgres + 0x42b8a2)
#13 0x5608fc816cc4 PortalRunUtility (postgres + 0x428cc4)
#14 0x5608fc817112 FillPortalStore (postgres + 0x429112)
#15 0x5608fc8174e7 PortalRun (postgres + 0x4294e7)
#16 0x5608fc8133d2 exec_simple_query (postgres + 0x4253d2)
#17 0x5608fc8153af PostgresMain (postgres + 0x4273af)
#18 0x5608fc77d946 BackendRun (postgres + 0x38f946)
#19 0x5608fc77fa4a BackendStartup (postgres + 0x391a4a)
#20 0x5608fc77fbec ServerLoop (postgres + 0x391bec)
#21 0x5608fc7811ee PostmasterMain (postgres + 0x3931ee)
#22 0x5608fc6c405e main (postgres + 0x2d605e)
#23 0x7fdcc084618a __libc_start_call_main (libc.so.6 + 
0x2718a)
#24 0x7fdcc0846245 __libc_start_main_impl (libc.so.6 + 
0x27245)
#25 0x5608fc4bfea1 _start (postgres + 0xd1ea1)
ELF object binary architecture: AMD x86-64

GNU gdb (Debian 13.1-2) 13.1
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
.
Find the GDB manual and other documentation resources online at:
.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /mnt/nvme/postgresql/patch/install/bin/postgres...

warning: Can't open file /dev/zero (deleted) during file-backed mapping note 
processing

warning: Can't open