Your message dated Wed, 8 Sep 2021 17:28:18 +0200 with message-id <[email protected]> and subject line Re: Bug#993848: postgresql-13: A SELECT query with cursor can cause segmentation fault has caused the Debian Bug report #993848, regarding postgresql-13: A SELECT query with cursor can cause segmentation fault to be marked as done.
This means that you claim that the problem has been dealt with. If this is not the case it is now your responsibility to reopen the Bug report if necessary, and/or fix the problem forthwith. (NB: If you are a system administrator and have no idea what this message is talking about, this may indicate a serious mail system misconfiguration somewhere. Please contact [email protected] immediately.) -- 993848: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=993848 Debian Bug Tracking System Contact [email protected] with problems
--- Begin Message ---Package: postgresql-13 Version: 13.4-1.pgdg100+1 Severity: normal Dear Maintainer, a slightly sophisticated SELECT query with a CURSOR can lead to postgresql server segmentation fault. LOG: server process (PID 10722) was terminated by signal 11: Segmentation fault DETAIL: Failed process was running: COMMIT LOG: terminating any other active server processes WARNING: terminating connection because of crash of another server process DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. Here's a backtrace retrieved from core dump: (gdb) bt #0 slot_deform_heap_tuple (natts=5, offp=0x557cc2e8ffd0, tuple=<optimized out>, slot=0x557cc2e8ff88) at ./build/../src/backend/executor/execTuples.c:930 #1 tts_buffer_heap_getsomeattrs (slot=0x557cc2e8ff88, natts=5) at ./build/../src/backend/executor/execTuples.c:695 #2 0x0000557cc1d3998c in slot_getsomeattrs_int (slot=slot@entry=0x557cc2e8ff88, attnum=5) at ./build/../src/backend/executor/execTuples.c:1912 #3 0x0000557cc1d28fba in slot_getsomeattrs (attnum=<optimized out>, slot=0x557cc2e8ff88) at ./build/../src/include/executor/tuptable.h:344 #4 ExecInterpExpr (state=0x557cc303c538, econtext=0x557cc2d08178, isnull=<optimized out>) at ./build/../src/backend/executor/execExprInterp.c:482 #5 0x0000557cc1d5548d in ExecEvalExprSwitchContext (isNull=0x7ffdd2599507, econtext=0x557cc2d08178, state=0x557cc303c538) at ./build/../src/include/executor/executor.h:322 #6 ExecQual (econtext=0x557cc2d08178, state=0x557cc303c538) at ./build/../src/include/executor/executor.h:391 #7 MJFillInner (node=0x557cc2d07f68) at ./build/../src/backend/executor/nodeMergejoin.c:494 #8 0x0000557cc1d55ce8 in ExecMergeJoin (pstate=0x557cc2d07f68) at ./build/../src/backend/executor/nodeMergejoin.c:1353 #9 0x0000557cc1d2cc83 in ExecProcNode (node=0x557cc2d07f68) at ./build/../src/include/executor/executor.h:248 #10 ExecutePlan (execute_once=<optimized out>, dest=0x557cc2ed8f60, direction=<optimized out>, numberTuples=0, sendTuples=<optimized out>, operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x557cc2d07f68, estate=0x557cc2d07d08) at ./build/../src/backend/executor/execMain.c:1632 #11 standard_ExecutorRun (queryDesc=0x557cc2ed8ed0, direction=<optimized out>, count=0, execute_once=<optimized out>) at ./build/../src/backend/executor/execMain.c:350 #12 0x00007f0ec05ae09d in pgss_ExecutorRun (queryDesc=0x557cc2ed8ed0, direction=ForwardScanDirection, count=0, execute_once=<optimized out>) at ./build/../contrib/pg_stat_statements/pg_stat_statements.c:1045 #13 0x0000557cc1cdbcd4 in PersistHoldablePortal (portal=portal@entry=0x557cc2d489f8) at ./build/../src/backend/commands/portalcmds.c:407 #14 0x0000557cc1ff95f9 in HoldPortal (portal=portal@entry=0x557cc2d489f8) at ./build/../src/backend/utils/mmgr/portalmem.c:642 #15 0x0000557cc1ff9e7d in PreCommit_Portals (isPrepare=isPrepare@entry=false) at ./build/../src/backend/utils/mmgr/portalmem.c:738 #16 0x0000557cc1c001c4 in CommitTransaction () at ./build/../src/backend/access/transam/xact.c:2087 #17 0x0000557cc1c015d5 in CommitTransactionCommand () at ./build/../src/backend/access/transam/xact.c:3085 #18 0x0000557cc1ea211d in finish_xact_command () at ./build/../src/backend/tcop/postgres.c:2662 #19 0x0000557cc1ea4703 in exec_simple_query (query_string=0x557cc2c9cd28 "COMMIT") at ./build/../src/backend/tcop/postgres.c:1264 #20 0x0000557cc1ea6143 in PostgresMain (argc=<optimized out>, argv=argv@entry=0x557cc2cf7af8, dbname=<optimized out>, username=<optimized out>) at ./build/../src/backend/tcop/postgres.c:4339 #21 0x0000557cc1e25bcd in BackendRun (port=0x557cc2ce9650, port=0x557cc2ce9650) at ./build/../src/backend/postmaster/postmaster.c:4526 #22 BackendStartup (port=0x557cc2ce9650) at ./build/../src/backend/postmaster/postmaster.c:4210 #23 ServerLoop () at ./build/../src/backend/postmaster/postmaster.c:1739 #24 0x0000557cc1e26b41 in PostmasterMain (argc=5, argv=<optimized out>) at ./build/../src/backend/postmaster/postmaster.c:1412 #25 0x0000557cc1b70f4f in main (argc=5, argv=0x557cc2c96c30) at ./build/../src/backend/main/main.c:210 (gdb) frame 11 #11 standard_ExecutorRun (queryDesc=0x557cc2ed8ed0, direction=<optimized out>, count=0, execute_once=<optimized out>) at ./build/../src/backend/executor/execMain.c:350 350 ./build/../src/backend/executor/execMain.c: No such file or directory. (gdb) p queryDesc $1 = (QueryDesc *) 0x557cc2ed8ed0 (gdb) p *(QueryDesc *) 0x557cc2ed8ed0 $2 = {operation = CMD_SELECT, plannedstmt = 0x557cc2faef08, sourceText = 0x557cc2ed8a70 "\n", ' ' <repeats 12 times>, "DECLARE \"categoryPagePhotoUrl_image_urls\" CURSOR WITH HOLD FOR\n", ' ' <repeats 12 times>, "SELECT di.itemId, image_number, filename FROM (SELECT *\n", ' ' <repeats 12 times>, "FROM downloaded_images"..., snapshot = 0x557cc2edf328, crosscheck_snapshot = 0x0, dest = 0x557cc2ed8f60, params = 0x0, queryEnv = 0x0, instrument_options = 0, tupDesc = 0x557cc303bc58, estate = 0x557cc2d07d08, planstate = 0x557cc2d07f68, already_executed = true, totaltime = 0x0 I'll try to make an reproducable code, let me known if you need more information. The query might be a bit nasty, but it shouldn't crash whole server. Best Regards, Tomas Barton -- System Information: Debian Release: 10.10 APT prefers oldstable-updates APT policy: (500, 'oldstable-updates'), (500, 'oldstable') Architecture: amd64 (x86_64) Kernel: Linux 4.19.0-17-amd64 (SMP w/12 CPU cores) Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE=en_US.UTF-8 (charmap=UTF-8) Shell: /bin/sh linked to /usr/bin/dash Init: systemd (via /run/systemd/system) LSM: AppArmor: enabled Versions of packages postgresql-13 depends on: ii debconf [debconf-2.0] 1.5.71 ii libc6 2.28-10 ii libgcc1 1:8.3.0-6 ii libgssapi-krb5-2 1.17-3+deb10u2 ii libicu63 63.1-6+deb10u1 ii libldap-2.4-2 2.4.47+dfsg-3+deb10u6 ii libllvm7 1:7.0.1-8+deb10u2 ii libpam0g 1.3.1-5 ii libpq5 13.4-1.pgdg100+1 ii libselinux1 2.8-1+b1 ii libssl1.1 1.1.1d-0+deb10u7 ii libstdc++6 8.3.0-6 ii libsystemd0 241-7~deb10u8 ii libuuid1 2.33.1-0.1 ii libxml2 2.9.4+dfsg1-7+deb10u2 ii libxslt1.1 1.1.32-2.2~deb10u1 ii locales 2.28-10 ii locales-all 2.28-10 ii postgresql-client-13 13.4-1.pgdg100+1 ii postgresql-common 226.pgdg100+1 ii ssl-cert 1.0.39 ii tzdata 2021a-0+deb10u1 ii zlib1g 1:1.2.11.dfsg-1 Versions of packages postgresql-13 recommends: pn sysstat <none> postgresql-13 suggests no packages. -- debconf information excluded
--- End Message ---
--- Begin Message ---Re: Tomas Barton > I've tried to rule out a possible data corruption issue, server is using > ECC RAM modules, checksums are enabled on database. I've tried rebuilding > indexes, running VACUUM FULL, switching to replica etc. > > Anyway the issues was probably caused by a faulty NVMe adapter: > > kernel: pcieport 0000:00:01.1: PCIe Bus Error: severity=Corrected, > type=Data Link Layer, (Transmitter ID) > kernel: pcieport 0000:00:01.1: device [8086:1905] error > status/mask=00001000/00002000 > kernel: pcieport 0000:00:01.1: [12] Timeout > kernel: pcieport 0000:00:01.1: Error of this Agent is reported first > kernel: nvme 0000:02:00.0: PCIe Bus Error: severity=Corrected, > type=Physical Layer, (Receiver ID) > kernel: nvme 0000:02:00.0: device [144d:a808] error > status/mask=000000c1/0000e000 > kernel: nvme 0000:02:00.0: [ 0] RxErr (First) > > somehow the database on disk got corrupted and it can't be fixed without > losing some data. > > You can close the ticket, sorry for any inconvenience caused. Hi Tomas, that was actually something I was thinking of as well. Unfortunately, PG sometimes trusts data on disk so much it's segfaulting, but as long as that doesn't happen on proper data, there is little to be fixed. Thanks for the feedback, and hopefully you can recover from backup. Christoph
--- End Message ---

