(Reported-Problem Reproduction) -> (Row Pattern Recognition)

On Wed, Jun 10, 2026 at 11:42 AM, Henson Choi <[email protected]> wrote:
> Hi Andres,
>
> Just to let you know, the CI run for this commitfest entry shows the
> same crash independently on master as well, so this may not be an RPR
> (Reported-Problem Reproduction) issue specific to the patch.
>
> The identical crash occurs on a standalone test against master.
>
> Thanks,
> Henson
>
> On Wed, Jun 10, 2026 at 11:09 AM, Henson Choi <[email protected]> wrote:
>
>> Hi hackers,
>>
>> While looking into Andres Freund's note that cfbot is failing with crashes
>> inside the JIT on the Row Pattern Recognition patch [1], I found that the
>> crash is not specific to that patch at all: on the CI's AddressSanitizer
>> build with LLVM 19, any query that is pushed through the LLVM JIT code
>> generator crashes the backend with SIGILL. It reproduces on plain master
>> with a trivial aggregate, so I am reporting it as its own issue, separate
>> from that feature.
>>
>> Minimal reproduction
>> --------------------
>>
>>   SET jit = on;
>>   SET jit_above_cost = 0;
>>   SET jit_optimize_above_cost = 0;
>>   SET jit_inline_above_cost = 0;
>>
>>   SELECT count(*)
>>   FROM (SELECT i, i * 2 + 1 AS x
>>         FROM generate_series(1, 100000) i
>>         WHERE i % 3 = 0) t;
>>
>> Result:
>>
>>   server closed the connection unexpectedly
>>   ...
>>   LOG: client backend (PID NNNNN) was terminated by signal 4: Illegal instruction
>>
>> A postmaster (forked backend) is required to reproduce reliably;
>> single-user mode does not trip it. With jit = off the same query runs fine.
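
For anyone who wants to script the jit on/off comparison described above, here is a minimal shell sketch. The function names repro_sql and run_repro are made up for illustration and are not part of the attached patch. It assumes psql is on PATH and that the usual PG* environment variables point at a server started through the postmaster.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the attached patch.

# Print the reproduction script; $1 is the value for the jit setting
# ("on" to force JIT, "off" for the control run).
repro_sql() {
    cat <<EOF
SET jit = $1;
SET jit_above_cost = 0;
SET jit_optimize_above_cost = 0;
SET jit_inline_above_cost = 0;

SELECT count(*)
FROM (SELECT i, i * 2 + 1 AS x
      FROM generate_series(1, 100000) i
      WHERE i % 3 = 0) t;
EOF
}

# Feed the script to psql on stdin. -X skips ~/.psqlrc so local settings
# cannot interfere; connection parameters come from the environment.
run_repro() {
    repro_sql "$1" | psql -X -v ON_ERROR_STOP=1 -f -
}
```

On an affected build, `run_repro on` would be expected to end with the lost-connection message shown in the report, while `run_repro off` serves as the control run and should return a count.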
>>
>> Environment
>> -----------
>>
>> This is the cfbot Linux task environment:
>>
>>   - Debian Trixie, libLLVM 19.1
>>   - CFLAGS = -O2 -ggdb -fno-sanitize-recover=all -fsanitize=address
>>   - LDFLAGS = -fsanitize=address
>>   - meson: -Dcassert=true -Dinjection_points=true --buildtype=debug
>>     -Dllvm=enabled (auto_features=disabled)
>>
>> I reproduced this in a container that mirrors the CI configuration, and
>> also on a from-scratch build of plain upstream master
>> (89eafad297a9b01ad77cfc1ab93a433e0af894b0, "Fix tuple deforming with
>> virtual generated columns"), which contains no in-flight feature patches.
>>
>> Backtrace
>> ---------
>>
>> The stack is corrupted at the crash, but with libLLVM debug info the top
>> frames resolve consistently to:
>>
>>   Program terminated with signal SIGILL, Illegal instruction.
>>   #0 getUnsignedFromPrefixEncoding ()
>>      at llvm/include/llvm/Support/Discriminator.h:34
>>   #1 decodeDiscriminator ()
>>      at llvm/lib/IR/DebugInfoMetadata.cpp:283
>>
>> The crashing rip lands in the middle of a valid instruction
>> (decodeDiscriminator+48, the immediate byte of "and $0x1f,%r10d"), i.e.
>> the libLLVM code itself is intact and control flow was transferred into
>> it at a bad offset. The crash always lands at the same place, for every
>> JIT-compiled query, which suggests it is systematic rather than random
>> corruption. It surfaces in libLLVM's debug-info (discriminator) handling,
>> and persists with JIT inlining and optimization both disabled.
>>
>> Reproducer patch
>> ----------------
>>
>> The attached patch adds a small "jit_crash" regression test that forces
>> the JIT compiler (jit on, all jit_*_above_cost set to 0) using a plain
>> aggregate over generate_series(). On a working installation it passes; on
>> the broken LLVM 19 + ASAN environment it crashes as above. I have also
>> registered it in the commitfest so cfbot exercises it directly.
>>
>> References
>> ----------
>>
>> [1] https://www.postgresql.org/message-id/p7r5bekdbl2zcazid7agvfo2nfnq5bim2a5jkckqygld32n325%40fctfp6ou6qnb
>>
>> Thanks,
>> Henson Choi
