Re: pg_parse_json() should not leak token copies on failure
On 2024-10-01 Tu 3:04 PM, Jacob Champion wrote:
> (Tom, thank you for the fixup in 75240f65!)
>
> Off-list, Andrew suggested an alternative approach to the one I'd been
> taking: let the client choose whether it wants to take token ownership to
> begin with. v3 adds a new flag (and associated API) which will allow
> libpq to refuse ownership of those tokens. The lexer is then free to
> clean everything up during parse failures. Usually I'm not a fan of "do
> the right thing" flags, but in this case, libpq really is the outlier.
> And it's nice that existing clients (potentially including extensions)
> don't have to worry about an API change.

Yeah. Generally looks good. Should we have a check in
setJsonLexContextOwnsTokens() that we haven't started parsing yet, for the
incremental case?

> At the moment, we have a test matrix consisting of "standard frontend"
> and "shlib frontend" tests for the incremental parser. I'm planning for
> the v4 patch to extend that with an "owned/not owned" dimension; any
> objections?

Sounds reasonable.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Re: msys inet_pton strangeness
On 2024-09-30 Mo 11:11 AM, Tom Lane wrote:
> Andrew Dunstan writes:
>> On 2024-09-30 Mo 10:08 AM, Tom Lane wrote:
>>> Not entirely ... if fairywren had been generating that warning all
>>> along, I would have noticed it long ago, because I periodically scrape
>>> the BF database for compiler warnings. There has to have been some
>>> recent change in the system include files.
>> here's what I see on vendikar:
> Oh, wait, I forgot this is only about the v15 branch. I seldom search
> for warnings except on HEAD. Still, I'd have expected to notice it while
> v15 was development tip. Maybe we changed something since then? Anyway,
> it's pretty moot, I see no reason not to push forward with the proposed
> fix.

Thanks, done.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Re: msys inet_pton strangeness
On 2024-09-30 Mo 10:08 AM, Tom Lane wrote:
> Andrew Dunstan writes:
>> Ah, so this is because gcc 14.1.0 treats this as an error but gcc
>> 12.2.0 treats it as a warning. Now it makes sense.
> Not entirely ... if fairywren had been generating that warning all
> along, I would have noticed it long ago, because I periodically scrape
> the BF database for compiler warnings. There has to have been some
> recent change in the system include files.

here's what I see on vendikar:

pgbfprod=> select min(snapshot) from build_status_log
           where log_stage in ('build.log', 'make.log')
             and branch = 'REL_15_STABLE'
             and sysname = 'fairywren'
             and snapshot > now() - interval '1500 days'
             and log_text ~ 'inet_pton';
         min
---------------------
 2022-06-30 18:04:08
(1 row)

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Re: msys inet_pton strangeness
On 2024-09-29 Su 6:28 PM, Thomas Munro wrote:
> Just an idea...
>
> --- a/src/include/port/win32.h
> +++ b/src/include/port/win32.h
> @@ -16,7 +16,7 @@
>   * get support for GetLocaleInfoEx() with locales. For everything else
>   * the minimum version is Windows XP (0x0501).
>   */
> -#if defined(_MSC_VER) && _MSC_VER >= 1900
> +#if !defined(_MSC_VER) || _MSC_VER >= 1900
>  #define MIN_WINNT 0x0600
>  #else
>  #define MIN_WINNT 0x0501
>  #endif

This seems reasonable as just about the most minimal change we can make
work.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Re: msys inet_pton strangeness
On 2024-09-30 Mo 7:00 AM, Alexander Lakhin wrote:
> Hello Andrew and Thomas,
>
> 29.09.2024 18:47, Andrew Dunstan wrote:
>> I'm inclined to think we might need to reverse the order of the last
>> two. TBH I don't really understand how this has worked up to now.
>
> I've looked at the last successful run [1] and discovered that
> fe-secure-common.c didn't compile cleanly either:
>
> ccache gcc ... /home/pgrunner/bf/root/REL_15_STABLE/pgsql.build/../pgsql/src/interfaces/libpq/fe-secure-common.c
> C:/tools/nmsys64/home/pgrunner/bf/root/REL_15_STABLE/pgsql/src/interfaces/libpq/fe-secure-common.c: In function 'pq_verify_peer_name_matches_certificate_ip':
> C:/tools/nmsys64/home/pgrunner/bf/root/REL_15_STABLE/pgsql/src/interfaces/libpq/fe-secure-common.c:219:21: warning: implicit declaration of function 'inet_pton'; did you mean 'inet_aton'? [-Wimplicit-function-declaration]
>   219 |         if (inet_pton(AF_INET6, host, &addr) == 1)
>       |             ^
>       |             inet_aton
>
> So it worked just because that missing declaration generated just a
> warning, not an error.

Ah, so this is because gcc 14.1.0 treats this as an error but gcc 12.2.0
treats it as a warning. Now it makes sense.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Re: msys inet_pton strangeness
On 2024-09-29 Su 1:00 AM, Alexander Lakhin wrote:
> Hello Thomas and Andrew,
>
> 28.09.2024 23:52, Thomas Munro wrote:
>> On Sun, Sep 29, 2024 at 6:26 AM Andrew Dunstan wrote:
>>> We should have included ws2tcpip.h, which includes this:
>>>
>>> #define InetPtonA inet_pton
>>> WINSOCK_API_LINKAGE INT WSAAPI InetPtonA(INT Family, LPCSTR pStringBuf, PVOID pAddr);
>>>
>>> It's conditioned on (_WIN32_WINNT >= 0x0600), but that should be true.
>> Can you print out the value to be sure? I can't imagine they'd set it
>> lower themselves or make it go backwards in an upgrade, but perhaps
>> it's somehow not being set at all, and then we do:
>>
>> #if defined(_MSC_VER) && _MSC_VER >= 1900
>> #define MIN_WINNT 0x0600
>> #else
>> #define MIN_WINNT 0x0501
>> #endif
>>
>> In 16 we don't do that anymore, we just always set it to 0x0A00 (commit
>> 495ed0ef2d72). And before 15, we didn't want that function yet (commit
>> c1932e542863).
>
> FWIW, I'm observing the same here. For a trivial test.c (compiled with
> the same command line as fe-secure-common.c) like:
>
> "===_WIN32" _WIN32;
> "===_WIN32_WINNT"; _WIN32_WINNT;
>
> with gcc -E (from mingw-w64-ucrt-x86_64-gcc 14.2.0-1), I get:
>
> "===_WIN32" 1;
> "===_WIN32_WINNT"; _WIN32_WINNT;
>
> That is, _WIN32_WINNT is not defined, but with #include above, I see:
>
> "===_WIN32_WINNT"; 0x603
>
> With #include "postgres_fe.h" (as in fe-secure-common.c) I get:
>
> "===_WIN32_WINNT"; 0x0501;

Yeah, src/include/port/win32/sys/socket.h has:

#include
#include
#include

I'm inclined to think we might need to reverse the order of the last two.
TBH I don't really understand how this has worked up to now.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Re: msys inet_pton strangeness
On 2024-09-28 Sa 11:49 AM, Tom Lane wrote:
> Andrew Dunstan writes:
>> It's complaining like this:
>> C:/tools/xmsys64/home/pgrunner/bf/root/REL_15_STABLE/pgsql/src/interfaces/libpq/fe-secure-common.c:219:21: error: implicit declaration of function 'inet_pton'; did you mean 'inet_aton'? [-Wimplicit-function-declaration]
>>   219 |         if (inet_pton(AF_INET6, host, &addr) == 1)
>>       |             ^
>> configure has determined that we have inet_pton, and I have repeated
>> the test manually.
> configure's test is purely a linker test. It does not check to see
> where/whether the function is declared. Meanwhile, the compiler is
> complaining that it doesn't see a declaration. So the problem probably
> can be fixed by adding an #include, but you'll need to figure out what.
> I see that our other user of inet_pton, fe-secure-openssl.c, has a
> rather different #include setup than fe-secure-common.c; does it compile
> OK?

I'll try, but this error occurs before we get that far.

We should have included ws2tcpip.h, which includes this:

#define InetPtonA inet_pton
WINSOCK_API_LINKAGE INT WSAAPI InetPtonA(INT Family, LPCSTR pStringBuf, PVOID pAddr);

It's conditioned on (_WIN32_WINNT >= 0x0600), but that should be true. So
I'm still very confused ;-(

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
msys inet_pton strangeness
A week or so ago I upgraded the msys2 animal fairywren to the latest
msys2, and ever since then the build has been failing for Release 15.
It's complaining like this:

ccache gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format-truncation -Wno-stringop-truncation -g -O2 -DFRONTEND -DUNSAFE_STAT_OK -I/home/pgrunner/bf/root/REL_15_STABLE/pgsql.build/../pgsql/src/interfaces/libpq -I../../../src/include -I/home/pgrunner/bf/root/REL_15_STABLE/pgsql.build/../pgsql/src/include -I../pgsql/src/include/port/win32 -I/c/progra~1/openssl-win64/include "-I/home/pgrunner/bf/root/REL_15_STABLE/pgsql.build/../pgsql/src/include/port/win32" -DWIN32_STACK_RLIMIT=4194304 -I../../../src/port -I/home/pgrunner/bf/root/REL_15_STABLE/pgsql.build/../pgsql/src/port -DSO_MAJOR_VERSION=5 -c -o fe-secure-common.o /home/pgrunner/bf/root/REL_15_STABLE/pgsql.build/../pgsql/src/interfaces/libpq/fe-secure-common.c
C:/tools/xmsys64/home/pgrunner/bf/root/REL_15_STABLE/pgsql/src/interfaces/libpq/fe-secure-common.c: In function 'pq_verify_peer_name_matches_certificate_ip':
C:/tools/xmsys64/home/pgrunner/bf/root/REL_15_STABLE/pgsql/src/interfaces/libpq/fe-secure-common.c:219:21: error: implicit declaration of function 'inet_pton'; did you mean 'inet_aton'? [-Wimplicit-function-declaration]
  219 |         if (inet_pton(AF_INET6, host, &addr) == 1)
      |             ^
      |             inet_aton
make[3]: *** [: fe-secure-common.o] Error 1

configure has determined that we have inet_pton, and I have repeated the
test manually. It's not a ccache issue - I have cleared the cache and the
problem persists. The test run by meson on the same animal reports not
finding the function.

So I'm a bit flummoxed about how to fix this, and would appreciate any
suggestions.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Re: [PATCH] Add native windows on arm64 support
On 2024-09-24 Tu 2:31 PM, Dave Cramer wrote:

On Tue, 13 Feb 2024 at 16:28, Dave Cramer wrote:

On Tue, 13 Feb 2024 at 12:52, Andres Freund wrote:

Hi,

On 2024-02-13 12:49:33 -0500, Dave Cramer wrote:
> > I think I might have been on to something - if my human emulation of a
> > preprocessor isn't wrong, we'd end up with
> >
> > #define S_UNLOCK(lock) \
> >     do { _ReadWriteBarrier(); (*(lock)) = 0; } while (0)
> >
> > on msvc + arm. And that's entirely insufficient - _ReadWriteBarrier()
> > just limits *compiler* level reordering, not CPU level reordering. I
> > think it's even insufficient on x86[-64], but it's definitely
> > insufficient on arm.
>
> In fact ReadWriteBarrier has been deprecated: _ReadWriteBarrier |
> Microsoft Learn
> <https://learn.microsoft.com/en-us/cpp/intrinsics/readwritebarrier?view=msvc-170>

I'd just ignore that, that's just pushing towards more modern stuff
that's more applicable to C++ than C.

> I did try using atomic_thread_fence as per atomic_thread_fence -
> cppreference.com
> <https://en.cppreference.com/w/c/atomic/atomic_thread_fence>

The semantics of atomic_thread_fence are, uh, very odd. I'd just use
MemoryBarrier().

#define S_UNLOCK(lock) \
    do { MemoryBarrier(); (*(lock)) = 0; } while (0)
#endif

Has no effect. I have no idea if that is what you meant that I should do?

Dave

Revisiting this:

Andrew, can you explain the difference between ninja test (which passes)
and what the build farm does. The buildfarm crashes.
The buildfarm client performs these steps:

meson test -C $pgsql --no-rebuild --suite setup
meson test -t $meson_test_timeout $jflag -C $pgsql --logbase checklog --print-errorlogs --no-rebuild --suite regress --test-args=--no-locale
meson test -t $meson_test_timeout $jflag -C $pgsql --print-errorlogs --no-rebuild --logbase checkworld --no-suite setup --no-suite regress
foreach tested locale:
    meson test -t $meson_test_timeout $jflag -v -C $pgsql --no-rebuild --print-errorlogs --setup running --suite regress-running --logbase regress-installcheck-$locale

$pgsql is the build root, $jflag is setting the number of jobs.

IOW, we do test suite setup, run the core regression tests, run all the
remaining non-install tests, then run the install tests for each locale.
We don't call ninja directly, but I don't see why that should make a
difference.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Re: SQL:2023 JSON simplified accessor support
On 2024-09-27 Fr 5:49 AM, David E. Wheeler wrote:
> On Sep 26, 2024, at 16:45, Alexandra Wang wrote:
>> I didn’t run pgindent earlier, so here’s the updated version with the
>> correct indentation. Hope this helps!
> Oh, nice! I don’t suppose the standard also has defined an operator
> equivalent to ->>, though, has it? I tend to want the text output far
> more often than a JSON scalar.

That would defeat being able to chain these.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Re: SQL:2023 JSON simplified accessor support
On 2024-09-26 Th 11:45 AM, Alexandra Wang wrote:
> Hi,
>
> I didn’t run pgindent earlier, so here’s the updated version with the
> correct indentation. Hope this helps!

This is a really nice feature, and provides a lot of expressive power for
such a small piece of code.

I notice this doesn't seem to work for domains over json and jsonb.

andrew@~=# create domain json_d as json;
CREATE DOMAIN
andrew@~=# create table test_json_dot(id int, test_json json_d);
CREATE TABLE
andrew@~=# insert into test_json_dot select 1, '{"a": 1, "b": 42}'::json;
INSERT 0 1
andrew@~=# select (test_json_dot.test_json).b, json_query(test_json, 'lax $.b' WITH CONDITIONAL WRAPPER NULL ON EMPTY NULL ON ERROR) as expected from test_json_dot;
ERROR:  column notation .b applied to type json_d, which is not a composite type
LINE 1: select (test_json_dot.test_json).b, json_query(test_json, 'l...

I'm not sure that's a terribly important use case, but we should probably
make it work. If it's a domain we should get the basetype of the domain.
There's some example code in src/backend/utils/adt/jsonfuncs.c

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Re: attndims, typndims still not enforced, but make the value within a sane threshold
On 2024-09-20 Fr 12:38 AM, Tom Lane wrote:
> Michael Paquier writes:
>> On Fri, Sep 20, 2024 at 11:51:49AM +0800, Junwang Zhao wrote:
>>> Should you also bump the catalog version?
>> No need to worry about that when sending a patch because committers
>> take care of that when merging a patch into the tree. Doing that in
>> each patch submitted just creates more conflicts and work for patch
>> authors because they'd need to resolve conflicts each time a catversion
>> bump happens. And that can happen on a daily basis sometimes depending
>> on what is committed.
> Right. Sometimes the committer forgets to do that :-(, which is not
> great but it's not normally a big problem either. We've concluded it's
> better to err in that direction than impose additional work on patch
> submitters.

FWIW, I have a git pre-commit hook that helps avoid that. Essentially it
checks to see if there are changes in src/include/catalog but not in
catversion.h. That's not a 100% check, but it probably catches the vast
majority of changes that would require a catversion bump.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
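Andrew's hook isn't shown, but the check he describes can be sketched in a few lines of portable shell (a hypothetical reconstruction, not his actual script). The function takes the staged file list on stdin so it can be driven by `git diff --cached --name-only`:

```shell
#!/bin/sh
# check_catversion: read changed file names (one per line) on stdin;
# return non-zero if anything under src/include/catalog/ changed but
# catversion.h did not.  This is a sketch, not Andrew's actual hook.
check_catversion() {
    needs_bump=0
    bumped=0
    while IFS= read -r f; do
        case $f in
            src/include/catalog/catversion.h) bumped=1 ;;
            src/include/catalog/*)            needs_bump=1 ;;
        esac
    done
    if [ "$needs_bump" -eq 1 ] && [ "$bumped" -eq 0 ]; then
        return 1
    fi
    return 0
}

# In .git/hooks/pre-commit one would wire it up like:
#   git diff --cached --name-only | check_catversion ||
#       { echo "catalog changed without a catversion bump" >&2; exit 1; }
```

As Andrew notes, this isn't a 100% check (a .c-only change can still require a bump), but it catches the common case of touching catalog headers or .dat files and forgetting the version stamp.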
Re: [PATCH] WIP: replace method for jsonpath
On 2024-09-18 We 4:23 AM, Peter Eisentraut wrote:
> On 17.09.24 21:16, David E. Wheeler wrote:
>> On Sep 17, 2024, at 15:03, Florents Tselai wrote:
>>> Fallback scenario: make this an extension, but in a first pass I
>>> didn’t find any convenient hooks. One has to create a whole new
>>> scanner, grammar etc.
>> Yeah, it got me thinking about the RFC-9535 JSONPath "Function
>> Extension" feature[1], which allows users to add functions. Would be
>> cool to have a way to register jsonpath functions somehow, but I would
>> imagine it’d need quite a bit of specification similar to RFC-9535.
>> Wouldn’t surprise me to see something like that appear in a future
>> version of the spec, with an interface something like CREATE OPERATOR.
> Why can't we "just" call any suitable pg_proc-registered function from
> JSON path? The proposed patch routes the example
> '$.replace("hello","bye")' internally to the internal implementation of
> the SQL function replace(..., 'hello', 'bye'). Why can't we do this
> automatically for any function call in a JSON path expression?

That might work. The thing that bothers me about the original proposal is
this: what if we add a new non-spec jsonpath method and then a new version
of the spec adds a method with the same name, but not compatible with our
method? We'll be in a nasty place. At the very least I think we need to
try hard to avoid that. Maybe we should prefix non-spec method names with
"pg_", or maybe use an initial capital letter.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Re: A starter task
On 2024-09-15 Su 6:17 PM, sia kc wrote:
> About inlining not sure how it is done with gmail. Maybe should use
> another email client.

Click the three dots with the tooltip "Show trimmed content". Then you can
scroll down and put your reply inline. (Personally I detest the Gmail web
interface, and use a different MUA, but you can do this even with the
Gmail web app.)

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Re: jsonb_strip_nulls with arrays?
On 2024-09-17 Tu 5:26 AM, Florents Tselai wrote:
> Currently:
>
> jsonb_strip_nulls ( jsonb ) → jsonb
>
> Deletes all object fields that have null values from the given JSON
> value, recursively. Null values that are not object fields are
> untouched.
>
>> Null values that are not object fields are untouched.
>
> Can we revisit this and make it work with arrays, too? Tbh, at first
> sight that looked like the expected behavior for me. That is, strip
> nulls from arrays as well. This has been available since 9.5 and iiuc
> predates lots of the jsonb array work.

I don't think that's a great idea. Removing an object field which has a
null value shouldn't have any effect on the surrounding data, nor really
any on other operations. (If you try to get the value of the missing
field it should give you back null.) But removing a null array member
isn't like that at all - unless it's the trailing member of the array it
will renumber all the succeeding array members.

And I don't think we should be changing the behaviour of a function that
people might have been relying on for the better part of a decade.

> In practice, though, whenever jsonb_build_array is used (especially with
> jsonpath), a few nulls do appear in the resulting array most of the
> times. Currently, there’s no expressive way to remove this.
>
> We could also have jsonb_array_strip_nulls(jsonb) as well

We could, if we're going to do anything at all in this area. Another
possibility would be to provide a second optional parameter for
json{b}_strip_nulls. That's probably a better way to go.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
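The asymmetry under discussion is easy to see by modeling the documented semantics outside the server; here is a small Python sketch of the behavior the docs describe (a model for illustration, not the backend code):

```python
import json

def strip_nulls(value):
    """Model of jsonb_strip_nulls as documented: remove object fields
    whose value is null, recursively; leave array nulls untouched."""
    if isinstance(value, dict):
        # Object fields with null values are dropped.
        return {k: strip_nulls(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        # Array members are kept, nulls included; only recurse.
        return [strip_nulls(v) for v in value]
    return value

doc = json.loads('{"a": null, "b": [null, {"c": null, "d": 1}]}')
print(json.dumps(strip_nulls(doc)))  # {"b": [null, {"d": 1}]}
```

Dropping the array's null here would shift the following element from index 1 to index 0, which is exactly the renumbering hazard Andrew points out; an opt-in second parameter avoids changing a decade-old default.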
Re: meson vs windows perl
On 2024-07-30 Tu 3:47 PM, Andrew Dunstan wrote:
> On 2024-07-20 Sa 9:41 AM, Andrew Dunstan wrote:
>> On 2024-05-28 Tu 6:13 PM, Andres Freund wrote:
>>> Hi,
>>>
>>> On 2024-04-05 16:12:12 -0400, Andrew Dunstan wrote:
>>>> OK, this has been fixed and checked. The attached is what I propose.
>>> The perl command is pretty hard to read. What about using python's
>>> shlex module instead? Rough draft attached. Still not very pretty, but
>>> seems easier to read? It'd be even better if we could just get perl to
>>> print out the flags in an easier to parse way, but I couldn't
>>> immediately see a way.
>> Thanks for looking. The attached should be easier to read. The perl
>> package similar to shlex is Text::ParseWords, which is already a
>> requirement.
> It turns out that shellwords eats backslashes, so we would need
> something like this version, which I have tested on Windows. I will
> probably commit this in the next few days unless there's an objection.

pushed

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
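The backslash problem Andrew mentions is easy to reproduce with the analogous Python module: in POSIX mode, shlex (like Perl's Text::ParseWords::shellwords) treats backslashes as escape characters, which mangles Windows-style paths. A quick demonstration (the compiler flags below are made up):

```python
import shlex

cmd = r'gcc -I C:\tools\perl\lib -O2'

# POSIX-style splitting eats the backslashes (treated as escapes)...
print(shlex.split(cmd))
# ['gcc', '-I', 'C:toolsperllib', '-O2']

# ...while non-POSIX mode leaves them alone.
print(shlex.split(cmd, posix=False))
# ['gcc', '-I', 'C:\\tools\\perl\\lib', '-O2']
```

This is why a naive shellwords/shlex split of a Windows perl's compiler flags needs extra care (or a non-escaping parse mode) before the values can be trusted.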
Re: Robocopy might be not robust enough for never-ending testing on Windows
On 2024-09-14 Sa 9:00 AM, Alexander Lakhin wrote:
> Hello hackers,
>
> While trying to reproduce inexplicable drongo failures (e. g., [1])
> seemingly related to BackgroundPsql, I stumbled upon a close, but not
> the same, issue. After many (6-8 thousand) iterations of the
> 015_stream.pl TAP test, psql failed to start with a
> STATUS_DLL_INIT_FAILED error, and a corresponding Windows popup
> preventing a test exit (see ss-1 in the archive attached). Upon reaching
> that state of the system, subsequent test runs fail with one or another
> error related to memory (not only with psql, but also with the server
> processes):
>
> testrun/subscription_13/015_stream/log/015_stream_publisher.log:2024-09-11 20:01:51.777 PDT [8812] LOG: server process (PID 11532) was terminated by exception 0xC0FD
> testrun/subscription_14/015_stream/log/015_stream_subscriber.log:2024-09-11 20:01:19.806 PDT [2036] LOG: server process (PID 10548) was terminated by exception 0xC0FD
> testrun/subscription_16/015_stream/log/015_stream_publisher.log:2024-09-11 19:59:41.513 PDT [9128] LOG: server process (PID 14476) was terminated by exception 0xC409
> testrun/subscription_19/015_stream/log/015_stream_publisher.log:2024-09-11 20:03:27.801 PDT [10156] LOG: server process (PID 2236) was terminated by exception 0xC409
> testrun/subscription_20/015_stream/log/015_stream_publisher.log:2024-09-11 19:59:41.359 PDT [10656] LOG: server process (PID 14712) was terminated by exception 0xC12D
> testrun/subscription_3/015_stream/log/015_stream_publisher.log:2024-09-11 20:02:23.815 PDT [13704] LOG: server process (PID 13992) was terminated by exception 0xC0FD
> testrun/subscription_9/015_stream/log/015_stream_publisher.log:2024-09-11 19:59:41.360 PDT [9760] LOG: server process (PID 11608) was terminated by exception 0xC142
>
> When tests fail, I see Commit Charge reaching 100% (see ss-2 in the
> attachment), while Physical Memory isn't all used up. To get the OS to a
> working state, I had to reboot it - killing processes, logoff/logon
> didn't help.
> I reproduced this issue again, investigated it and found out that it is
> caused by robocopy (used by PostgreSQL::Test::Cluster->init), which is
> leaking kernel objects, namely "Event objects" within the Non-Paged
> pool, on each run. This can be seen with Kernel Pool Monitor, or just
> with this simple PS script:
>
> for ($i = 1; $i -le 100; $i++)
> {
>     echo "iteration $i"
>     rm -r c:\temp\target
>     robocopy.exe /E /NJH /NFL /NDL /NP c:\temp\initdb-template c:\temp\target
>     Get-WmiObject -Class Win32_PerfRawData_PerfOS_Memory | % PoolNonpagedBytes
> }
>
> It shows to me:
>
> iteration 1
>                Total    Copied   Skipped  Mismatch    FAILED    Extras
>     Dirs :        27        27         0         0         0         0
>    Files :       968       968         0         0         0         0
>    Bytes :   38.29 m   38.29 m         0         0         0         0
>    Times :   0:00:00   0:00:00   0:00:00   0:00:00
> ...
> 1226063872
> ...
> iteration 100
>                Total    Copied   Skipped  Mismatch    FAILED    Extras
>     Dirs :        27        27         0         0         0         0
>    Files :       968       968         0         0         0         0
>    Bytes :   38.29 m   38.29 m         0         0         0         0
>    Times :   0:00:00   0:00:00   0:00:00   0:00:00
> ...
> 1245220864
>
> (That is, 0.1-0.2 MB leaks per one robocopy run.)
>
> I observed this on Windows 10 (Version 10.0.19045.4780), with all
> updates installed, but not on Windows Server 2016 (10.0.14393.0).
> Moreover, using robocopy v14393 on Windows 10 doesn't affect the issue.
>
> Perhaps this information can be helpful for someone who is running
> buildfarm/CI tests on Windows animals...
>
> [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=drongo&dt=2024-09-11%2007%3A24%3A53

Interesting, good detective work. Still, wouldn't this mean drongo would
fail consistently?

I wonder why we're using robocopy instead of our own RecursiveCopy
module?

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Re: Mutable foreign key constraints
On 2024-09-12 Th 5:33 PM, Tom Lane wrote:
> I happened to notice that Postgres will let you do
>
> regression=# create table foo (id timestamp primary key);
> CREATE TABLE
> regression=# create table bar (ts timestamptz references foo);
> CREATE TABLE
>
> This strikes me as a pretty bad idea, because whether a particular
> timestamp is equal to a particular timestamptz depends on your timezone
> setting. Thus the constraint could appear to be violated after a
> timezone change. I'm inclined to propose rejecting FK constraints if
> the comparison operator is not immutable. Among the built-in opclasses,
> the only instances of non-immutable btree equality operators are
>
> regression=# select amopopr::regoperator from pg_amop
>   join pg_operator o on o.oid = amopopr
>   join pg_proc p on p.oid = oprcode
>   where amopmethod = 403 and amopstrategy = 3 and provolatile != 'i';
>                          amopopr
> ---------------------------------------------------------
>  =(date,timestamp with time zone)
>  =(timestamp without time zone,timestamp with time zone)
>  =(timestamp with time zone,date)
>  =(timestamp with time zone,timestamp without time zone)
> (4 rows)
>
> A possible objection is that if anybody has such a setup and hasn't
> noticed a problem because they never change their timezone setting, they
> might not appreciate us breaking it. So I certainly wouldn't propose
> back-patching this. But maybe we should add it as a foot-gun defense
> going forward. Thoughts?

Isn't there an upgrade hazard here? People won't thank us if they can't
now upgrade their clusters. If we can get around that then +1.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Re: Jargon and acronyms on this mailing list
On 2024-09-11 We 8:34 AM, Robert Haas wrote:
> On Mon, Sep 9, 2024 at 3:54 PM Andrew Dunstan wrote:
>> There are some serious obstacles to changing it all over, though. I
>> don't want to rewrite all the history, for example.
> Because of the way git works, that really wouldn't be an issue. We'd
> just push the tip of the master branch to main and then start committing
> to main and delete master. The history wouldn't change at all, because
> in git, a branch is really just a movable pointer to a commit. The
> commits themselves don't know that they're part of a branch. A lot of
> things would break, naturally. We'd still all have master branches in
> our local repositories and somebody might accidentally try to push one
> of those branches back to the upstream repository and the buildfarm and
> lots of other tooling would get confused and it would all be a mess for
> a while, but the history itself would not change.

I think you misunderstood me. I wasn't referring to the git history, but
the buildfarm history.

Anyway, I think what I have done should suffice. You should no longer see
the name HEAD on the buildfarm server, although it will continue to exist
in the database.

Incidentally, I wrote a blog post about changing the client default name
some years ago:
<http://adpgtech.blogspot.com/2021/06/buildfarm-adopts-modern-git-naming.html>

I also have scripting to do the git server changes (basically to set its
default branch), although it's rather github-specific.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Re: Jargon and acronyms on this mailing list
On 2024-09-09 Mo 3:54 PM, Andrew Dunstan wrote:
> On 2024-09-09 Mo 1:19 PM, Robert Haas wrote:
>> On Mon, Sep 9, 2024 at 1:03 PM Andrew Dunstan wrote:
>>> I guess I could try to write code to migrate everything, but it would
>>> be somewhat fragile. And what would we do if we ever decided to
>>> migrate "master" to another name like "main"? I do at least have code
>>> ready for that eventuality, but it would (currently) still keep the
>>> visible name of HEAD.
>> Personally, I think using HEAD to mean master is really confusing. In
>> git, master is a branch name, and HEAD is the tip of some branch, or
>> the random commit you've checked out that isn't even a branch. I know
>> that's not how it worked in CVS, but CVS was a very long time ago. If
>> we rename master to main or devel or something, we'll have to adjust
>> the way we speak again, but that's not a reason to keep using the wrong
>> terminology for the way things are now.
> There are some serious obstacles to changing it all over, though. I
> don't want to rewrite all the history, for example. What we could do
> relatively simply is change what is seen publicly. e.g. we could
> rewrite the status page to read "Branch: master". We could also change
> URLs we generate to use master instead of HEAD (and change it back when
> processing the URLs). And so on.

I've done this. Nothing in the client or the database has changed, but the
fact that we refer to "master" as "HEAD" is pretty much hidden now from
the web app and the emails it sends out. That should help lessen any
confusion in casual viewers.

Comments welcome. I don't think I have missed anything but it's always
possible.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Re: Jargon and acronyms on this mailing list
On 2024-09-09 Mo 1:19 PM, Robert Haas wrote:
> On Mon, Sep 9, 2024 at 1:03 PM Andrew Dunstan wrote:
>> I guess I could try to write code to migrate everything, but it would
>> be somewhat fragile. And what would we do if we ever decided to migrate
>> "master" to another name like "main"? I do at least have code ready for
>> that eventuality, but it would (currently) still keep the visible name
>> of HEAD.
> Personally, I think using HEAD to mean master is really confusing. In
> git, master is a branch name, and HEAD is the tip of some branch, or the
> random commit you've checked out that isn't even a branch. I know that's
> not how it worked in CVS, but CVS was a very long time ago. If we rename
> master to main or devel or something, we'll have to adjust the way we
> speak again, but that's not a reason to keep using the wrong terminology
> for the way things are now.

There are some serious obstacles to changing it all over, though. I don't
want to rewrite all the history, for example. What we could do relatively
simply is change what is seen publicly. e.g. we could rewrite the status
page to read "Branch: master". We could also change URLs we generate to
use master instead of HEAD (and change it back when processing the URLs).
And so on.

Changing things on the client side would be far more complicated and
difficult.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Re: Jargon and acronyms on this mailing list
On 2024-09-08 Su 11:35 PM, Tom Lane wrote:
> David Rowley writes:
>> I think HEAD is commonly misused to mean master instead of the latest
>> commit of the current branch. I see the buildfarm even does that.
>> Thanks for getting that right in your blog post.
> IIRC, HEAD *was* the technically correct term back when we were using
> CVS. Old habits die hard.

Yeah. The reason we kept doing it that way in the buildfarm was that for
a period we actually had some animals using CVS and some using that
new-fangled git thing.

I guess I could try to write code to migrate everything, but it would be
somewhat fragile. And what would we do if we ever decided to migrate
"master" to another name like "main"? I do at least have code ready for
that eventuality, but it would (currently) still keep the visible name of
HEAD.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Re: json_query conditional wrapper bug
> On Sep 5, 2024, at 11:51 AM, Peter Eisentraut wrote:
>
> On 05.09.24 17:01, Andrew Dunstan wrote:
>> On 2024-09-04 We 4:10 PM, Andrew Dunstan wrote:
>>> On 2024-09-04 We 6:16 AM, Peter Eisentraut wrote:
>>>> On 28.08.24 11:21, Peter Eisentraut wrote:
>>>>> These are ok:
>>>>>
>>>>> select json_query('{"a": 1, "b": 42}'::jsonb, 'lax $.b' without wrapper);
>>>>>  json_query
>>>>> ------------
>>>>>  42
>>>>>
>>>>> select json_query('{"a": 1, "b": 42}'::jsonb, 'lax $.b' with unconditional wrapper);
>>>>>  json_query
>>>>> ------------
>>>>>  [42]
>>>>>
>>>>> But this appears to be wrong:
>>>>>
>>>>> select json_query('{"a": 1, "b": 42}'::jsonb, 'lax $.b' with conditional wrapper);
>>>>>  json_query
>>>>> ------------
>>>>>  [42]
>>>>>
>>>>> This should return an unwrapped 42.
>>>>
>>>> If I make the code change illustrated in the attached patch, then I
>>>> get the correct result here. And various regression test results
>>>> change, which, to me, all look more correct after this patch. I don't
>>>> know what the code I removed was supposed to accomplish, but it seems
>>>> to be wrong somehow. In the current implementation, the WITH
>>>> CONDITIONAL WRAPPER clause doesn't appear to work correctly in any
>>>> case I could identify.
>>>
>>> Agree the code definitely looks wrong. If anything the test should
>>> probably be reversed:
>>>
>>>     wrap = count > 1 || !(
>>>         IsAJsonbScalar(singleton) ||
>>>         (singleton->type == jbvBinary &&
>>>          JsonContainerIsScalar(singleton->val.binary.data)));
>>>
>>> i.e. in the count = 1 case wrap unless it's a scalar or a binary
>>> wrapping a scalar. The code could do with a comment about the logic.
>>>
>>> I know we're very close to release but we should fix this as it's a
>>> new feature.
>> I thought about this again.
>> I don't know what the spec says,
>
> Here is the relevant bit:
>
> a) Case:
>    i) If the length of SEQ is 0 (zero), then let WRAPIT be False.
>       NOTE 479 — This ensures that the ON EMPTY behavior supersedes the
>       WRAPPER behavior.
> ii) If WRAPPER is WITHOUT ARRAY, then let WRAPIT be False. > iii) If WRAPPER is WITH UNCONDITIONAL ARRAY, then let WRAPIT be True. > iv) If WRAPPER is WITH CONDITIONAL ARRAY, then > Case: > 1) If SEQ has a single SQL/JSON item, then let WRAPIT be False. > 2) Otherwise, let WRAPIT be True. > > > but the Oracle docs say: >> Specify |WITH| |CONDITIONAL| |WRAPPER| to include the array wrapper >> only if the path expression matches a single scalar value or >> multiple values of any type. If the path expression matches a single >> JSON object or JSON array, then the array wrapper is omitted. >> So I now think the code that's there now is actually correct, and what you >> say appears wrong is also correct. > > I tested the above test expressions as well as the regression test case > against Oracle and it agrees with my solution. So it seems to me that this > piece of documentation is wrong. Oh, odd. Then, assuming a scalar is an SQL/JSON item, your patch appears correct. Cheers Andrew
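The spec language quoted above boils down to a small decision procedure. Here is a sketch of the WRAPIT rule in Python (a hypothetical illustration, not PostgreSQL code):

```python
def wrapit(seq_len, wrapper):
    """WRAPIT decision per the SQL/JSON spec text quoted above."""
    if seq_len == 0:
        return False  # ON EMPTY behavior supersedes WRAPPER
    if wrapper == "WITHOUT":
        return False
    if wrapper == "UNCONDITIONAL":
        return True
    # WITH CONDITIONAL: wrap only when SEQ has more than one item,
    # regardless of whether a single item is a scalar
    return seq_len > 1
```

Note that, per the spec, a single result item such as 42 is never wrapped under WITH CONDITIONAL, which is why removing the scalar check makes the earlier example return 42 rather than [42].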
Re: json_query conditional wrapper bug
On 2024-09-04 We 4:10 PM, Andrew Dunstan wrote: On 2024-09-04 We 6:16 AM, Peter Eisentraut wrote: On 28.08.24 11:21, Peter Eisentraut wrote: These are ok: select json_query('{"a": 1, "b": 42}'::jsonb, 'lax $.b' without wrapper); json_query 42 select json_query('{"a": 1, "b": 42}'::jsonb, 'lax $.b' with unconditional wrapper); json_query [42] But this appears to be wrong: select json_query('{"a": 1, "b": 42}'::jsonb, 'lax $.b' with conditional wrapper); json_query [42] This should return an unwrapped 42. If I make the code change illustrated in the attached patch, then I get the correct result here. And various regression test results change, which, to me, all look more correct after this patch. I don't know what the code I removed was supposed to accomplish, but it seems to be wrong somehow. In the current implementation, the WITH CONDITIONAL WRAPPER clause doesn't appear to work correctly in any case I could identify. Agree the code definitely looks wrong. If anything the test should probably be reversed: wrap = count > 1 || !( IsAJsonbScalar(singleton) || (singleton->type == jbvBinary && JsonContainerIsScalar(singleton->val.binary.data))); i.e. in the count = 1 case wrap unless it's a scalar or a binary wrapping a scalar. The code could do with a comment about the logic. I know we're very close to release but we should fix this as it's a new feature. I thought about this again. I don't know what the spec says, but the Oracle docs say: Specify |WITH| |CONDITIONAL| |WRAPPER| to include the array wrapper only if the path expression matches a single scalar value or multiple values of any type. If the path expression matches a single JSON object or JSON array, then the array wrapper is omitted. So I now think the code that's there now is actually correct, and what you say appears wrong is also correct. cheers andrew -- Andrew Dunstan EDB:https://www.enterprisedb.com
Re: json_query conditional wrapper bug
On 2024-09-04 We 6:16 AM, Peter Eisentraut wrote: On 28.08.24 11:21, Peter Eisentraut wrote: These are ok: select json_query('{"a": 1, "b": 42}'::jsonb, 'lax $.b' without wrapper); json_query 42 select json_query('{"a": 1, "b": 42}'::jsonb, 'lax $.b' with unconditional wrapper); json_query [42] But this appears to be wrong: select json_query('{"a": 1, "b": 42}'::jsonb, 'lax $.b' with conditional wrapper); json_query [42] This should return an unwrapped 42. If I make the code change illustrated in the attached patch, then I get the correct result here. And various regression test results change, which, to me, all look more correct after this patch. I don't know what the code I removed was supposed to accomplish, but it seems to be wrong somehow. In the current implementation, the WITH CONDITIONAL WRAPPER clause doesn't appear to work correctly in any case I could identify. Agree the code definitely looks wrong. If anything the test should probably be reversed: wrap = count > 1 || !( IsAJsonbScalar(singleton) || (singleton->type == jbvBinary && JsonContainerIsScalar(singleton->val.binary.data))); i.e. in the count = 1 case wrap unless it's a scalar or a binary wrapping a scalar. The code could do with a comment about the logic. I know we're very close to release but we should fix this as it's a new feature. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Improving tracking/processing of buildfarm test failures
On 2024-09-01 Su 2:46 PM, sia kc wrote: Hello everyone I am a developer interested in this project. Had a little involvement with MariaDB and now I'd like to work on Postgres. Never worked with mailing lists so I am not sure if this is the way I should interact. I'd like to be pointed to some tasks and documents to get started. Do you mean you want to be involved with $subject, or that you just want to be involved in Postgres development generally? If the latter, then replying to a specific email thread is not the way to go, and the first thing to do is look at this wiki page <https://wiki.postgresql.org/wiki/Developer_FAQ> cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: pl/pgperl Patch for adding $_FN detail just like triggers have for $_TD
On 2024-08-30 Fr 3:49 PM, Andrew Dunstan wrote: Sorry for empty reply. Errant finger hit send. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible
On 2024-08-29 Th 4:44 PM, Jacob Champion wrote: As for the other patches, I'll ping Andrew about 0001, Patch 0001 looks sane to me. cheers andrew -- Andrew Dunstan EDB:https://www.enterprisedb.com
Re: pl/pgperl Patch for adding $_FN detail just like triggers have for $_TD
On 2024-08-29 Th 5:50 PM, Mark Murawski wrote: On 8/29/24 16:54, Andrew Dunstan wrote: On 2024-08-29 Th 1:01 PM, Mark Murawski wrote: On 8/29/24 11:56, Andrew Dunstan wrote: On 2024-08-28 We 5:53 PM, Mark Murawski wrote: Hi Hackers! This would be version v1 of this feature Basically, the subject says it all: pl/pgperl Patch for being able to tell which function you're in. This is a hashref so it will be possible to populate new and exciting other details in the future as the need arises This also greatly improves logging capabilities for things like catching warnings, Because as it stands right now, there's no information that can assist with locating the source of a warning like this: # tail -f /var/log/postgresql.log *** GOT A WARNING - Use of uninitialized value $prefix in concatenation (.) or string at (eval 531) line 48. Now, with $_FN you can do this: CREATE OR REPLACE FUNCTION throw_warning() RETURNS text LANGUAGE plperlu AS $function$ use warnings; use strict; use Data::Dumper; $SIG{__WARN__} = sub { elog(NOTICE, Dumper($_FN)); print STDERR "In Function: $_FN->{name}: $_[0]\n"; }; my $a; print "$a"; # uninit! return undef; $function$ ; This patch is against 12 which is still our production branch. This could easily be also patched against newer releases as well. I've been using this code in production now for about 3 years, it's greatly helped track down issues. And there shouldn't be anything platform-specific here, it's all regular perl API I'm not sure about adding testing. This is my first postgres patch, so any guidance on adding regression testing would be appreciated. 
The rationale for this has come from the need to know the source function name, and we've typically resorted to things like this in the past: CREATE OR REPLACE FUNCTION throw_warning() RETURNS text LANGUAGE plperlu AS $function$ my $function_name = 'throw_warning'; $SIG{__WARN__} = sub { print STDERR "In Function: $function_name: $_[0]\n"; } $function$ ; We've literally had to copy/paste this all over and it's something that postgres should just 'give you' since it knows the name already, just like when triggers pass you $_TD with all the pertinent information. A wishlist item would be for postgres plperl to automatically prepend the function name and schema when throwing perl warnings so you don't have to do your own __WARN__ handler, but this is the next best thing. I'm not necessarily opposed to this, but the analogy to $_TD isn't really apt. You can't know the trigger data at compile time, whereas you can know the function's name at compile time, using just the mechanism you find irksome. And if we're going to do it for plperl, shouldn't we do it for other PLs? Hi Andrew, Thanks for the feedback. 1) Why is this not similar to _TD? It literally operates identically. At run-time it passes you $_TD for triggers. Same here for functions. This is all run-time. What exactly is the issue you're trying to point out? It's not the same as the trigger data case because the function name is knowable at compile time, as in fact you have demonstrated. You just find it a bit inconvenient to have to code for that knowledge. By contrast, trigger data is ONLY knowable at run time. But I don't see that it is such a heavy burden to have to write my $funcname = "foo"; or similar in your function. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com Understood, regarding knowability. Trigger data is definitely going to be very dynamic in that regard. No, it's not a heavy burden to hard code the function name. 
But what my ideal goal would be is this: CREATE OR REPLACE FUNCTION throw_warning() RETURNS text LANGUAGE plperlu AS $function$ use 'PostgresWarnHandler'; # <--- imagine this registers a WARN handler and outputs $_FN->{name} for you as part of the final warning my $a; print $a; etc and then there's nothing 'hard coded' regarding the name of the function, anywhere. It just seems nonsensical that postgres plperl can't send you the name of your registered function and there's absolutely no way to get it other than hard coding the name (redundantly, considering you already know the name when you're defining the function in the first place) But even better would be catching the warn at the plperl level, prepending the function name to the warn message, and then you only need: CREATE OR REPLACE FUNCTION throw_warning() RETURNS text LANGUAGE plperlu AS $function$ my $a; print $a; etc And then in this hypothetical situation -- magic ensues, and you're left with this: # tail -f /var/log/postgresql.log *** GOT A WARNING - Use of uninitialized value $a in concatenation (.) or string in function public.throw_warning() line 1 -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: pl/pgperl Patch for adding $_FN detail just like triggers have for $_TD
On 2024-08-29 Th 1:01 PM, Mark Murawski wrote: On 8/29/24 11:56, Andrew Dunstan wrote: On 2024-08-28 We 5:53 PM, Mark Murawski wrote: Hi Hackers! This would be version v1 of this feature Basically, the subject says it all: pl/pgperl Patch for being able to tell which function you're in. This is a hashref so it will be possible to populate new and exciting other details in the future as the need arises This also greatly improves logging capabilities for things like catching warnings, Because as it stands right now, there's no information that can assist with locating the source of a warning like this: # tail -f /var/log/postgresql.log *** GOT A WARNING - Use of uninitialized value $prefix in concatenation (.) or string at (eval 531) line 48. Now, with $_FN you can do this: CREATE OR REPLACE FUNCTION throw_warning() RETURNS text LANGUAGE plperlu AS $function$ use warnings; use strict; use Data::Dumper; $SIG{__WARN__} = sub { elog(NOTICE, Dumper($_FN)); print STDERR "In Function: $_FN->{name}: $_[0]\n"; }; my $a; print "$a"; # uninit! return undef; $function$ ; This patch is against 12 which is still our production branch. This could easily be also patched against newer releases as well. I've been using this code in production now for about 3 years, it's greatly helped track down issues. And there shouldn't be anything platform-specific here, it's all regular perl API I'm not sure about adding testing. This is my first postgres patch, so any guidance on adding regression testing would be appreciated. 
The rationale for this has come from the need to know the source function name, and we've typically resorted to things like this in the past: CREATE OR REPLACE FUNCTION throw_warning() RETURNS text LANGUAGE plperlu AS $function$ my $function_name = 'throw_warning'; $SIG{__WARN__} = sub { print STDERR "In Function: $function_name: $_[0]\n"; } $function$ ; We've literally had to copy/paste this all over and it's something that postgres should just 'give you' since it knows the name already, just like when triggers pass you $_TD with all the pertinent information. A wishlist item would be for postgres plperl to automatically prepend the function name and schema when throwing perl warnings so you don't have to do your own __WARN__ handler, but this is the next best thing. I'm not necessarily opposed to this, but the analogy to $_TD isn't really apt. You can't know the trigger data at compile time, whereas you can know the function's name at compile time, using just the mechanism you find irksome. And if we're going to do it for plperl, shouldn't we do it for other PLs? Hi Andrew, Thanks for the feedback. 1) Why is this not similar to _TD? It literally operates identically. At run-time it passes you $_TD for triggers. Same here for functions. This is all run-time. What exactly is the issue you're trying to point out? It's not the same as the trigger data case because the function name is knowable at compile time, as in fact you have demonstrated. You just find it a bit inconvenient to have to code for that knowledge. By contrast, trigger data is ONLY knowable at run time. But I don't see that it is such a heavy burden to have to write my $funcname = "foo"; or similar in your function. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: pl/pgperl Patch for adding $_FN detail just like triggers have for $_TD
On 2024-08-28 We 5:53 PM, Mark Murawski wrote: Hi Hackers! This would be version v1 of this feature Basically, the subject says it all: pl/pgperl Patch for being able to tell which function you're in. This is a hashref so it will be possible to populate new and exciting other details in the future as the need arises This also greatly improves logging capabilities for things like catching warnings, Because as it stands right now, there's no information that can assist with locating the source of a warning like this: # tail -f /var/log/postgresql.log *** GOT A WARNING - Use of uninitialized value $prefix in concatenation (.) or string at (eval 531) line 48. Now, with $_FN you can do this: CREATE OR REPLACE FUNCTION throw_warning() RETURNS text LANGUAGE plperlu AS $function$ use warnings; use strict; use Data::Dumper; $SIG{__WARN__} = sub { elog(NOTICE, Dumper($_FN)); print STDERR "In Function: $_FN->{name}: $_[0]\n"; }; my $a; print "$a"; # uninit! return undef; $function$ ; This patch is against 12 which is still our production branch. This could easily be also patched against newer releases as well. I've been using this code in production now for about 3 years, it's greatly helped track down issues. And there shouldn't be anything platform-specific here, it's all regular perl API I'm not sure about adding testing. This is my first postgres patch, so any guidance on adding regression testing would be appreciated. 
The rationale for this has come from the need to know the source function name, and we've typically resorted to things like this in the past: CREATE OR REPLACE FUNCTION throw_warning() RETURNS text LANGUAGE plperlu AS $function$ my $function_name = 'throw_warning'; $SIG{__WARN__} = sub { print STDERR "In Function: $function_name: $_[0]\n"; } $function$ ; We've literally had to copy/paste this all over and it's something that postgres should just 'give you' since it knows the name already, just like when triggers pass you $_TD with all the pertinent information A wishlist item would be for postgres plperl to automatically prepend the function name and schema when throwing perl warnings so you don't have to do your own __WARN__ handler, but this is the next best thing. I'm not necessarily opposed to this, but the analogy to $_TD isn't really apt. You can't know the trigger data at compile time, whereas you can know the function's name at compile time, using just the mechanism you find irksome. And if we're going to do it for plperl, shouldn't we do it for other PLs? cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Index AM API cleanup
On 2024-08-22 Th 2:28 PM, Mark Dilger wrote: On Aug 22, 2024, at 1:36 AM, Alexandra Wang wrote: I had to make some changes to the first two patches in order to run "make check" and compile the treeb code on my machine. I’ve attached my changes. Thank you for the review, and the patches! "make installcheck" for treeb is causing issues on my end. I can investigate further if it’s not a problem for others. The test module index AMs are not intended for use in any installed database, so 'make installcheck' is unnecessary. A mere 'make check' should suffice. However, if you want to run it, you can install the modules, edit postgresql.conf to add 'treeb' to shared_preload_libraries, restart the server, and run 'make installcheck'. This is necessary for 'treeb' because it requests shared memory, and that needs to be done at startup. The v18 patch set includes the changes your patches suggest, though I modified the approach a bit. Specifically, rather than standardizing on '1.0.0' for the module versions, as your patches do, I went with '1.0', as is standard in other modules in neighboring directories. The '1.0.0' numbering was something I had been using in versions 1..16 of this patch, and I only partially converted to '1.0' before posting v17, so sorry about that. The v18 patch also has some whitespace fixes. To address your comments about the noise in the test failures, v18 modifies the clone_tests.pl script to do a little more work translating the expected output to expect the module's AM name ("xash", "xtree", "treeb", or whatnot) beyond what that script did in v17. 
Small addition to your clone script, taking into account the existence of alternative result files: diff --git a/src/tools/clone_tests.pl b/src/tools/clone_tests.pl index c1c50ad90b..d041f93f87 100755 --- a/src/tools/clone_tests.pl +++ b/src/tools/clone_tests.pl @@ -183,6 +183,12 @@ foreach my $regress (@regress) push (@dst_regress, "$dst_sql/$regress.sql"); push (@src_expected, "$src_expected/$regress.out"); push (@dst_expected, "$dst_expected/$regress.out"); + foreach my $alt (1,2) + { + next unless -f "$src_expected/${regress}_$alt.out"; + push (@src_expected, "$src_expected/${regress}_$alt.out"); + push (@dst_expected, "$dst_expected/${regress}_$alt.out"); + } } my @isolation = grep /\w+/, split(/\s+/, $isolation); foreach my $isolation (@isolation) cheers andrew -- Andrew Dunstan EDB:https://www.enterprisedb.com
Re: Index AM API cleanup
On 2024-08-21 We 4:09 PM, Mark Dilger wrote: On Aug 21, 2024, at 12:34 PM, Kirill Reshke wrote: Hi! Why is the patch attached as .tar.bz2? Usually raw patches are sent here... I worried the patch set, being greater than 1 MB, might bounce or be held up in moderation. Yes, it would have required moderation AIUI. It is not at all unprecedented to send a compressed tar of patches, and is explicitly provided for by the cfbot: see <https://wiki.postgresql.org/wiki/Cfbot#Which_attachments_are_considered_to_be_patches.3F> cheers andrew -- Andrew Dunstan EDB:https://www.enterprisedb.com
Re: why is pg_upgrade's regression run so slow?
On 2024-08-19 Mo 8:00 AM, Alexander Lakhin wrote: Hello Andrew, 29.07.2024 13:54, Andrew Dunstan wrote: On 2024-07-29 Mo 4:00 AM, Alexander Lakhin wrote: And another interesting fact is that TEMP_CONFIG is apparently ignored by `meson test regress/regress`. For example, with temp.config containing invalid settings, `meson test pg_upgrade/002_pg_upgrade` fails, but `meson test regress/regress` passes just fine. Well, that last fact explains the discrepancy I originally complained about. How the heck did that happen? It looks like we just ignored its use in Makefile.global.in :-( Please look at the attached patch (partially based on ideas from [1]) for meson.build, that aligns it with `make` in regard to use of TEMP_CONFIG. Maybe it could be implemented via a separate meson option, but that would also require modifying at least the buildfarm client... [1] https://www.postgresql.org/message-id/CAN55FZ304Kp%2B510-iU5-Nx6hh32ny9jgGn%2BOB5uqPExEMK1gQQ%40mail.gmail.com I think this is the way to go. The patch LGTM. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: The xversion-upgrade test fails to stop server
On 2024-08-18 Su 12:00 PM, Alexander Lakhin wrote: Hello Andrew, 04.06.2024 14:56, Andrew Dunstan wrote: but I can't see ../snapshot_too_old/output_iso/regression.diff and .../snapshot_too_old/output_iso/log/postmaster.log in the log. will do. I've discovered a couple of other failures where the interesting log files are not saved. [1] has only inst/logfile saved and that's not enough for TAP tests in src/test/modules. I've emulated the failure (not trying to guess the real cause) with: --- a/src/test/modules/test_custom_rmgrs/Makefile +++ b/src/test/modules/test_custom_rmgrs/Makefile @@ -14,0 +15,4 @@ TAP_TESTS = 1 +remove: + make uninstall +install: remove and saw mostly the same with the buildfarm client REL_17. I've tried the following addition to run_build.pl: @@ -2194,6 +2194,8 @@ sub make_testmodules_install_check my @logs = glob("$pgsql/src/test/modules/*/regression.diffs"); push(@logs, "inst/logfile"); $log->add_log($_) foreach (@logs); + @logs = glob("$pgsql/src/test/modules/*/tmp_check/log/*"); + $log->add_log($_) foreach (@logs); and it works as expected for me. The output of another failure ([2]) contains only: \342\226\266 1/1 + partition_prune 3736 ms FAIL but no regression.diffs (Fortunately, in this particular case I found "FATAL: incorrect checksum in control file" in inst/logfile, so that can explain the failure.) Probably, run_build.pl needs something like: @@ -1848,7 +1848,7 @@ sub make_install_check @checklog = run_log("cd $pgsql/src/test/regress && $make $chktarget"); } my $status = $? >> 8; - my @logfiles = ("$pgsql/src/test/regress/regression.diffs", "inst/logfile"); + my @logfiles = ("$pgsql/src/test/regress/regression.diffs", "$pgsql/testrun/regress-running/regress/regression.diffs", "inst/logfile"); Thanks, I have added these tweaks. 
A similar failure has occurred today: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=opaleye&dt=2024-06-08%2001%3A41%3A41 So maybe it would make sense to increase the default PGCTLTIMEOUT for buildfarm clients, say, to 180 seconds? Sure. For now I have added it to the config on crake, but we can make it a default. By the way, opaleye failed once again (with a 120-second timeout): https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=opaleye&dt=2024-08-13%2002%3A04%3A07 [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gull&dt=2024-08-06%2014%3A56%3A39 [2] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=culicidae&dt=2024-08-17%2001%3A12%3A50 Yeah. In the next release the default will increase to 180. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Recent 027_streaming_regress.pl hangs
On 2024-08-11 Su 8:32 PM, Tom Lane wrote: Andrew Dunstan writes: We'll see. I have switched crake from --run-parallel mode to --run-all mode i.e. the runs are serialized. Maybe that will be enough to stop the errors. I'm still annoyed that this test is susceptible to load, if that is indeed what is the issue. crake is still timing out intermittently on 027_streaming_regress.pl, so that wasn't it. I think we need more data. We know that the wait_for_catchup query is never getting to true: SELECT '$target_lsn' <= ${mode}_lsn AND state = 'streaming' but we don't know if the LSN condition or the state condition is what is failing. And if it is the LSN condition, it'd be good to see the actual last LSN, so we can look for patterns like whether there is a page boundary crossing involved. So I suggest adding something like the attached. If we do this, I'd be inclined to instrument wait_for_slot_catchup and wait_for_subscription_sync similarly, but I thought I'd check for contrary opinions first. Seems reasonable. cheers andrew -- Andrew Dunstan EDB:https://www.enterprisedb.com
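The condition the test keeps polling can be sketched as a simple predicate (a hypothetical Python helper for illustration; real pg_lsn values are simplified to plain integers here):

```python
def caught_up(target_lsn, current_lsn, state):
    # Mirrors the wait_for_catchup query quoted above: the standby must
    # have reached the target LSN AND the walsender must report state
    # 'streaming'. Either conjunct being false keeps the loop waiting,
    # which is why it would help to know which one is failing at timeout.
    return target_lsn <= current_lsn and state == "streaming"
```

Logging the actual last LSN and state when the poll times out, as proposed, would show which conjunct never became true.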
Re: PG_TEST_EXTRA and meson
On 2024-08-09 Fr 5:26 AM, Ashutosh Bapat wrote: I think the patch lacks overriding PG_TEST_EXTRA at run time. AFAIU, the following is the expected behaviour from both meson and make. Please correct if I am wrong. 1. If PG_TEST_EXTRA is set at the setup/configuration time, it is not required to be set at run time. 2. Runtime PG_TEST_EXTRA always overrides configure-time PG_TEST_EXTRA. Yes, that's my understanding of the requirement. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
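The two rules above amount to a simple precedence, which can be sketched as (hypothetical helper name, not actual build-system code):

```python
def effective_pg_test_extra(configure_time_value, runtime_env):
    """Resolve PG_TEST_EXTRA per the rules above: a value set in the
    test run's environment always wins; otherwise fall back to the
    value recorded at setup/configure time (which may be absent)."""
    runtime = runtime_env.get("PG_TEST_EXTRA")
    return runtime if runtime is not None else configure_time_value
```

This is the behaviour `make` already has via Makefile.global.in, and what the meson side is being aligned with.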
Re: Windows default locale vs initdb
On 2024-08-08 Th 4:08 AM, Ertan Küçükoglu wrote: I already installed Visual Studio 2022 with C++ support as suggested in https://www.postgresql.org/docs/current/install-windows-full.html I cloned the code onto the system. But I cannot find any "src/tools/msvc" directory. It is missing. The document states I need everything in there: "The tools for building using Visual C++ or Platform SDK are in the src\tools\msvc directory." It seems I will need help setting up the build environment. I am willing to be a tester for Windows if I could get help setting up the build environment. It also feels like the documentation needs updating, as I failed to find the necessary files. If you're trying to build the master branch those documents no longer apply. You will need to build using meson, as documented here: <https://www.postgresql.org/docs/17/install-meson.html> cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Add trim_trailing_whitespace to editorconfig file
On 2024-08-07 We 4:42 PM, Jelte Fennema-Nio wrote: On Wed, 7 Aug 2024 at 21:09, Andrew Dunstan wrote: You're not meant to use our pg_bsd_indent on its own without the appropriate flags, namely (from src/tools/pgindent/pgindent): Ah sorry, I wasn't clear in what I meant then. I meant that if you look at the sources of pg_bsd_indent (such as src/tools/pg_bsd_indent/io.c) then you'll realize that comments are aligned using tabs of width 8, not tabs of width 4. And right now .editorconfig configures editors to show all .c files with a tab_width of 4, because we use that for Postgres source files. The bottom .gitattributes line now results in an editorconfig rule that sets a tab_width of 8 for just the c and h files in the src/tools/pg_bsd_indent directory. Ah, OK. Yeah, that makes sense. Also, why are you proposing to unset indent-style for .pl and .pm files? That's not in accordance with our perltidy settings (src/tools/pgindent/perltidyrc), unless I'm misunderstanding. All the way at the bottom of the .editorconfig file those "indent_style = unset" lines are overridden to be "tab" for .pl and .pm files. There's a comment there explaining why it's done that way. # We want editors to use tabs for indenting Perl files, but we cannot add such a rule to .gitattributes, because certain lines are still indented with spaces (e.g. SYNOPSIS blocks). [*.{pl,pm}] indent_style = tab But now thinking about this again after your comment, I realize it's just as easy and effective to change the script slightly to hardcode the indent_style for "*.pl" and "*.pm" so that the resulting .editorconfig looks less confusing. Attached is a patch that does that. OK, good, thanks. I also added a .gitattributes rule for .py files, and changed the default tab_width to unset, because I realized the resulting .editorconfig was using tab_width 8 for Python files when editing src/tools/generate_editorconfig.py sounds good. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
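The generation approach under discussion can be sketched roughly like this (a hypothetical stand-in for src/tools/generate_editorconfig.py; the patterns and widths are illustrative of the thread, not the real script's output). In .editorconfig, later sections override earlier ones for matching files, which is why the Perl override goes last:

```python
def make_editorconfig(rules):
    # Render (pattern, settings) pairs into .editorconfig text; later
    # sections win for matching files, so overrides go at the bottom.
    out = []
    for pattern, settings in rules:
        out.append(f"[{pattern}]")
        out.extend(f"{key} = {value}" for key, value in settings.items())
    return "\n".join(out)

cfg = make_editorconfig([
    ("*.c", {"indent_style": "tab", "tab_width": 4}),
    # pg_bsd_indent sources keep their upstream 8-column tabs
    ("src/tools/pg_bsd_indent/*.c", {"tab_width": 8}),
    # Perl files get tabs even though .gitattributes can't say so
    ("*.{pl,pm}", {"indent_style": "tab"}),
])
```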
Re: Add trim_trailing_whitespace to editorconfig file
On 2024-08-07 We 1:09 PM, Jelte Fennema-Nio wrote: On Tue, 9 Apr 2024 at 12:42, Jelte Fennema-Nio wrote: Okay, I spent the time to add a script to generate the editorconfig based on .gitattributes after all. So attached is a patch that adds that. I would love to see this patch merged (or at least some feedback on the latest version). I think it's pretty trivial and really low risk of breaking anyone's workflow, and it would *significantly* improve my own workflow. Matthias mentioned on Discord that our vendored-in pg_bsd_indent uses a tab width of 8 and that was showing up ugly in his editor. I updated the patch to include a fix for that too. You're not meant to use our pg_bsd_indent on its own without the appropriate flags, namely (from src/tools/pgindent/pgindent): "-bad -bap -bbb -bc -bl -cli1 -cp33 -cdb -nce -d0 -di12 -nfc1 -i4 -l79 -lp -lpl -nip -npro -sac -tpg -ts4" If that's inconvenient you can create a .indent.pro with the settings. Also, why are you proposing to unset indent-style for .pl and .pm files? That's not in accordance with our perltidy settings (src/tools/pgindent/perltidyrc), unless I'm misunderstanding. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Official devcontainer config
On 2024-08-03 Sa 10:13 PM, Junwang Zhao wrote: On Sat, Aug 3, 2024 at 7:30 PM Andrew Dunstan wrote: On 2024-08-02 Fr 2:45 PM, Peter Eisentraut wrote: On 01.08.24 23:38, Andrew Dunstan wrote: Not totally opposed, and I will probably give it a try very soon, but I'm wondering if this really needs to go in the core repo. We've generally shied away from doing much in the way of editor / devenv support, trying to be fairly agnostic. It's true we carry .dir-locals.el and .editorconfig, so that's not entirely true, but those are really just about supporting our indentation etc. standards. Yeah, the editor support in the tree ought to be minimal and factual, based on coding standards and widely recognized best practices, not a collection of one person's favorite aliases and scripts. If the scripts are good, let's look at them and maybe put them under src/tools/ for everyone to use. But a lot of this looks like it will require active maintenance if output formats or node formats or build targets etc. change. And other things require specific local paths. That's fine for a local script or something, but not for a mainline tool that the community will need to maintain. I suggest starting with a very minimal configuration. What are the settings that absolutely everyone will need, maybe to set indentation style or something. I believe you can get VS Code to support editorconfig, so from that POV maybe we don't need to do anything. I did try yesterday with the code from the OP's patch symlinked into my repo, but got an error with the Docker build, which kinda reinforces your point. The reason a symlink does not work is that configure_vscode needs to copy launch.json and tasks.json into .vscode, so it has to be in the WORKDIR/.devcontainer. That's kind of awful. Anyway, I think we don't need to do anything about ignoring those. The user should simply add entries for them to .git/info/exclude or their local global exclude file (I have core.excludesfile = /home/andrew/.gitignore set.) 
I was eventually able to get it to work without using a symlink. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Official devcontainer config
On 2024-08-02 Fr 2:45 PM, Peter Eisentraut wrote: On 01.08.24 23:38, Andrew Dunstan wrote: Not totally opposed, and I will probably give it a try very soon, but I'm wondering if this really needs to go in the core repo. We've generally shied away from doing much in the way of editor / devenv support, trying to be fairly agnostic. It's true we carry .dir-locals.el and .editorconfig, so that's not entirely true, but those are really just about supporting our indentation etc. standards. Yeah, the editor support in the tree ought to be minimal and factual, based on coding standards and widely recognized best practices, not a collection of one person's favorite aliases and scripts. If the scripts are good, let's look at them and maybe put them under src/tools/ for everyone to use. But a lot of this looks like it will require active maintenance if output formats or node formats or build targets etc. change. And other things require specific local paths. That's fine for a local script or something, but not for a mainline tool that the community will need to maintain. I suggest starting with a very minimal configuration. What are the settings that absolutely everyone will need, maybe to set indentation style or something. I believe you can get VS Code to support editorconfig, so from that POV maybe we don't need to do anything. I did try yesterday with the code from the OP's patch symlinked into my repo, but got an error with the Docker build, which kinda reinforces your point. Your point about "one person's preferences" is well taken - some of the git aliases supplied clash with mine. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Official devcontainer config
On 2024-08-01 Th 10:56 AM, Junwang Zhao wrote: Stack Overflow's 2024 developer survey[1] said VSCode is the most used development environment. In a PostgreSQL Hacker Mentoring discussion, we talked about how to use vscode to debug and run postgres. Andrey (cc'd) has tons of tips for new developers, and I posted my daily used devcontainer config[2]; Jelte (cc'd) suggested that it might be a good idea to integrate the config into the postgres repo so that the barrier to entry for new developers will be much lower. **Note** This is not intended to change the workflow of experienced hackers; it is just hoping to make life easier for beginner developers. **How to use** Open the VSCode Command Palette (cmd/ctrl + shift + p), search for devcontainer, then choose something like `Dev containers: Rebuild and Reopen in Container`, and you are good to go. **About the patch** devcontainer.json: The .devcontainer/devcontainer.json is the entry point for VSCode to *open folder to develop in a container*; it will build the docker image the first time you open in a container, which will take some time. There are some parameters (runArgs) for running the container; we need some settings and privileges to run perf or generate core dumps. It has a mount point mapping the host's $HOME/freedom to the container's /opt/freedom; I chose the name *freedom* because the container is kind of a jail. It also installs some vscode extensions and does some customizations. After diving into the container, postCreateCommand.sh will be automatically called; it will do some configuration like git, perf, .vscode, core_pattern, etc. It also downloads michaelpq's pg_plugins and FlameGraph. Dockerfile: It is based on debian bookworm; it installs the dependencies needed to build postgres, and also IPC::Run to run the TAP tests. It also has a .gdbinit to break at elog.c:errfinish for elevel 21, which makes it easier to debug why an error is logged.
gdbpg.py is adapted from https://github.com/tvondra/gdbpg; I think putting it here will let it evolve over time. tasks.json: This is kind of like bookkeeping for developers; it has the following commands:
- meson debug setup
- meson release setup
- ninja build
- regression tests
- ninja install
- init cluster
- start cluster
- stop cluster
- install pg_bsd_indent
- pgindent
- apply patch
- generate flamegraph
launch.json: It has one configuration that makes it possible to attach to one process (e.g. a postgres backend) and debug with vscode. PFA and please give it a try if you are a VSCode user. [1]: https://survey.stackoverflow.co/2024/technology#1-integrated-development-environment [2]: https://github.com/atomicdb/devcontainer/tree/main/postgres Not totally opposed, and I will probably give it a try very soon, but I'm wondering if this really needs to go in the core repo. We've generally shied away from doing much in the way of editor / devenv support, trying to be fairly agnostic. It's true we carry .dir-locals.el and .editorconfig, so that's not entirely true, but those are really just about supporting our indentation etc. standards. Also, might it not be better for this to be carried in a separate repo maintained by people using vscode? I don't know how many committers do. Maybe lots do, and I'm just a dinosaur. If vscode really needs .devcontainer to live in the code root, maybe it could be symlinked. Another reason not to carry the code ourselves is that it will make it less convenient for people who want to customize it. Without having tried it, it looks like a nice effort though. Well done. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
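For readers who haven't seen a devcontainer before, the entry point described above amounts to a small JSON file. The sketch below is illustrative only: the file and script names come from the description in this thread, while the runArgs and mount syntax are assumptions about what "settings and privileges to run perf or generate core dumps" might look like, not the contents of the actual patch.

```json
{
  "name": "postgres-dev",
  "build": { "dockerfile": "Dockerfile" },
  "runArgs": [
    "--cap-add=SYS_PTRACE",
    "--security-opt", "seccomp=unconfined"
  ],
  "mounts": [
    "source=${localEnv:HOME}/freedom,target=/opt/freedom,type=bind"
  ],
  "postCreateCommand": ".devcontainer/postCreateCommand.sh"
}
```

The `--cap-add`/`--security-opt` values here are a common way to enable debugger attachment inside containers, offered as an assumption about the patch's intent rather than a quotation from it.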
Re: Improving tracking/processing of buildfarm test failures
On 2024-08-01 Th 5:00 AM, Alexander Lakhin wrote: I also wrote a simple script (see attached) to check for unknown buildfarm failures using the "HTML API", to make sure no failures are missed. Surely, it could be improved in many ways, but I find it rather useful as-is. I think we can improve on that. Scraping HTML is not a terribly efficient way of doing it. I'd very much like to improve the reporting side of the server. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: can we mark upper/lower/textlike functions leakproof?
On 2024-07-31 We 2:43 PM, Joe Conway wrote: On 7/31/24 14:03, Tom Lane wrote: Robert Haas writes: If there are some inputs that cause upper() and lower() to fail and others that do not, the functions aren't leakproof, because an attacker can extract information about values that they can't see by feeding those values into these functions and seeing whether they get a failure or not. [ rather exhaustive analysis redacted ] First, thank you very much, Robert, for the analysis. So in summary, I think upper() is ... pretty close to leakproof. But if ICU sometimes fails on certain strings, then it isn't. And if the multi-byte libc path can be made to fail reliably either with really long strings or with certain choices of the LC_CTYPE locale, then it isn't. The problem here is that marking these functions leakproof is a promise about a *whole bunch* of code, much of it not under our control; worse, there's no reason to think all that code is stable. A large fraction of it didn't even exist a few versions ago. Even if we could convince ourselves that the possible issues Robert mentions aren't real at the moment, I think marking these leakproof is mighty risky. It's unlikely we'd remember to revisit the marking the next time someone drops a bunch of new code in here. I still maintain that there is a whole host of users that would accept the risk of side channel attacks via existence of an error or not, if they could only be sure nothing sensitive leaks directly into the logs or to the clients. We should give them that choice. As I meant to say in my previous empty reply, I think your suggestions make lots of sense. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Recent 027_streaming_regress.pl hangs
On 2024-07-31 We 12:05 PM, Tom Lane wrote: Andrew Dunstan writes: There seem to be a bunch of recent failures, and not just on crake, and not just on HEAD: <https://buildfarm.postgresql.org/cgi-bin/show_failures.pl?max_days=14&member=&stage=recoveryCheck&filter=Submit> There were a batch of recovery-stage failures ending about 9 days ago caused by instability of the new 043_vacuum_horizon_floor.pl test. Once you take those out it doesn't look quite so bad. We'll see. I have switched crake from --run-parallel mode to --run-all mode, i.e. the runs are serialized. Maybe that will be enough to stop the errors. I'm still annoyed that this test is susceptible to load, if that is indeed the issue. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Recent 027_streaming_regress.pl hangs
On 2024-07-25 Th 6:33 PM, Andrew Dunstan wrote: On 2024-07-25 Th 5:14 PM, Tom Lane wrote: I wrote: I'm confused by crake's buildfarm logs. AFAICS it is not running recovery-check at all in most of the runs; at least there is no mention of that step, for example here: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake&dt=2024-07-25%2013%3A27%3A02 Oh, I see it: the log file that is called recovery-check in a failing run is called misc-check if successful. That seems mighty bizarre, and it's not how my own animals behave. Something weird about the meson code path, perhaps? Yes, it was discussed some time ago. As suggested by Andres, we run the meson test suite more or less all together (in effect like "make check-world" but without the main regression suite, which is run on its own first), rather than in the separate (and serialized) way we do with the configure/make tests. That results in a significant speedup. If the tests fail we report the failure as happening at the first failure we encounter, which is possibly less than ideal, but I haven't got a better idea. 
Anyway, in this successful run:
https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=crake&dt=2024-07-25%2018%3A57%3A02&stg=misc-check
here are some salient test timings:

  1/297 postgresql:pg_upgrade / pg_upgrade/001_basic           OK    0.18s   9 subtests passed
  2/297 postgresql:pg_upgrade / pg_upgrade/003_logical_slots   OK   15.95s   12 subtests passed
  3/297 postgresql:pg_upgrade / pg_upgrade/004_subscription    OK   16.29s   14 subtests passed
 17/297 postgresql:isolation / isolation/isolation             OK   71.60s   119 subtests passed
 41/297 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade      OK  169.13s   18 subtests passed
140/297 postgresql:initdb / initdb/001_initdb                  OK   41.34s   52 subtests passed
170/297 postgresql:recovery / recovery/027_stream_regress      OK  469.49s   9 subtests passed

while in the next, failing run
https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=crake&dt=2024-07-25%2020%3A18%3A05&stg=recovery-check
the same tests took:

  1/297 postgresql:pg_upgrade / pg_upgrade/001_basic           OK    0.22s   9 subtests passed
  2/297 postgresql:pg_upgrade / pg_upgrade/003_logical_slots   OK   56.62s   12 subtests passed
  3/297 postgresql:pg_upgrade / pg_upgrade/004_subscription    OK   71.92s   14 subtests passed
 21/297 postgresql:isolation / isolation/isolation             OK  299.12s   119 subtests passed
 31/297 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade      OK  344.42s   18 subtests passed
159/297 postgresql:initdb / initdb/001_initdb                  OK  344.46s   52 subtests passed
162/297 postgresql:recovery / recovery/027_stream_regress      ERROR  840.84s  exit status 29

Based on this, it seems fairly likely that crake is simply timing out as a consequence of intermittent heavy background activity. 
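To put numbers on it, here are the per-suite slowdown factors between the two runs quoted above (Python used simply as a calculator on the timings already shown):

```python
# Durations in seconds, taken from the two crake runs quoted above.
ok_run = {"003_logical_slots": 15.95, "004_subscription": 16.29,
          "isolation": 71.60, "002_pg_upgrade": 169.13,
          "001_initdb": 41.34, "027_stream_regress": 469.49}
bad_run = {"003_logical_slots": 56.62, "004_subscription": 71.92,
           "isolation": 299.12, "002_pg_upgrade": 344.42,
           "001_initdb": 344.46, "027_stream_regress": 840.84}

# Every suite slows down, but by wildly different factors (roughly 2x to 8x),
# which is consistent with bursty background load rather than a problem in
# any one test.
slowdown = {name: round(bad_run[name] / ok_run[name], 1) for name in ok_run}
print(slowdown)
```

The uneven spread of factors is the point: a deterministic regression would slow every suite by about the same ratio.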
The latest failure is this:

Waiting for replication conn standby_1's replay_lsn to pass 2/88E4E260 on primary
[16:40:29.481](208.545s) # poll_query_until timed out executing this query:
# SELECT '2/88E4E260' <= replay_lsn AND state = 'streaming'
# FROM pg_catalog.pg_stat_replication
# WHERE application_name IN ('standby_1', 'walreceiver')
# expecting this output:
# t
# last actual query output:
# f
# with stderr:
timed out waiting for catchup at /home/andrew/bf/root/HEAD/pgsql/src/test/recovery/t/027_stream_regress.pl line 103.

Maybe it's a case where the system is overloaded, I dunno. I wouldn't bet my house on it. Pretty much nothing else runs on this machine. I have added a mild throttle to the buildfarm config so it doesn't try to run every branch at once. Maybe I also need to bring down the number of meson jobs too? But I suspect there's something deeper. Prior to the failure of this test 10 days ago it hadn't failed in a very long time. The OS was upgraded a month ago. Eight or so days ago I changed PG_TEST_EXTRA. I can't think of anything else. There seem to be a bunch of recent failures, and not just on crake, and not just on HEAD: <https://buildfarm.postgresql.org/cgi-bin/show_failures.pl?max_days=14&member=&stage=recoveryCheck&filter=Submit> cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
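For readers unfamiliar with the TAP helper that failed here: poll_query_until just retries a query until the expected output appears or a wall-clock deadline passes, which is why background load alone can trip it. A rough Python sketch of the pattern (an illustration, not the actual Perl implementation):

```python
import time

def poll_query_until(run_query, expected, timeout=180.0, interval=0.1,
                     clock=time.monotonic, sleep=time.sleep):
    """Retry run_query() until it returns `expected`, or give up once
    `timeout` seconds have elapsed.  The deadline is wall-clock time, not
    progress: heavy background load slows the standby's replay without
    pausing the clock, so the poll can time out even though nothing is
    actually broken."""
    deadline = clock() + timeout
    while clock() < deadline:
        if run_query() == expected:
            return True
        sleep(interval)
    return False
```

The `clock` and `sleep` parameters are injected here only so the behavior can be exercised without real waiting; the Perl helper has no such knobs.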
Re: can we mark upper/lower/textlike functions leakproof?
On 2024-07-31 We 9:14 AM, Joe Conway wrote: On 7/31/24 05:47, Andrew Dunstan wrote: On 2024-07-30 Tu 6:51 PM, David Rowley wrote: On Wed, 31 Jul 2024 at 09:35, Andrew Dunstan wrote: Fast forward to now. The customer has found no observable ill effects of marking these functions leakproof. They would like to know if there is any reason why we can't mark them leakproof, so that they don't have to do this in every database of every cluster they use. Thoughts? According to [1], it's just not been done yet due to concerns about risk to reward ratios. Nobody mentioned any reason why it couldn't be, but there were some fears that future code changes could yield new failure paths. David [1] https://postgr.es/m/02BDFCCF-BDBB-4658-9717-4D95F9A91561%40thebuild.com Hmm, somehow I missed that thread in searching, and clearly I'd forgotten it. Still, I'm not terribly convinced by arguments along the lines you're suggesting. "Sufficient unto the day is the evil thereof." Maybe we need a test to make sure we don't make changes along those lines, although I have no idea what such a test would look like. I think I have expressed this opinion before (which was shot down), and I will grant that it is hand-wavy, but I will give it another try. In my opinion, for this use case and others, it should be possible to redact the values substituted into log messages based on some criteria. One of those criteria could be "I am in a leakproof call right now". In fact in a similar fashion, an extension ought to be able to mutate the log message based on the entire string, e.g. when "ALTER ROLE...PASSWORD..." is spotted I would like to be able to redact everything after "PASSWORD". Yes it might render the error message unhelpful, but I know of users that would accept that tradeoff in order to get better performance and security on their production workloads. Or in some cases (e.g. PASSWORD) just better security. -- Andrew Dunstan EDB: https://www.enterprisedb.com
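The redaction idea is straightforward to illustrate with a toy filter. The sketch below models the *proposed* behavior in Python; no such hook exists in PostgreSQL today, and the function name is invented for illustration:

```python
import re

# Toy log-message filter: everything after PASSWORD in an ALTER ROLE
# statement is replaced before the message would be emitted.  Purely
# illustrative of the hook behavior described above, not a real API.
PASSWORD_RE = re.compile(r"(ALTER\s+ROLE\b.*?\bPASSWORD\b).*",
                         re.IGNORECASE | re.DOTALL)

def redact(message: str) -> str:
    return PASSWORD_RE.sub(r"\1 <redacted>", message)

print(redact("ALTER ROLE bob PASSWORD 'hunter2'"))
# -> ALTER ROLE bob PASSWORD <redacted>
print(redact("SELECT 1"))   # non-matching messages pass through untouched
```

A real in-server hook would of course need to run for every message and handle multi-statement strings, quoting, and encoding; this only shows the shape of the tradeoff (lost detail in exchange for not logging secrets).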
Re: can we mark upper/lower/textlike functions leakproof?
On 2024-07-30 Tu 6:51 PM, David Rowley wrote: On Wed, 31 Jul 2024 at 09:35, Andrew Dunstan wrote: Fast forward to now. The customer has found no observable ill effects of marking these functions leakproof. They would like to know if there is any reason why we can't mark them leakproof, so that they don't have to do this in every database of every cluster they use. Thoughts? According to [1], it's just not been done yet due to concerns about risk to reward ratios. Nobody mentioned any reason why it couldn't be, but there were some fears that future code changes could yield new failure paths. David [1] https://postgr.es/m/02BDFCCF-BDBB-4658-9717-4D95F9A91561%40thebuild.com Hmm, somehow I missed that thread in searching, and clearly I'd forgotten it. Still, I'm not terribly convinced by arguments along the lines you're suggesting. "Sufficient unto the day is the evil thereof." Maybe we need a test to make sure we don't make changes along those lines, although I have no idea what such a test would look like. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
can we mark upper/lower/textlike functions leakproof?
Several years ago, an EDB customer complained that if they used a functional index involving upper(), lower(), or textlike(), where RLS was involved, the indexes were not used because these functions are not marked leakproof. We presented the customer with several options for addressing the problem, the simplest of which was simply to mark the functions as leakproof, and this was the solution they adopted. The consensus of discussion at the time among our senior developers was that there was probably no reason why at least upper() and lower() should not be marked leakproof, and quite possibly initcap() and textlike() also. It was suggested that we had not been terribly rigorous in assessing whether or not functions can be considered leakproof. At the time we should have asked the community about it, but we didn't. Fast forward to now. The customer has found no observable ill effects of marking these functions leakproof. They would like to know if there is any reason why we can't mark them leakproof, so that they don't have to do this in every database of every cluster they use. Thoughts? cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
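For concreteness, the per-database workaround the customer applies looks roughly like this (a sketch, not a recommendation; changing leakproof markings on catalog functions requires superuser and is exactly the security judgment under discussion in this thread):

```sql
-- What currently has to be repeated in every database of every cluster,
-- and what marking the functions leakproof in core would make unnecessary:
ALTER FUNCTION pg_catalog.upper(text) LEAKPROOF;
ALTER FUNCTION pg_catalog.lower(text) LEAKPROOF;

-- Verify the marking:
SELECT proname, proleakproof
FROM pg_catalog.pg_proc
WHERE pronamespace = 'pg_catalog'::regnamespace
  AND proname IN ('upper', 'lower');
```

With the marking in place, the planner is allowed to push quals using these functions below RLS security barriers, which is what lets the functional indexes be used.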
Re: meson vs windows perl
On 2024-07-20 Sa 9:41 AM, Andrew Dunstan wrote: On 2024-05-28 Tu 6:13 PM, Andres Freund wrote: Hi, On 2024-04-05 16:12:12 -0400, Andrew Dunstan wrote: OK, this has been fixed and checked. The attached is what I propose. The perl command is pretty hard to read. What about using python's shlex module instead? Rough draft attached. Still not very pretty, but seems easier to read? It'd be even better if we could just get perl to print out the flags in an easier to parse way, but I couldn't immediately see a way. Thanks for looking. The attached should be easier to read. The perl package similar to shlex is Text::ParseWords, which is already a requirement. It turns out that shellwords eats backslashes, so we would need something like this version, which I have tested on Windows. I will probably commit this in the next few days unless there's an objection. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com

diff --git a/meson.build b/meson.build
index 7de0371226..dfcfcfc15e 100644
--- a/meson.build
+++ b/meson.build
@@ -1065,20 +1065,19 @@ if not perlopt.disabled()
   # Config's ccdlflags and ldflags. (Those are the choices of those who
   # built the Perl installation, which are not necessarily appropriate
   # for building PostgreSQL.)
-  ldopts = run_command(perl, '-MExtUtils::Embed', '-e', 'ldopts', check: true).stdout().strip()
-  undesired = run_command(perl_conf_cmd, 'ccdlflags', check: true).stdout().split()
-  undesired += run_command(perl_conf_cmd, 'ldflags', check: true).stdout().split()
+  perl_ldopts = run_command(perl, '-e', '''
+use ExtUtils::Embed;
+use Text::ParseWords;
+# tell perl to suppress including these in ldopts
+*ExtUtils::Embed::_ldflags = *ExtUtils::Embed::_ccdlflags = sub { return ""; };
+# adding an argument to ldopts makes it return a value instead of printing
+# print one of these per line so splitting will preserve spaces in file names.
+# shellwords eats backslashes, so we need to escape them.
+(my $opts = ldopts(undef)) =~ s!\\!!g;
+print "$_\n" foreach shellwords($opts);
+''',
+    check: true).stdout().strip().split('\n')

-  perl_ldopts = []
-  foreach ldopt : ldopts.split(' ')
-    if ldopt == '' or ldopt in undesired
-      continue
-    endif
-
-    perl_ldopts += ldopt.strip('"')
-  endforeach
-
-  message('LDFLAGS recommended by perl: "@0@"'.format(ldopts))
   message('LDFLAGS for embedding perl: "@0@"'.format(' '.join(perl_ldopts)))

   perl_dep_int = declare_dependency(
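As an aside, the backslash-eating behavior being escaped around with s!\\!!g has a direct analogue in Python's shlex (the module Andres suggested earlier for this splitting); a quick illustration of the tradeoff:

```python
import shlex

# POSIX-mode splitting treats backslash as an escape character, so a
# Windows-style path loses its separators...
print(shlex.split(r'-LC:\perl\lib -lperl'))
# -> ['-LC:perllib', '-lperl']

# ...while quoted names containing spaces survive as a single token, which
# is the property the shellwords-based splitting above relies on:
print(shlex.split('-L"C:/Program Files/perl/lib" -lperl'))
# -> ['-LC:/Program Files/perl/lib', '-lperl']
```

Perl's Text::ParseWords::shellwords behaves the same way on backslashes, which is why the meson.build change strips them before splitting.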
Re: jsonpath: Inconsistency of timestamp_tz() Output
On 2024-07-22 Mo 3:12 AM, Jeevan Chalke wrote: On Fri, Jul 19, 2024 at 7:35 PM David E. Wheeler wrote: On Jul 10, 2024, at 11:19, David E. Wheeler wrote: > Oh, and the time and date were wrong, too, because I blindly used the same conversion for dates as for timestamps. Fixed in v2. > > PR: https://github.com/theory/postgres/pull/7 > CF: https://commitfest.postgresql.org/49/5119/ Rebased on 5784a49. No other changes. I would consider this a bug in features added for 17. I agree with David that we need to set the tz explicitly as the JsonbValue struct maintains that separately. However, in the attached version, I have added some comments and also fixed some indentation. I have pushed this. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: xid_wraparound tests intermittent failure.
On 2024-07-29 Mo 5:25 PM, Masahiko Sawada wrote: I've attached the patch. Could you please test if the patch fixes the instability you observed? Since we turn off autovacuum on all three tests and we wait for autovacuum to complete processing databases, these tests potentially have a similar (but lower) risk. So I modified these tests to turn it on so we can ensure the autovacuum runs periodically. I assume you actually meant to remove the "autovacuum = off" in 003_wraparound.pl. With that change in your patch I retried my test, but on iteration 100 out of 100 it failed on test 002_limits.pl. I think we need to remove the "autovacuum = off" also in 002_limits.pl as it waits for autovacuum to process both template0 and template1 databases. Just to be clear, the failure happened even without "autovacuum = off"? The attached patch, a slight modification of yours, removes "autovacuum = off" for all three tests, and with that, a set of 200 runs was clean for me. Oh I missed that I left "autovacuum = off" for some reason in the 002 test. Thank you for attaching the patch, it looks good to me. Thanks, pushed. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
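For anyone following along, the mechanism the fix relies on is the distinction between the cluster-wide autovacuum GUC and the per-table storage parameter: with the GUC left on and the reloption off, ordinary autovacuum skips the table, but anti-wraparound autovacuum can still run. Roughly:

```sql
-- Per-table opt-out: regular autovacuum skips this table, yet autovacuum
-- will still force a vacuum when the table approaches xid wraparound,
-- which is exactly what the xid_wraparound tests want to exercise.
CREATE TABLE large (id serial PRIMARY KEY, data text)
  WITH (autovacuum_enabled = off);

-- The setting can also be changed after the fact:
ALTER TABLE large SET (autovacuum_enabled = off);
```

By contrast, "autovacuum = off" in postgresql.conf disables the launcher's regular activity cluster-wide, which is what made the tests sensitive to the timing races discussed above.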
Re: why is pg_upgrade's regression run so slow?
On 2024-07-29 Mo 4:00 AM, Alexander Lakhin wrote: Hello Andrew, 28.07.2024 22:43, Andrew Dunstan wrote: Maybe, IDK. Meanwhile, I disabled "debug_parallel_query = regress" on HEAD for fairywren and drongo - fairywren has just gone green, and I expect drongo will when it reports in a few hours. I'm at a loss for an explanation. I'm observing the same here (using a Windows 10 VM). With no TEMP_CONFIG set, `meson test` gives me these numbers:

test: postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade
duration: 72.34s
test: postgresql:regress / regress/regress
duration: 41.98s

But with debug_parallel_query=regress in TEMP_CONFIG, I see:

test: postgresql:regress / regress/regress
duration: 50.08s
test: postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade
duration: 398.45s

With log_min_messages=DEBUG2 added to TEMP_CONFIG, I can see how many parallel workers were started during the test:

...\postgresql\build>grep "starting background worker process" -r testrun/pg_upgrade | wc -l
17532

And another interesting fact is that TEMP_CONFIG is apparently ignored by `meson test regress/regress`. For example, with temp.config containing invalid settings, `meson test pg_upgrade/002_pg_upgrade` fails, but `meson test regress/regress` passes just fine. Well, that last fact explains the discrepancy I originally complained about. How the heck did that happen? It looks like we just ignored its use in Makefile.global.in :-( cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: why is pg_upgrade's regression run so slow?
On 2024-07-27 Sa 6:48 PM, Tom Lane wrote: Andrew Dunstan writes: On 2024-07-27 Sa 10:20 AM, Tom Lane wrote: Just to add some more fuel to the fire: I do *not* observe such an effect on my own animals. The culprit appears to be meson. When I tested running crake with "using_meson => 0" I got results in line with yours. Interesting. Maybe meson is over-aggressively trying to run these test suites in parallel? Maybe, IDK. Meanwhile, I disabled "debug_parallel_query = regress" on HEAD for fairywren and drongo - fairywren has just gone green, and I expect drongo will when it reports in a few hours. I'm at a loss for an explanation. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: why is pg_upgrade's regression run so slow?
On 2024-07-27 Sa 10:20 AM, Tom Lane wrote: Andrew Dunstan writes: As part of its 002_pgupgrade.pl, the pg_upgrade tests do a rerun of the normal regression tests. That in itself is sad, but what concerns me here is why it's so much slower than the regular run? This is apparent everywhere (e.g. on crake the standard run takes about 30 to 90 s, but pg_upgrade's run takes 5 minutes or more). Just to add some more fuel to the fire: I do *not* observe such an effect on my own animals. For instance, sifaka's last run shows 00:09 for install-check-C and the same (within rounding error) for the regression test step in 002_pgupgrade; on the much slower mamba, the last run took 07:33 for install-check-C and 07:40 for the 002_pgupgrade regression test step. I'm still using the makefile-based build, and I'm not using debug_parallel_query, but it doesn't make a lot of sense to me that either of those things should affect this. Looking at crake's last run, there are other oddities: why does the "check" step take 00:24 while "install-check-en_US.utf8" (which should be doing strictly less work) takes 01:00? Again, there are not similar discrepancies on my animals. Are you sure there's not background load on the machine? Quite sure. Running crake and koel is all it does. It's Fedora 40 running on bare metal, a Lenovo Y70 with an Intel Core i7-4720HQ CPU @ 2.60GHz and a Samsung SSD. The culprit appears to be meson. When I tested running crake with "using_meson => 0" I got results in line with yours. The regression test times were consistent, including the installcheck tests. That's especially ugly on Windows as we don't have any alternative way of running, and the results are so much more catastrophic. "debug_parallel_query" didn't seem to matter. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: xid_wraparound tests intermittent failure.
On 2024-07-26 Fr 1:46 PM, Masahiko Sawada wrote: On Thu, Jul 25, 2024 at 6:52 PM Andrew Dunstan wrote: On 2024-07-25 Th 3:40 PM, Masahiko Sawada wrote: On Thu, Jul 25, 2024 at 11:06 AM Masahiko Sawada wrote: On Thu, Jul 25, 2024 at 10:56 AM Andrew Dunstan wrote: On 2024-07-23 Tu 6:59 PM, Masahiko Sawada wrote: See <https://bitbucket.org/adunstan/rotfang-fdw/downloads/xid-wraparound-result.tar.bz2> The failure logs are from a run where both tests 1 and 2 failed. Thank you for sharing the logs. I think that the problem seems to match what Alexander Lakhin mentioned[1]. Probably we can fix such a race condition somehow but I'm not sure it's worth it as setting autovacuum = off and autovacuum_max_workers = 1 (or a low number) is an extremely rare case. I think it would be better to stabilize these tests. One idea is to turn the autovacuum GUC parameter on while setting autovacuum_enabled = off for each table. That way, we can ensure that autovacuum workers are launched. And I think it aligns with real use cases. Regards, [1] https://www.postgresql.org/message-id/02373ec3-50c6-df5a-0d65-5b9b1c0c86d6%40gmail.com OK, do you want to propose a patch? Yes, I'll prepare and share it soon. I've attached the patch. Could you please test if the patch fixes the instability you observed? Since we turn off autovacuum on all three tests and we wait for autovacuum to complete processing databases, these tests potentially have a similar (but lower) risk. So I modified these tests to turn it on so we can ensure the autovacuum runs periodically. I assume you actually meant to remove the "autovacuum = off" in 003_wraparound.pl. With that change in your patch I retried my test, but on iteration 100 out of 100 it failed on test 002_limits.pl. I think we need to remove the "autovacuum = off" also in 002_limits.pl as it waits for autovacuum to process both template0 and template1 databases. Just to be clear, the failure happened even without "autovacuum = off"? 
The attached patch, a slight modification of yours, removes "autovacuum = off" for all three tests, and with that, a set of 200 runs was clean for me. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com

diff --git a/src/test/modules/xid_wraparound/t/001_emergency_vacuum.pl b/src/test/modules/xid_wraparound/t/001_emergency_vacuum.pl
index 37550b67a4..2692b35f34 100644
--- a/src/test/modules/xid_wraparound/t/001_emergency_vacuum.pl
+++ b/src/test/modules/xid_wraparound/t/001_emergency_vacuum.pl
@@ -18,7 +18,6 @@ my $node = PostgreSQL::Test::Cluster->new('main');
 $node->init;
 $node->append_conf(
 	'postgresql.conf', qq[
-autovacuum = off # run autovacuum only when to anti wraparound
 autovacuum_naptime = 1s # so it's easier to verify the order of operations
 autovacuum_max_workers = 1
@@ -27,23 +26,25 @@ log_autovacuum_min_duration = 0
 $node->start;

 $node->safe_psql('postgres', 'CREATE EXTENSION xid_wraparound');

-# Create tables for a few different test scenarios
+# Create tables for a few different test scenarios. We disable autovacuum
+# on these tables to run it only to prevent wraparound.
 $node->safe_psql(
 	'postgres', qq[
-CREATE TABLE large(id serial primary key, data text, filler text default repeat(random()::text, 10));
+CREATE TABLE large(id serial primary key, data text, filler text default repeat(random()::text, 10))
+ WITH (autovacuum_enabled = off);
 INSERT INTO large(data) SELECT generate_series(1,3);
-CREATE TABLE large_trunc(id serial primary key, data text, filler text default repeat(random()::text, 10));
+CREATE TABLE large_trunc(id serial primary key, data text, filler text default repeat(random()::text, 10))
+ WITH (autovacuum_enabled = off);
 INSERT INTO large_trunc(data) SELECT generate_series(1,3);
-CREATE TABLE small(id serial primary key, data text, filler text default repeat(random()::text, 10));
+CREATE TABLE small(id serial primary key, data text, filler text default repeat(random()::text, 10))
+ WITH (autovacuum_enabled = off);
 INSERT INTO small(data) SELECT generate_series(1,15000);
-CREATE TABLE small_trunc(id serial primary key, data text, filler text default repeat(random()::text, 10));
+CREATE TABLE small_trunc(id serial primary key, data text, filler text default repeat(random()::text, 10))
+ WITH (autovacuum_enabled = off);
 INSERT INTO small_trunc(data) SELECT generate_series(1,15000);
-
-CREATE TABLE autovacuum_disabled(id serial primary key, data text) WITH (autovacuum_enabled=false);
-INSERT INTO autovacuum_disabled(data) SELECT generate_series(1,1000);
 ]);

 # Bump the query timeout to avoid false negatives on slow test systems.
@@ -63,7 +64,6 @@ $background_psql->query_safe(
 DELETE FROM large_trunc WHERE id > 1;
 DELETE FROM small WHERE id % 2 = 0;
 DELETE FROM small_trunc WHERE id > 1000;
why is pg_upgrade's regression run so slow?
As part of its 002_pgupgrade.pl, the pg_upgrade tests do a rerun of the normal regression tests. That in itself is sad, but what concerns me here is why it's so much slower than the regular run. This is apparent everywhere (e.g. on crake the standard run takes about 30 to 90 s, but pg_upgrade's run takes 5 minutes or more). On Windows, it's catastrophic, and has gone unnoticed only because the buildfarm client wasn't counting a timeout as a failure. That was an error on my part and I have switched a few of my machines to code that checks more robustly for failure of meson tests - specifically by looking for the absence of test.success rather than the presence of test.fail. That means that drongo and fairywren are getting timeout errors, e.g. on the latest run on fairywren, the regular regression run took 226s, but pg_upgrade's run of what should be the same set of tests took 2418s. What the heck is going on here? Is it because there are concurrent tests running? That doesn't seem enough to make the tests run more than 10 times as long. I have a strong suspicion this is exacerbated by "debug_parallel_query = regress", especially since the tests run much faster on REL_17_STABLE where I am not setting that, but that can't be the whole explanation, since that setting should apply to both sets of tests. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Extension using Meson as build system
On 2024-07-26 Fr 11:06 AM, Peter Eisentraut wrote: On 30.06.24 15:17, Junwang Zhao wrote: Is there any extension that uses meson as its build system? I'm starting an extension project written in C++; cmake was my initial choice for the build system, but since PostgreSQL has adopted Meson, I'm wondering if there is any extension that also uses meson that I can reference. I wrote a little demo: https://github.com/petere/plsh/blob/meson/meson.build Nifty! cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: fairywren timeout failures on the pg_amcheck/005_opclass_damage test
On 2024-07-26 Fr 7:00 AM, Alexander Lakhin wrote: 25.07.2024 15:00, Alexander Lakhin wrote: The other question is: why is 005_opclass_damage taking so much time there? ... $ make -s check -C src/bin/pg_amcheck/ PROVE_TESTS="t/005*" PROVE_FLAGS="--timer" [11:11:53] t/005_opclass_damage.pl .. ok 1370 ms ( 0.00 usr 0.00 sys + 0.10 cusr 0.07 csys = 0.17 CPU) $ echo "debug_parallel_query = regress" >/tmp/extra.config $ TEMP_CONFIG=/tmp/extra.config make -s check -C src/bin/pg_amcheck/ PROVE_TESTS="t/005*" PROVE_FLAGS="--timer" [11:12:46] t/005_opclass_damage.pl .. ok 40854 ms ( 0.00 usr 0.00 sys + 0.10 cusr 0.10 csys = 0.20 CPU) ... So maybe at least this test should be improved for testing with debug_parallel_query enabled, if such active use of parallel workers by pg_amcheck can't be an issue to end users? When running this test with "log_min_messages = DEBUG2" in my extra.config, I see thousands of the following messages in the test log: 2024-07-26 09:32:54.544 UTC [2572189:46] DEBUG: postmaster received pmsignal signal 2024-07-26 09:32:54.544 UTC [2572189:47] DEBUG: registering background worker "parallel worker for PID 2572197" 2024-07-26 09:32:54.544 UTC [2572189:48] DEBUG: starting background worker process "parallel worker for PID 2572197" 2024-07-26 09:32:54.547 UTC [2572189:49] DEBUG: unregistering background worker "parallel worker for PID 2572197" 2024-07-26 09:32:54.547 UTC [2572189:50] DEBUG: background worker "parallel worker" (PID 2572205) exited with exit code 0 2024-07-26 09:32:54.547 UTC [2572189:51] DEBUG: postmaster received pmsignal signal 2024-07-26 09:32:54.547 UTC [2572189:52] DEBUG: registering background worker "parallel worker for PID 2572197" 2024-07-26 09:32:54.547 UTC [2572189:53] DEBUG: starting background worker process "parallel worker for PID 2572197" 2024-07-26 09:32:54.549 UTC [2572189:54] DEBUG: unregistering background worker "parallel worker for PID 2572197" 2024-07-26 09:32:54.549 UTC [2572189:55] DEBUG: background worker 
"parallel worker" (PID 2572206) exited with exit code 0 ... grep ' registering background worker' \ src/bin/pg_amcheck/tmp_check/log/005_opclass_damage_test.log | wc -l 15669 So this test launches more than 15000 processes when debug_parallel_query is enabled. As far as I can see, this is happening because the "PARALLEL UNSAFE" marking is ignored when the function is called by CREATE INDEX/amcheck. Namely, with a function defined as: CREATE FUNCTION int4_asc_cmp (a int4, b int4) RETURNS int LANGUAGE sql AS $$ SELECT CASE WHEN $1 = $2 THEN 0 WHEN $1 > $2 THEN 1 ELSE -1 END; $$; SELECT int4_asc_cmp(1, 2); is executed without parallel workers. But when it's used by an index: CREATE OPERATOR CLASS int4_fickle_ops FOR TYPE int4 USING btree AS ... OPERATOR 5 > (int4, int4), FUNCTION 1 int4_asc_cmp(int4, int4); INSERT INTO int4tbl (SELECT * FROM generate_series(1,1000) gs); CREATE INDEX fickleidx ON int4tbl USING btree (i int4_fickle_ops); it launches 1000 parallel workers. (This is reminiscent of bug #18314.) One way to work around this is to disable debug_parallel_query in the test, and another possibility is to set max_parallel_workers = 0. But wouldn't either of those just be masking the problem? cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
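Either workaround could be expressed as a temporary configuration override of the kind already used in the timing experiment above (a sketch only; the file name is arbitrary and would be passed to the test run via TEMP_CONFIG):

```
# /tmp/extra.config -- hypothetical override for the pg_amcheck test run;
# either line on its own is enough to suppress the worker storm.
debug_parallel_query = off
max_parallel_workers = 0
```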
Re: xid_wraparound tests intermittent failure.
On 2024-07-25 Th 3:40 PM, Masahiko Sawada wrote: On Thu, Jul 25, 2024 at 11:06 AM Masahiko Sawada wrote: On Thu, Jul 25, 2024 at 10:56 AM Andrew Dunstan wrote: On 2024-07-23 Tu 6:59 PM, Masahiko Sawada wrote: See<https://bitbucket.org/adunstan/rotfang-fdw/downloads/xid-wraparound-result.tar.bz2> The failure logs are from a run where both tests 1 and 2 failed. Thank you for sharing the logs. I think that the problem seems to match what Alexander Lakhin mentioned[1]. Probably we can fix such a race condition somehow but I'm not sure it's worth it as setting autovacuum = off and autovacuum_max_workers = 1 (or a low number) is an extremely rare case. I think it would be better to stabilize these tests. One idea is to turn the autovacuum GUC parameter on while setting autovacuum_enabled = off for each table. That way, we can ensure that autovacuum workers are launched. And I think it seems to align real use cases. Regards, [1]https://www.postgresql.org/message-id/02373ec3-50c6-df5a-0d65-5b9b1c0c86d6%40gmail.com OK, do you want to propose a patch? Yes, I'll prepare and share it soon. I've attached the patch. Could you please test if the patch fixes the instability you observed? Since we turn off autovacuum on all three tests and we wait for autovacuum to complete processing databases, these tests potentially have a similar (but lower) risk. So I modified these tests to turn it on so we can ensure the autovacuum runs periodically. I assume you actually meant to remove the "autovacuum = off" in 003_wraparound.pl. With that change in your patch I retried my test, but on iteration 100 out of 100 it failed on test 002_limits.pl. You can see the logs at <https://f001.backblazeb2.com/file/net-dunslane-public/002_limits-failure-log.tar.bz2> cheers andrew -- Andrew Dunstan EDB:https://www.enterprisedb.com
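The approach in the patch under discussion — keep the autovacuum launcher running globally but exempt the tables the test controls — comes down to the per-table storage parameter, roughly like this (the table name is hypothetical):

```sql
-- autovacuum stays on server-wide (the default), so workers are still
-- launched periodically; this one table is excluded from processing.
ALTER TABLE wraparound_tbl SET (autovacuum_enabled = off);
```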
Re: Recent 027_streaming_regress.pl hangs
On 2024-07-25 Th 5:14 PM, Tom Lane wrote: I wrote: I'm confused by crake's buildfarm logs. AFAICS it is not running recovery-check at all in most of the runs; at least there is no mention of that step, for example here: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake&dt=2024-07-25%2013%3A27%3A02 Oh, I see it: the log file that is called recovery-check in a failing run is called misc-check if successful. That seems mighty bizarre, and it's not how my own animals behave. Something weird about the meson code path, perhaps? Yes, it was discussed some time ago. As suggested by Andres, we run the meson test suite more or less all together (in effect like "make checkworld" but without the main regression suite, which is run on its own first), rather than in the separate (and serialized) way we do with the configure/make tests. That results in significant speedup. If the tests fail we report the failure as happening at the first failure we encounter, which is possibly less than ideal, but I haven't got a better idea. 
Anyway, in this successful run: https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=crake&dt=2024-07-25%2018%3A57%3A02&stg=misc-check here are some salient test timings: 1/297 postgresql:pg_upgrade / pg_upgrade/001_basic OK 0.18s 9 subtests passed 2/297 postgresql:pg_upgrade / pg_upgrade/003_logical_slots OK 15.95s 12 subtests passed 3/297 postgresql:pg_upgrade / pg_upgrade/004_subscription OK 16.29s 14 subtests passed 17/297 postgresql:isolation / isolation/isolation OK 71.60s 119 subtests passed 41/297 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade OK 169.13s 18 subtests passed 140/297 postgresql:initdb / initdb/001_initdb OK 41.34s 52 subtests passed 170/297 postgresql:recovery / recovery/027_stream_regress OK 469.49s 9 subtests passed while in the next, failing run https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=crake&dt=2024-07-25%2020%3A18%3A05&stg=recovery-check the same tests took: 1/297 postgresql:pg_upgrade / pg_upgrade/001_basic OK 0.22s 9 subtests passed 2/297 postgresql:pg_upgrade / pg_upgrade/003_logical_slots OK 56.62s 12 subtests passed 3/297 postgresql:pg_upgrade / pg_upgrade/004_subscription OK 71.92s 14 subtests passed 21/297 postgresql:isolation / isolation/isolation OK 299.12s 119 subtests passed 31/297 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade OK 344.42s 18 subtests passed 159/297 postgresql:initdb / initdb/001_initdb OK 344.46s 52 subtests passed 162/297 postgresql:recovery / recovery/027_stream_regress ERROR 840.84s exit status 29 Based on this, it seems fairly likely that crake is simply timing out as a consequence of intermittent heavy background activity. 
The latest failure is this: Waiting for replication conn standby_1's replay_lsn to pass 2/88E4E260 on primary [16:40:29.481](208.545s) # poll_query_until timed out executing this query: # SELECT '2/88E4E260' <= replay_lsn AND state = 'streaming' # FROM pg_catalog.pg_stat_replication # WHERE application_name IN ('standby_1', 'walreceiver') # expecting this output: # t # last actual query output: # f # with stderr: timed out waiting for catchup at /home/andrew/bf/root/HEAD/pgsql/src/test/recovery/t/027_stream_regress.pl line 103. Maybe it's a case where the system is overloaded, I dunno. I wouldn't bet my house on it. Pretty much nothing else runs on this machine. I have added a mild throttle to the buildfarm config so it doesn't try to run every branch at once. Maybe I also need to bring down the number of meson jobs? But I suspect there's something deeper. Prior to the failure of this test 10 days ago it hadn't failed in a very long time. The OS was upgraded a month ago. Eight or so days ago I changed PG_TEST_EXTRA. I can't think of anything else. cheers andrew -- Andrew Dunstan EDB:https://www.enterprisedb.com
Re: Recent 027_streaming_regress.pl hangs
On 2024-07-25 Th 12:00 AM, Alexander Lakhin wrote: Hello Andrew, 04.06.2024 13:00, Alexander Lakhin wrote: Also, 027_stream_regress still fails due to the same reason: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=serinus&dt=2024-05-22%2021%3A55%3A03 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=flaviventris&dt=2024-05-22%2021%3A54%3A50 (It's remarkable that these two animals failed at the same time.) It looks like crake is failing now due to other reasons (not just concurrency) — it produced 10+ failures of the 027_stream_regress test starting from July, 9. The first such failure on REL_16_STABLE was [1], and that was the first run with 'PG_TEST_EXTRA' => '... wal_consistency_checking'. There is one known issue related to wal_consistency_checking [2], but I see no "incorrect resource manager data checksum" in the failure log... Moreover, the first such failure on REL_17_STABLE was [3], but that run was performed without wal_consistency_checking, as far as I can see. Can that failure be also related to the OS upgrade (I see that back in June crake was running on Fedora 39, but now it's running on Fedora 40)? So maybe we have two factors combined or there is another one? [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake&dt=2024-07-17%2014%3A56%3A09 [2] https://www.postgresql.org/message-id/055bb33c-bccc-bc1d-c2f8-859853444...@gmail.com [3] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake&dt=2024-07-09%2021%3A37%3A04 Unlikely. The change in OS version was on June 17, more than a month ago. But yes we do seem to have seen a lot of recovery_check failures on crake in the last 8 days, which is roughly when I changed PG_TEST_EXTRA to get more coverage. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: xid_wraparound tests intermittent failure.
On 2024-07-23 Tu 6:59 PM, Masahiko Sawada wrote: See<https://bitbucket.org/adunstan/rotfang-fdw/downloads/xid-wraparound-result.tar.bz2> The failure logs are from a run where both tests 1 and 2 failed. Thank you for sharing the logs. I think that the problem seems to match what Alexander Lakhin mentioned[1]. Probably we can fix such a race condition somehow but I'm not sure it's worth it as setting autovacuum = off and autovacuum_max_workers = 1 (or a low number) is an extremely rare case. I think it would be better to stabilize these tests. One idea is to turn the autovacuum GUC parameter on while setting autovacuum_enabled = off for each table. That way, we can ensure that autovacuum workers are launched. And I think it seems to align real use cases. Regards, [1]https://www.postgresql.org/message-id/02373ec3-50c6-df5a-0d65-5b9b1c0c86d6%40gmail.com OK, do you want to propose a patch? cheers andrew -- Andrew Dunstan EDB:https://www.enterprisedb.com
Re: Sporadic connection-setup-related test failures on Cygwin in v15-
On 2024-07-24 We 4:58 PM, Thomas Munro wrote: On Thu, Jul 25, 2024 at 1:00 AM Alexander Lakhin wrote: The important fact here is that this failure is not reproduced after 7389aad63 (in v16), so it seems that it's somehow related to signal processing. Given that, I'm inclined to stop here, without digging deeper, at least until there are plans to backport that fix or something... +1. I'm not planning to back-patch that work. Perhaps lorikeet could stop testing releases < 16? They don't work and it's not our bug[1]. We decided not to drop Cygwin support[2], but I don't think we're learning anything from investigating that noise in the known-broken branches. Sure, it can. I've made that change. cheers andrew -- Andrew Dunstan EDB:https://www.enterprisedb.com
Re: fairywren timeout failures on the pg_amcheck/005_opclass_damage test
On 2024-07-25 Th 8:00 AM, Alexander Lakhin wrote: Hello hackers, Please take a look at today's failure [1], that raises several questions at once: 117/244 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade TIMEOUT 3001.48s exit status 1 180/244 postgresql:pg_amcheck / pg_amcheck/005_opclass_damage TIMEOUT 3001.43s exit status 1 Ok: 227 Expected Fail: 0 Fail: 0 Unexpected Pass: 0 Skipped: 15 Timeout: 2 But looking at the previous test run [2], marked 'OK', I can see almost the same: 115/244 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade TIMEOUT 3001.54s exit status 1 176/244 postgresql:pg_amcheck / pg_amcheck/005_opclass_damage TIMEOUT 3001.26s exit status 1 Ok: 227 Expected Fail: 0 Fail: 0 Unexpected Pass: 0 Skipped: 15 Timeout: 2 So it's not clear to me, why is the latter test run considered failed unlike the former? Yesterday I changed the way we detect failure in the buildfarm client, and pushed that to fairywren (and a few more of my animals). Previously it did not consider a timeout to be a failure, but now it does. I'm going to release that soon. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: xid_wraparound tests intermittent failure.
On 2024-07-22 Mo 10:11 PM, Tom Lane wrote: Andrew Dunstan writes: On 2024-07-22 Mo 12:46 PM, Tom Lane wrote: Masahiko Sawada writes: Looking at dodo's failures, it seems that while it passes module-xid_wraparound-check, all failures happened only during testmodules-install-check-C. Can we check the server logs written during xid_wraparound test in testmodules-install-check-C? Oooh, that is indeed an interesting observation. There are enough examples now that it's hard to dismiss it as chance, but why would the two runs be different? It's not deterministic. Perhaps. I tried "make check" on mamba's host and got exactly the same failures as with "make installcheck", which counts in favor of dodo's results being just luck. Still, dodo has now shown 11 failures in "make installcheck" and zero in "make check", so it's getting hard to credit that there's no difference. Yeah, I agree that's perplexing. That step doesn't run with "make -j nn", so it's a bit hard to see why it should get different results from one run rather than the other. The only thing that's different is that there's another postgres instance running. Maybe that's just enough to slow the test down? After all, this is an RPi. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: xid_wraparound tests intermittent failure.
On 2024-07-22 Mo 9:29 PM, Masahiko Sawada wrote: On Mon, Jul 22, 2024 at 12:53 PM Andrew Dunstan wrote: On 2024-07-22 Mo 12:46 PM, Tom Lane wrote: Masahiko Sawada writes: Looking at dodo's failures, it seems that while it passes module-xid_wraparound-check, all failures happened only during testmodules-install-check-C. Can we check the server logs written during xid_wraparound test in testmodules-install-check-C? Oooh, that is indeed an interesting observation. There are enough examples now that it's hard to dismiss it as chance, but why would the two runs be different? It's not deterministic. I tested the theory that it was some other concurrent tests causing the issue, but that didn't wash. Here's what I did: for f in `seq 1 100` do echo iteration = $f meson test --suite xid_wraparound || break done It took until iteration 6 to get an error. I don't think my Ubuntu instance is especially slow. e.g. "meson compile" normally takes a handful of seconds. Maybe concurrent tests make it more likely, but they can't be the only cause. Could you provide server logs in both OK and NG tests? I want to see if there's a difference in the rate at which tables are vacuumed. See <https://bitbucket.org/adunstan/rotfang-fdw/downloads/xid-wraparound-result.tar.bz2> The failure logs are from a run where both tests 1 and 2 failed. cheers andrew -- Andrew Dunstan EDB:https://www.enterprisedb.com
Re: xid_wraparound tests intermittent failure.
On 2024-07-22 Mo 12:46 PM, Tom Lane wrote: Masahiko Sawada writes: Looking at dodo's failures, it seems that while it passes module-xid_wraparound-check, all failures happened only during testmodules-install-check-C. Can we check the server logs written during xid_wraparound test in testmodules-install-check-C? Oooh, that is indeed an interesting observation. There are enough examples now that it's hard to dismiss it as chance, but why would the two runs be different? It's not deterministic. I tested the theory that it was some other concurrent tests causing the issue, but that didn't wash. Here's what I did: for f in `seq 1 100` do echo iteration = $f meson test --suite xid_wraparound || break done It took until iteration 6 to get an error. I don't think my Ubuntu instance is especially slow. e.g. "meson compile" normally takes a handful of seconds. Maybe concurrent tests make it more likely, but they can't be the only cause. cheers andrew -- Andrew Dunstan EDB:https://www.enterprisedb.com
Re: Enhance pg_dump multi-threaded streaming (WAS: Re: filesystem full during vacuum - space recovery issues)
On 2024-07-19 Fr 9:46 AM, Thomas Simpson wrote: Hi Scott, I realize some of the background was snipped on what I sent to the hacker list, I'll try to fill in the details. Short background: a very large database ran out of space during vacuum full, taking down the server. There is a replica which was applying the WALs and so it too ran out of space. On restart after clearing some space, the database came back up but left the in-progress rebuild files behind. I've cleared that replica and am using it as my rebuild target just now. Trying to identify the 'orphan' files and move them away always led to the database noticing that the supposedly unused files were gone and refusing to start, so I had no successful way to clean up and get space back. Last resort after discussion is pg_dumpall & reload. I'm doing this via a network pipe (netcat) as I do not have the vast amount of storage necessary for the dump file to be stored (in any format). On 19-Jul-2024 09:26, Scott Ribe wrote: Do you actually have 100G networking between the nodes? Because if not, a single CPU should be able to saturate 10G. Servers connect via 10G WAN; sending is not the issue, it's the application of the incoming stream on the destination which is bottlenecked. Likewise the receiving end would need disk capable of keeping up. Which brings up the question, why not write to disk, but directly to the destination rather than write locally then copy? In this case, it's not a local write, it's piped via netcat. Do you require dump-reload because of suspected corruption? That's a tough one. But if not, if the goal is just to get up and running on a new server, why not pg_basebackup, streaming replica, promote? 
That depends on the level of data modification activity being low enough that pg_basebackup can keep up with WAL as it's generated and apply it faster than new WAL comes in, but given that your server is currently keeping up with writing that much WAL and flushing that many changes, seems likely it would keep up as long as the network connection is fast enough. Anyway, in that scenario, you don't need to care how long pg_basebackup takes. If you do need a dump/reload because of suspected corruption, the only thing I can think of is something like doing it a table at a time--partitioning would help here, if practical. The basebackup is, to the best of my understanding, essentially just copying the database files. Since the failed vacuum has left extra files, my expectation is these too would be copied, leaving me in the same position I started in. If I'm wrong, please tell me as that would be vastly quicker - it is how I originally set up the replica and it took only a few hours on the 10G link. The inability to get a clean start if I move any files out the way leads me to be concerned for some underlying corruption/issue and the recommendation earlier in the discussion was opt for dump/reload as the fail-safe. Resigned to my fate, my thoughts were to see if there is a way to improve the dump-reload approach for the future. Since dump-reload is the ultimate upgrade suggestion in the documentation, it seems worthwhile to see if there is a way to improve the performance of that especially as very large databases like mine are a thing with PostgreSQL. From a quick review of pg_dump.c (I'm no expert on it obviously), it feels like it's already doing most of what needs done and the addition is some sort of multi-thread coordination with a restore client to ensure each thread can successfully complete each task it has before accepting more work. I realize that's actually difficult to implement. There is a plan for a non-text mode for pg_dumpall. 
I have started work on it, and hope to have a WIP patch in a month or so. It's not my intention to parallelize it for the first cut, but it could definitely be parallelizable in future. However, it will require writing to disk somewhere, albeit that the data will be compressed. It's well nigh impossible to parallelize text format dumps. Restoration of custom and directory format dumps has long been parallelized. Parallel dumps require directory format, and so will non-text pg_dumpall. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
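For context, the existing parallel facilities referred to above look roughly like this on the command line (paths, database names, and job counts are illustrative, not taken from the thread):

```
# Parallel dump requires the directory output format:
pg_dump --format=directory --jobs=8 --file=/backup/mydb.dir mydb
# Restore of custom- or directory-format dumps can also run in parallel:
pg_restore --jobs=8 --dbname=mydb_new /backup/mydb.dir
```

A plain-text (pg_dumpall-style) dump has no such structure to hand out to workers, which is why the thread treats a non-text pg_dumpall mode as the prerequisite for any future parallelism.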
Re: Windows default locale vs initdb
On 2024-07-21 Su 10:51 PM, Thomas Munro wrote: Ertan Küçükoglu offered to try to review and test this, so here's a rebase. Some notes: * it turned out that the Turkish i/I test problem I mentioned earlier in this thread[1] was just always broken on Windows, we just didn't ever test with UTF-8 before Meson took over; it's skipped now, see commit cff4e5a3[2] * it seems that you can't actually put encodings like .1252 on the end (.UTF-8 must be a special case); I don't know if we should look into a better UTF-8 mode for modern Windows, but that'd be a separate project * this patch only benefits people who run initdb.exe without explicitly specifying a locale; probably a good number of real systems in the wild actually use EDB's graphical installer which initialises a cluster and has its own way of choosing the locale, as discussed in Ertan's thread[3] [1] https://www.postgresql.org/message-id/flat/CA%2BhUKGJZskvCh%3DQm75UkHrY6c1QZUuC92Po9rponj1BbLmcMEA%40mail.gmail.com#3a00c08214a4285d2f3c4297b0ac2be2 [2] https://github.com/postgres/postgres/commit/cff4e5a3 [3] https://www.postgresql.org/message-id/flat/CAH2i4ydECHZPxEBB7gtRG3vROv7a0d3tqAFXzcJWQ9hRsc1znQ%40mail.gmail.com I have an environment I can use for testing. But what exactly am I testing? :-) Install a few "problem" language/region settings, switch the system and ensure initdb runs ok? Other than Turkish, which locales should I install? cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: xid_wraparound tests intermittent failure.
On 2024-07-21 Su 4:08 PM, Alexander Lakhin wrote: Hello, 21.07.2024 20:34, Tom Lane wrote: Andrew Dunstan writes: I noticed this when working on the PostgreSQL::Test::Session project I have in hand. All the tests pass except occasionally the xid_wraparound tests fail. It's not always the same test script that fails either. I tried everything but couldn't make the failure stop. So then I switched out my patch so it's running on plain master and set things running in a loop. Lo and behold it can be relied on to fail after only a few iterations. I have been noticing xid_wraparound failures in the buildfarm too. They seemed quite infrequent, but it wasn't till just now that I realized that xid_wraparound is not run by default. (You have to put "xid_wraparound" in PG_TEST_EXTRA to enable it.) AFAICS the only buildfarm animals that have enabled it are dodo and perentie. dodo is failing this test fairly often: https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=dodo&br=HEAD I think this failure is counted at [1]. Please look at the linked message [2], where I described what makes the test fail. [1] https://wiki.postgresql.org/wiki/Known_Buildfarm_Test_Failures#001_emergency_vacuum.pl_fails_to_wait_for_datfrozenxid_advancing [2] https://www.postgresql.org/message-id/5811175c-1a31-4869-032f-7af5e3e45...@gmail.com It's sad nothing has happened about this for 2 months. There's no point in having unreliable tests. What's not 100% clear to me is whether this failure indicates a badly formulated test or whether the test is correct and has identified an underlying bug. Regarding the point in [2] about the test being run twice in buildfarm clients, I think we should mark the module as NO_INSTALLCHECK in the Makefile. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
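The suggested Makefile change would be a one-line addition in the PGXS style the test modules already use (a sketch of the relevant fragment only):

```
# src/test/modules/xid_wraparound/Makefile (fragment, sketch)
# Exclude the module from installcheck so it is not run a second time.
NO_INSTALLCHECK = 1
```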
Re: xid_wraparound tests intermittent failure.
On 2024-07-21 Su 1:34 PM, Tom Lane wrote: Andrew Dunstan writes: I noticed this when working on the PostgreSQL::Test::Session project I have in hand. All the tests pass except occasionally the xid_wraparound tests fail. It's not always the same test script that fails either. I tried everything but couldn't make the failure stop. So then I switched out my patch so it's running on plain master and set things running in a loop. Lo and behold it can be relied on to fail after only a few iterations. I have been noticing xid_wraparound failures in the buildfarm too. They seemed quite infrequent, but it wasn't till just now that I realized that xid_wraparound is not run by default. (You have to put "xid_wraparound" in PG_TEST_EXTRA to enable it.) AFAICS the only buildfarm animals that have enabled it are dodo and perentie. dodo is failing this test fairly often: https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=dodo&br=HEAD perentie doesn't seem to be having a problem, but I will bet that part of the reason is it's running with cranked-up timeouts: 'build_env' => { 'PG_TEST_EXTRA' => 'xid_wraparound', 'PG_TEST_TIMEOUT_DEFAULT' => '360' }, One thing that seems quite interesting is that the test seems to take about 10 minutes when successful on dodo, but when it fails it's twice that. Why the instability? (Perhaps dodo has highly variable background load, and the thing simply times out in some runs but not others?) Locally, I've not managed to reproduce the failure yet; so perhaps there is some platform dependency. What are you testing on? Linux ub22arm 5.15.0-116-generic #126-Ubuntu SMP Mon Jul 1 10:08:40 UTC 2024 aarch64 aarch64 aarch64 GNU/Linux It's a VM running on UTM/Apple Silicon cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
xid_wraparound tests intermittent failure.
I noticed this when working on the PostgreSQL::Test::Session project I have in hand. All the tests pass except occasionally the xid_wraparound tests fail. It's not always the same test script that fails either. I tried everything but couldn't make the failure stop. So then I switched out my patch so it's running on plain master and set things running in a loop. Lo and behold it can be relied on to fail after only a few iterations. In the latest iteration the failure looks like this stderr: # poll_query_until timed out executing this query: # # SELECT NOT EXISTS ( # SELECT * # FROM pg_database # WHERE age(datfrozenxid) > current_setting('autovacuum_freeze_max_age')::int) # # expecting this output: # t # last actual query output: # f # with stderr: # Tests were run but no plan was declared and done_testing() was not seen. # Looks like your test exited with 29 just after 1. (test program exited with status code 29) ―― Summary of Failures: 295/295 postgresql:xid_wraparound / xid_wraparound/001_emergency_vacuum ERROR 211.76s exit status 29 cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: meson vs windows perl
On 2024-05-28 Tu 6:13 PM, Andres Freund wrote: Hi, On 2024-04-05 16:12:12 -0400, Andrew Dunstan wrote: OK, this has been fixed and checked. The attached is what I propose. The perl command is pretty hard to read. What about using python's shlex module instead? Rough draft attached. Still not very pretty, but seems easier to read? It'd be even better if we could just get perl to print out the flags in an easier to parse way, but I couldn't immediately see a way. Thanks for looking. The attached should be easier to read. The perl package similar to shlex is Text::ParseWords, which is already a requirement. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com diff --git a/meson.build b/meson.build index 5387bb6d5f..e244cbac92 100644 --- a/meson.build +++ b/meson.build @@ -993,20 +993,17 @@ if not perlopt.disabled() # Config's ccdlflags and ldflags. (Those are the choices of those who # built the Perl installation, which are not necessarily appropriate # for building PostgreSQL.) -ldopts = run_command(perl, '-MExtUtils::Embed', '-e', 'ldopts', check: true).stdout().strip() -undesired = run_command(perl_conf_cmd, 'ccdlflags', check: true).stdout().split() -undesired += run_command(perl_conf_cmd, 'ldflags', check: true).stdout().split() +perl_ldopts = run_command(perl, '-e', ''' +use ExtUtils::Embed; +use Text::ParseWords; +# tell perl to suppress including these in ldopts +*ExtUtils::Embed::_ldflags =*ExtUtils::Embed::_ccdlflags = sub { return ""; }; +# adding an argument to ldopts makes it return a value instead of printing +# print one of these per line so splitting will preserve spaces in file names. 
+print "$_\n" foreach shellwords(ldopts(undef)); +''', + check: true).stdout().strip().split('\n') -perl_ldopts = [] -foreach ldopt : ldopts.split(' ') - if ldopt == '' or ldopt in undesired -continue - endif - - perl_ldopts += ldopt.strip('"') -endforeach - -message('LDFLAGS recommended by perl: "@0@"'.format(ldopts)) message('LDFLAGS for embedding perl: "@0@"'.format(' '.join(perl_ldopts))) perl_dep_int = declare_dependency(
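As a standalone illustration of why shlex (and its Perl counterpart, Text::ParseWords::shellwords, used in the patch above) is the right tool here: both split a flag string while honouring shell quoting, so a library path containing a space survives as a single token. The flag string below is made up for the example, not taken from a real Perl installation:

```python
import shlex

# A flag string of the kind "perl -MExtUtils::Embed -e ldopts" might
# emit; naive splitting on spaces would break the quoted path apart.
ldopts = '-L"/opt/strawberry perl/lib" -lperl536 -lpthread'
flags = shlex.split(ldopts)
print(flags)
```

shlex.split() strips the quotes and keeps the spaced path inside one -L token, which is exactly the property the shellwords-based meson change depends on.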
Re: Using LibPq in TAP tests via FFI
On 2024-07-18 Th 6:51 PM, Thomas Munro wrote: On Wed, Jul 17, 2024 at 2:27 AM Andrew Dunstan wrote: Here's the latest version of this patch. It removes all use of background_psql(). Instead it uses libpq's async interface, which seems to me far more robust. There is one remaining use of interactive_psql(), but that's reasonable as it's used for testing psql itself. This looks really nice! Works on my local FBSD machine. cool I pushed it to CI, and mostly saw environmental problems unrelated to the patch, but you might be interested in the ASAN failure visible in the cores section: https://cirrus-ci.com/task/6607915962859520 Unfortunately I can't see the interesting log messages, because it detected that the logs were still being appended to and declined to upload them. I think that means there must be subprocesses not being waited for somewhere? I couldn't see anything obvious either. I spent yesterday creating an XS wrapper for just the 19 libpq functions used in Session.pm. It's pretty simple. I have it passing a very basic test, but haven't tried plugging it into Session.pm yet. Neat. I guess the libpq FFI/XS piece looks the same to the rest of the test framework outside that module. Yeah, that's the idea. It does sound pretty convenient if the patch just works™ on CI/BF without any environment changes, which I assume must be doable because we already build XS stuff in sr/pl/plperl. Looking forward to trying that version. Still working on it. Meanwhile, here's a new version. It has some cleanup and also tries to use Session objects instead of psql in simple cases for safe_psql(). 
cheers andrew -- Andrew Dunstan EDB:https://www.enterprisedb.com diff --git a/contrib/amcheck/t/001_verify_heapam.pl b/contrib/amcheck/t/001_verify_heapam.pl index 481e4dbe4f..f821f9 100644 --- a/contrib/amcheck/t/001_verify_heapam.pl +++ b/contrib/amcheck/t/001_verify_heapam.pl @@ -5,6 +5,7 @@ use strict; use warnings FATAL => 'all'; use PostgreSQL::Test::Cluster; +use PostgreSQL::Test::Session; use PostgreSQL::Test::Utils; use Test::More; @@ -18,7 +19,9 @@ $node = PostgreSQL::Test::Cluster->new('test'); $node->init; $node->append_conf('postgresql.conf', 'autovacuum=off'); $node->start; -$node->safe_psql('postgres', q(CREATE EXTENSION amcheck)); +my $session = PostgreSQL::Test::Session->new(node => $node); + +$session->do(q(CREATE EXTENSION amcheck)); # # Check a table with data loaded but no corruption, freezing, etc. @@ -49,7 +52,7 @@ detects_heap_corruption( # Check a corrupt table with all-frozen data # fresh_test_table('test'); -$node->safe_psql('postgres', q(VACUUM (FREEZE, DISABLE_PAGE_SKIPPING) test)); +$session->do(q(VACUUM (FREEZE, DISABLE_PAGE_SKIPPING) test)); detects_no_corruption("verify_heapam('test')", "all-frozen not corrupted table"); corrupt_first_page('test'); @@ -81,7 +84,7 @@ sub relation_filepath my ($relname) = @_; my $pgdata = $node->data_dir; - my $rel = $node->safe_psql('postgres', + my $rel = $session->query_oneval( qq(SELECT pg_relation_filepath('$relname'))); die "path not found for relation $relname" unless defined $rel; return "$pgdata/$rel"; @@ -92,8 +95,8 @@ sub fresh_test_table { my ($relname) = @_; - return $node->safe_psql( - 'postgres', qq( + return $session->do( + qq( DROP TABLE IF EXISTS $relname CASCADE; CREATE TABLE $relname (a integer, b text); ALTER TABLE $relname SET (autovacuum_enabled=false); @@ -117,8 +120,8 @@ sub fresh_test_sequence { my ($seqname) = @_; - return $node->safe_psql( - 'postgres', qq( + return $session->do( + qq( DROP SEQUENCE IF EXISTS $seqname CASCADE; CREATE SEQUENCE $seqname INCREMENT BY 13 @@ 
-134,8 +137,8 @@ sub advance_test_sequence { my ($seqname) = @_; - return $node->safe_psql( - 'postgres', qq( + return $session->query_oneval( + qq( SELECT nextval('$seqname'); )); } @@ -145,10 +148,7 @@ sub set_test_sequence { my ($seqname) = @_; - return $node->safe_psql( - 'postgres', qq( - SELECT setval('$seqname', 102); - )); + return $session->query_oneval(qq(SELECT setval('$seqname', 102))); } # Call SQL functions to reset the sequence @@ -156,8 +156,8 @@ sub reset_test_sequence { my ($seqname) = @_; - return $node->safe_psql( - 'postgres', qq( + return $session->do( + qq( ALTER SEQUENCE $seqname RESTART WITH 51 )); } @@ -169,6 +169,7 @@ sub corrupt_first_page my ($relname) = @_; my $relpath = relation_filepath($relname); + $session->close; $node->stop; my $fh; @@ -191,6 +192,7 @@ sub corrupt_first_page or BAIL_OUT("close failed: $!")
Re: documentation structure
On 2024-07-18 Th 4:16 PM, Corey Huinker wrote: looking back. The patch is big. No convenient way to review/validate it. Perhaps we can break up the patches as follows: 1. create the filelist.sgml entries, and create new files as you detailed, empty with func.sgml still managing the sections, but each section now has its corresponding &func-something; The files are included, but they're completely empty. 2 to 999+. one commit per function moved from func.sgml to its corresponding func-something.sgml. It'll be a ton of commits, but each one will be very easy to review. Alternately, if we put each function in its own file, there would be a series of commits, one per function, like so: 1. create the new func-my-function.sgml, copying the definition of the same name 2. delete the definition in func.sgml, replaced with the &func-include; 3. new entry in the filelist. This approach looks (and IS) tedious, but it has several key advantages: 1. Each one is very easy to review. 2. Big reduction in future merge conflicts on func.sgml. 3. The location of a given function's docs is now trivial. 4. Separation of concerns with regard to content of function def vs placement of same. 5. Easy to ensure that all functions have an anchor. 6. The effort can stall and be resumed at our own pace. Perhaps your Python script can be adapted to this approach? I'm willing to review, or collaborate, or both. I'm opposed to having a separate file for every function. I think breaking up func.sgml into one piece per sect1 is about right. If that proves cumbersome still we can look at breaking it up further, but let's start with that. More concretely, sometimes the bits that relate to a particular function are not contiguous. e.g. you might have an entry in a table for a function and then at the end Notes relating to that function. That makes doing a file per function not very practical. Also, being able to view the context for your function documentation is useful when editing. 
cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
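[Editor's note] The sect1-per-file split discussed above would lean on the entity machinery the SGML docs already use; roughly like this (entity and file names here are illustrative, not taken from the actual patch):

```sgml
<!-- doc/src/sgml/filelist.sgml: declare one entity per new file -->
<!ENTITY func-string   SYSTEM "func-string.sgml">
<!ENTITY func-datetime SYSTEM "func-datetime.sgml">

<!-- func.sgml: each former sect1 becomes a single include -->
&func-string;
&func-datetime;
```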
Re: PG_TEST_EXTRA and meson
On 2024-07-17 We 11:01 AM, Tom Lane wrote: Jacob Champion writes: On Wed, Jul 17, 2024 at 3:34 AM Nazir Bilal Yavuz wrote: Sorry, the previous reply was wrong; I misunderstood what you said. Yes, that is how the patch was coded and I agree that getting rid of config time PG_TEST_EXTRA could be a better alternative. Personally I use the config-time PG_TEST_EXTRA extensively. I'd be sad to see it go, especially if developers are no longer forced to use it. The existing and documented expectation is that PG_TEST_EXTRA is an environment variable, ie it's a runtime option not a configure option. Making it be the latter seems like a significant loss of flexibility to me. AIUI the only reason we have it as a configure option at all is that meson is *very* dogmatic about not using environment variables. I get their POV when it comes to building, but that should not extend to testing. That said, I don't mind if this is a configure option as long as it can be overridden at run time without having to run "meson configure". cheers andrew -- Andrew Dunstan EDB:https://www.enterprisedb.com
Re: Windows perl/tcl requirement documentation
On 2024-07-16 Tu 7:21 PM, Michael Paquier wrote: On Mon, Jul 01, 2024 at 11:27:26AM -0400, Andrew Dunstan wrote: Our docs currently state this regarding the perl requirement for building on Windows: ActiveState Perl ActiveState Perl is required to run the build generation scripts. MinGW or Cygwin Perl will not work. It must also be present in the PATH. Binaries can be downloaded from https://www.activestate.com (Note: version 5.14 or later is required, the free Standard Distribution is sufficient). This really hasn't been a true requirement for quite some time. I stopped using AS perl quite a few years ago due to possible licensing issues, and have been building with MSVC using Strawberry Perl ever since. Andres recently complained that Strawberry was somewhat out of date, but that's no longer really the case - it's on 5.38.2, which is the latest in that series, and perl 5.40.0 was only released a few weeks ago. This is an area where I have proposed a set of changes in the last commit fest of March, but it led nowhere as, at least it seems to me, there was no strong consensus about what to mention as main references: https://commitfest.postgresql.org/47/4745/ https://www.postgresql.org/message-id/flat/zzegb7nqbkzm0...@paquier.xyz Not everything should be gone, but I was wondering back on this thread if it would make most sense to replace all these references to point to the popular packaging systems used in the buildfarm. At the very least we should stop recommending things we don't use or test with. Our default position of doing nothing is getting increasingly untenable here. We're actively misleading people IMNSHO. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: recovery test error
On 2024-07-16 Tu 7:45 PM, Michael Paquier wrote: On Tue, Jul 16, 2024 at 03:04:13PM -0400, Andrew Dunstan wrote: This was called by poll_query_until(), which is changed by the patch to use a libpq session rather than constantly forking psql. ISTM we should be passing true as a second parameter so we keep going if the file doesn't exist. Thoughts? Sounds like a good idea to me as this call could return ENOENT depending on the timing of the archiver pushing the new history file, as writeTimeLineHistory() at the end of recovery notifies the archiver but does not wait for the fact to happen (history files are prioritized, still there is a delay). Thanks. Done. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
recovery test error
As I was trying out the libpq perl wrapper on Windows, I encountered a failure in recovery test 002_archiving.pl from this query: SELECT size IS NOT NULL FROM pg_stat_file('c:/prog/postgresql/build/testrun/recovery/002_archiving/data/t_002_archiving_primary_data/archives/0002.history') The test errored out because the file didn't exist. This was called by poll_query_until(), which is changed by the patch to use a libpq session rather than constantly forking psql. ISTM we should be passing true as a second parameter so we keep going if the file doesn't exist. Thoughts? cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
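[Editor's note] For context, pg_stat_file() accepts an optional second missing_ok argument; with true it returns NULLs rather than raising an error when the file is absent, so the polled query just keeps yielding false until the history file appears. Roughly (the path is shortened for illustration):

```sql
-- errors out while the file does not yet exist:
SELECT size IS NOT NULL FROM pg_stat_file('.../archives/0002.history');

-- with missing_ok => true, simply returns false until the file appears:
SELECT size IS NOT NULL FROM pg_stat_file('.../archives/0002.history', true);
```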
Re: Using LibPq in TAP tests via FFI
On 2024-06-17 Mo 10:01 AM, Andrew Dunstan wrote: I agree with you that falling back on BackgroundPsql is not a terribly satisfactory solution. I'm somewhat doubtful we'll just agree on making FFI::Platypus a hard dependency, but if we agree to do so... Maybe not. If so your other suggestion of a small XS wrapper might make sense. Here's the latest version of this patch. It removes all use of background_psql(). Instead it uses libpq's async interface, which seems to me far more robust. There is one remaining use of interactive_psql(), but that's reasonable as it's used for testing psql itself. I spent yesterday creating an XS wrapper for just the 19 libpq functions used in Session.pm. It's pretty simple. I have it passing a very basic test, but haven't tried plugging it into Session.pm yet. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com diff --git a/contrib/amcheck/t/001_verify_heapam.pl b/contrib/amcheck/t/001_verify_heapam.pl index 481e4dbe4f..f821f9 100644 --- a/contrib/amcheck/t/001_verify_heapam.pl +++ b/contrib/amcheck/t/001_verify_heapam.pl @@ -5,6 +5,7 @@ use strict; use warnings FATAL => 'all'; use PostgreSQL::Test::Cluster; +use PostgreSQL::Test::Session; use PostgreSQL::Test::Utils; use Test::More; @@ -18,7 +19,9 @@ $node = PostgreSQL::Test::Cluster->new('test'); $node->init; $node->append_conf('postgresql.conf', 'autovacuum=off'); $node->start; -$node->safe_psql('postgres', q(CREATE EXTENSION amcheck)); +my $session = PostgreSQL::Test::Session->new(node => $node); + +$session->do(q(CREATE EXTENSION amcheck)); # # Check a table with data loaded but no corruption, freezing, etc. 
@@ -49,7 +52,7 @@ detects_heap_corruption( # Check a corrupt table with all-frozen data # fresh_test_table('test'); -$node->safe_psql('postgres', q(VACUUM (FREEZE, DISABLE_PAGE_SKIPPING) test)); +$session->do(q(VACUUM (FREEZE, DISABLE_PAGE_SKIPPING) test)); detects_no_corruption("verify_heapam('test')", "all-frozen not corrupted table"); corrupt_first_page('test'); @@ -81,7 +84,7 @@ sub relation_filepath my ($relname) = @_; my $pgdata = $node->data_dir; - my $rel = $node->safe_psql('postgres', + my $rel = $session->query_oneval( qq(SELECT pg_relation_filepath('$relname'))); die "path not found for relation $relname" unless defined $rel; return "$pgdata/$rel"; @@ -92,8 +95,8 @@ sub fresh_test_table { my ($relname) = @_; - return $node->safe_psql( - 'postgres', qq( + return $session->do( + qq( DROP TABLE IF EXISTS $relname CASCADE; CREATE TABLE $relname (a integer, b text); ALTER TABLE $relname SET (autovacuum_enabled=false); @@ -117,8 +120,8 @@ sub fresh_test_sequence { my ($seqname) = @_; - return $node->safe_psql( - 'postgres', qq( + return $session->do( + qq( DROP SEQUENCE IF EXISTS $seqname CASCADE; CREATE SEQUENCE $seqname INCREMENT BY 13 @@ -134,8 +137,8 @@ sub advance_test_sequence { my ($seqname) = @_; - return $node->safe_psql( - 'postgres', qq( + return $session->query_oneval( + qq( SELECT nextval('$seqname'); )); } @@ -145,10 +148,7 @@ sub set_test_sequence { my ($seqname) = @_; - return $node->safe_psql( - 'postgres', qq( - SELECT setval('$seqname', 102); - )); + return $session->query_oneval(qq(SELECT setval('$seqname', 102))); } # Call SQL functions to reset the sequence @@ -156,8 +156,8 @@ sub reset_test_sequence { my ($seqname) = @_; - return $node->safe_psql( - 'postgres', qq( + return $session->do( + qq( ALTER SEQUENCE $seqname RESTART WITH 51 )); } @@ -169,6 +169,7 @@ sub corrupt_first_page my ($relname) = @_; my $relpath = relation_filepath($relname); + $session->close; $node->stop; my $fh; @@ -191,6 +192,7 @@ sub corrupt_first_page or 
BAIL_OUT("close failed: $!"); $node->start; + $session->reconnect; } sub detects_heap_corruption @@ -216,7 +218,7 @@ sub detects_corruption my ($function, $testname, @re) = @_; - my $result = $node->safe_psql('postgres', qq(SELECT * FROM $function)); + my $result = $session->query_tuples(qq(SELECT * FROM $function)); like($result, $_, $testname) for (@re); } @@ -226,7 +228,7 @@ sub detects_no_corruption my ($function, $testname) = @_; - my $result = $node->safe_psql('postgres', qq(SELECT * FROM $function)); + my $result = $session->query_tuples(qq(SELECT * FROM $function)); is($result, '', $testname); } diff --git a/contrib/amcheck/t/003_cic_2pc.pl b/contrib/amcheck/t/003_cic_2pc.pl index fc314b8524..ff345f36ac 100644 --- a/contrib/amcheck/t/003_cic_2pc.pl +++ b/cont
Re: ActiveState Perl is not valid anymore to build PG17 on the Windows 10/11 platforms, So Documentation still suggesting it should be updated
On 2024-07-16 Tu 7:46 AM, Yasir wrote: Hi Hackers, Recently, I compiled PG17 on the windows. Till PG16 "ActiveState Perl", as instructed in the documentation <https://www.postgresql.org/docs/16/install-windows-full.html#INSTALL-WINDOWS-FULL-REQUIREMENTS>, was being used successfully on the Windows 10/11 to compile PG. However, it looks like that "ActiveState Perl" is not valid anymore to compile PG17 on Windows 10/11 but documentation <https://www.postgresql.org/docs/17/installation-platform-notes.html#WINDOWS-REQUIREMENTS> still suggests it. Therefore, I think documentation needs to be updated. Moreover, I had to install "strawberry's perl" in order to compile PG17 on Windows 10/11. Please check out the thread "errors building on windows using meson <https://www.postgresql.org/message-id/flat/CADK3HHLQ1MNmfXqEvQi36D_MQrheOZPcXv2H3s6otMbSmfwjzg%40mail.gmail.com>" highlighting the issue. See https://postgr.es/m/4acddcd4-1c08-44e7-ba60-cab102259...@dunslane.net I agree we should fix the docco. cheers andrew -- Andrew Dunstan EDB:https://www.enterprisedb.com
Re: CFbot failed on Windows platform
On 2024-07-11 Th 9:50 AM, Tom Lane wrote: Tatsuo Ishii writes: I think it is related to the '628c1d1f2c' commit. This commit changed the output of the regress test on Windows. Yeah, it seems that explains it. I see a few buildfarm Windows animals complaining too. I think that the contents of src/test/regress/expected/collate.windows.win1252.out are actually wrong, and we'd not noticed because it was only checked with diff -w. psql does put an extra trailing space in some lines of table output, but that space isn't there in collate.windows.win1252.out. Yeah, makes sense. I will produce an updated output file and then re-enable --strip-trailing-cr (after a full test run). cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: tests fail on windows with default git settings
On 2024-07-11 Th 7:29 AM, Andrew Dunstan wrote: On 2024-07-11 Th 4:59 AM, Nazir Bilal Yavuz wrote: Hi, On Wed, 10 Jul 2024 at 17:04, Andrew Dunstan wrote: On 2024-07-10 We 9:25 AM, Tom Lane wrote: Dave Page writes: On Wed, 10 Jul 2024 at 12:12, Andrew Dunstan wrote: As I was looking at this I wondered if there might be anywhere else that needed adjustment. One thing that occurred to me was that that maybe we should replace the use of "-w" in pg_regress.c with this rather less dangerous flag, so instead of ignoring any white space difference we would only ignore line end differences. The use of "-w" apparently dates back to 2009. That seems like a good improvement to me. +1 OK, done. It looks like Postgres CI did not like this change. 'Windows - Server 2019, VS 2019 - Meson & ninja' [1] task started to fail after this commit, there is one extra space at the end of line in regress test's output. [1] https://cirrus-ci.com/task/6753781205958656 Oh, that's annoying. Will investigate. Thanks for the heads up. I have reverted the pg_regress.c portion of the patch. I will investigate non line-end differences on Windows further. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: CFbot failed on Windows platform
On 2024-07-11 Th 5:34 AM, Tatsuo Ishii wrote: The differences are that the result has an extra space at the end of line. I am not familiar with Windows and has no idea why this could happen (I haven't changed the patch set since May 24. The last succeeded test was on July 9 14:58:44 (I am not sure about time zone). Also I see exactly the same test failures in some other tests (for example http://cfbot.cputube.org/highlights/all.html#4337) Any idea? I think It is related to the '628c1d1f2c' commit. This commit changed the output of the regress test in Windows. Yeah, it seems that explains. I see few buildfarm window animals complain too. I have partially reverted that patch. Thanks for the report. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: tests fail on windows with default git settings
On 2024-07-11 Th 4:59 AM, Nazir Bilal Yavuz wrote: Hi, On Wed, 10 Jul 2024 at 17:04, Andrew Dunstan wrote: On 2024-07-10 We 9:25 AM, Tom Lane wrote: Dave Page writes: On Wed, 10 Jul 2024 at 12:12, Andrew Dunstan wrote: As I was looking at this I wondered if there might be anywhere else that needed adjustment. One thing that occurred to me was that that maybe we should replace the use of "-w" in pg_regress.c with this rather less dangerous flag, so instead of ignoring any white space difference we would only ignore line end differences. The use of "-w" apparently dates back to 2009. That seems like a good improvement to me. +1 OK, done. It looks like Postgres CI did not like this change. 'Windows - Server 2019, VS 2019 - Meson & ninja' [1] task started to fail after this commit, there is one extra space at the end of line in regress test's output. [1] https://cirrus-ci.com/task/6753781205958656 Oh, that's annoying. Will investigate. Thanks for the heads up. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: tests fail on windows with default git settings
On 2024-07-10 We 9:25 AM, Tom Lane wrote: Dave Page writes: On Wed, 10 Jul 2024 at 12:12, Andrew Dunstan wrote: As I was looking at this I wondered if there might be anywhere else that needed adjustment. One thing that occurred to me was that maybe we should replace the use of "-w" in pg_regress.c with this rather less dangerous flag, so instead of ignoring any white space difference we would only ignore line end differences. The use of "-w" apparently dates back to 2009. That seems like a good improvement to me. +1 OK, done. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Should we work around msvc failing to compile tab-complete.c?
On 2024-07-10 We 5:55 AM, Dave Page wrote: What is more relevant is that as far as I can see, we've never actually supported either libedit or readline in MSVC++ builds - which kinda makes sense because Windows terminals are very different from traditional *nix ones. Windows isn't supported by either libedit or readline as far as I can see, except through a couple of forks. I imagine from mingw/cygwin builds, readline would only actually work properly in their respective terminals. configure.ac explicitly forces --without-readline on win32, so no for mingw. configure.ac claims there are issues especially with non-US code pages. One of the reasons I've continued to support building with Cygwin is that its readline does seem to work, making its psql the best I know of on Windows. cheers andrew -- Andrew Dunstan EDB:https://www.enterprisedb.com
Re: tests fail on windows with default git settings
On 2024-07-09 Tu 11:34 AM, Andrew Dunstan wrote: On 2024-07-09 Tu 9:52 AM, Dave Page wrote: > What I suggest (see attached) is we run the diff command with > --strip-trailing-cr on Windows. Then we just won't care if the expected file > and/or the output file has CRs. I was wondering about that too, but I wasn't sure we can rely on that flag being supported... I have 4 different diff.exe's on my ~6 week old build VM (not counting shims), all of which seem to support --strip-trailing-cr. Those builds came with: - git - VC++ - diffutils (installed by chocolatey) - vcpkg I think it's reasonable to assume it'll be supported. Ok, cool. So I propose to patch the test_json_parser and pg_bsd_indent tests to use it on Windows, later today unless there's some objection. As I was looking at this I wondered if there might be anywhere else that needed adjustment. One thing that occurred to me was that maybe we should replace the use of "-w" in pg_regress.c with this rather less dangerous flag, so instead of ignoring any white space difference we would only ignore line end differences. The use of "-w" apparently dates back to 2009. Thoughts? cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: tests fail on windows with default git settings
On 2024-07-09 Tu 9:52 AM, Dave Page wrote: > What I suggest (see attached) is we run the diff command with > --strip-trailing-cr on Windows. Then we just won't care if the expected file > and/or the output file has CRs. I was wondering about that too, but I wasn't sure we can rely on that flag being supported... I have 4 different diff.exe's on my ~6 week old build VM (not counting shims), all of which seem to support --strip-trailing-cr. Those builds came with: - git - VC++ - diffutils (installed by chocolatey) - vcpkg I think it's reasonable to assume it'll be supported. Ok, cool. So I propose to patch the test_json_parser and pg_bsd_indent tests to use it on Windows, later today unless there's some objection. cheers andrew -- Andrew Dunstan EDB:https://www.enterprisedb.com
Re: tests fail on windows with default git settings
On 2024-07-08 Mo 5:44 PM, Andres Freund wrote: What I suggest (see attached) is we run the diff command with --strip-trailing-cr on Windows. Then we just won't care if the expected file and/or the output file has CRs. I was wondering about that too, but I wasn't sure we can rely on that flag being supported... Well, my suggestion was to use it only on Windows. I'm using the diffutils from chocolatey, which has it, as does Msys2 diff. Not sure what you have in the CI setup. Not sure what the issue is with pg_bsd_indent, though. I think it's purely that we *read* with fopen("r") and write with fopen("wb"). Which means that any \r\n in the input will be converted to \n in the output. That's not a problem if the repo has been cloned without autocrlf, as there are no crlf in the expected files, but if autocrlf has been used, the expected files don't match. It doesn't look like it'd be trivial to make indent remember what was used in the input. So I think for now the best path is to just use .gitattributes to exclude the expected files from crlf conversion. If we don't want to do so repo wide, we can do so just for these files. either that or we could use the --strip-trailing-cr gadget here too. cheers andrew -- Andrew Dunstan EDB:https://www.enterprisedb.com
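[Editor's note] To make the text- vs binary-mode hazard above concrete, here is a small sketch. It uses Python's universal-newline decoding to mimic what Windows text-mode stdio does on read: C's fopen("r") on Windows translates "\r\n" to "\n" the same way, while "rb" (what PG_BINARY_R expands to) leaves the bytes untouched, which is why expected files checked out with autocrlf stop matching.

```python
import os
import tempfile

# Write a small CRLF-terminated file byte-for-byte (equivalent to
# fopen(..., "wb") in C: no newline translation on write).
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "wb") as f:
    f.write(b"{\r\n}\r\n")

# Binary-mode read (fopen "rb" / PG_BINARY_R): the CR bytes survive.
with open(path, "rb") as f:
    raw = f.read()

# Universal-newline text read: "\r\n" becomes "\n", the way Windows
# text-mode stdio rewrites line endings on read.
with open(path, "r", newline=None) as f:
    text = f.read()

os.remove(path)
print(repr(raw))   # b'{\r\n}\r\n'
print(repr(text))  # '{\n}\n'
```

The two reads disagree by exactly the stripped CR bytes, which is the whole diff-vs-expected-file problem in miniature.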
Re: 010_pg_basebackup.pl vs multiple filesystems
On 2024-07-08 Mo 4:31 PM, Andres Freund wrote: Hi, On 2024-07-07 09:10:48 -0400, Andrew Dunstan wrote: On 2024-07-07 Su 7:28 AM, Andrew Dunstan wrote: I'll be happy to hear of one. I agree it's a mess. Maybe we could test that the temp directory is on the same device on Windows and skip the test if not? You could still get the test to run by setting TMPDIR and/or friends. Maybe we should just not try to rename the directory. Looking at the test I'm pretty sure the directory should be empty. Instead of trying to move it, let's just remove it, and recreate it in the tmp location. Good catch, yes, that'd be much better! diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl index 489dde4adf..c0c334c6fc 100644 --- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl +++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl @@ -363,8 +363,8 @@ my $sys_tempdir = PostgreSQL::Test::Utils::tempdir_short; # Elsewhere use $tempdir to avoid file system boundary issues with moving. my $tmploc = $windows_os ? $sys_tempdir : $tempdir; The comment would need a bit of editing, I guess. I think we should consider just getting rid of the os-dependant switch now, it shouldn't be needed anymore? -rename("$pgdata/pg_replslot", "$tmploc/pg_replslot") - or BAIL_OUT "could not move $pgdata/pg_replslot"; +rmtree("$pgdata/pg_replslot"); Should this perhaps be an rmdir, to ensure that we're not removing something we don't want (e.g. somebody adding an earlier test for slots that then gets broken by the rmtree)? OK, done like this. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: 010_pg_basebackup.pl vs multiple filesystems
On 2024-07-08 Mo 4:45 PM, Robert Haas wrote: On Sun, Jul 7, 2024 at 3:02 AM Andres Freund wrote: While working on [1] I encountered the issue that, on github-actions, 010_pg_basebackup.pl fails on windows. The reason for that is that github actions uses two drives, with TMP/TEMP located on c:, the tested code on d:. This causes the following code to fail: # Create a temporary directory in the system location. my $sys_tempdir = PostgreSQL::Test::Utils::tempdir_short; Whatever we end up doing about this, it would be a good idea to check the other places that use tempdir_short and see if they also need adjustment. I don't think it's a problem. There are lots of systems that have tempdir on a different device. That's why we previously saw lots of errors from this code, resulting in the present incomplete workaround. The fact that we haven't seen such errors from other tests means we should be ok. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: tests fail on windows with default git settings
On 2024-07-08 Mo 4:16 PM, Andres Freund wrote: On 2024-07-07 06:30:57 -0400, Andrew Dunstan wrote: On 2024-07-07 Su 1:26 AM, Tom Lane wrote: Andres Freund writes: Do we want to support checking out with core.autocrlf? -1. It would be a constant source of breakage, and you could never expect that (for example) making a tarball from such a checkout would match anyone else's results. Yeah, totally agree. If we do not want to support that, ISTM we ought to raise an error somewhere? +1, if we can figure out how. ISTM the right fix is probably to use PG_BINARY_R mode instead of "r" when opening the files, at least in the case if the test_json_parser tests. That does seem like it'd fix this issue, assuming the parser can cope with \r\n. Yes, the parser can handle \r\n. Note that they can only be white space in JSON - they can only be present in string values via escapes. I'm actually mildly surprised that the tests don't fail when *not* using autocrlf, because afaict test_json_parser_incremental.c doesn't set stdout to binary and thus we presumably end up with \r\n in the output? Except that that can't be true, because the test does pass on repos without autocrlf... That approach does seem to mildly conflict with Tom and your preference for fixing this by disallowing core.autocrlf? If we do so, the test never ought to see a crlf? IDK. I normally use core.autocrlf=false core.eol=lf on Windows. The editors I use are reasonably well behaved ;-) What I suggest (see attached) is we run the diff command with --strip-trailing-cr on Windows. Then we just won't care if the expected file and/or the output file has CRs. Not sure what the issue is with pg_bsd_indent, though. 
cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com diff --git a/src/test/modules/test_json_parser/t/003_test_semantic.pl b/src/test/modules/test_json_parser/t/003_test_semantic.pl index 74e0fa5bb1..2a430ad0c4 100644 --- a/src/test/modules/test_json_parser/t/003_test_semantic.pl +++ b/src/test/modules/test_json_parser/t/003_test_semantic.pl @@ -29,7 +29,9 @@ print $fh $stdout, "\n"; close($fh); -($stdout, $stderr) = run_command([ "diff", "-u", $fname, $test_out ]); +my @diffopts = ("-u"); +push(@diffopts, "--strip-trailing-cr") if $windows_os; +($stdout, $stderr) = run_command([ "diff", @diffopts, $fname, $test_out ]); is($stdout, "", "no output diff"); is($stderr, "", "no diff error"); diff --git a/src/test/modules/test_json_parser/test_json_parser_incremental.c b/src/test/modules/test_json_parser/test_json_parser_incremental.c index f4c442ac36..47040e1e42 100644 --- a/src/test/modules/test_json_parser/test_json_parser_incremental.c +++ b/src/test/modules/test_json_parser/test_json_parser_incremental.c @@ -124,7 +124,7 @@ main(int argc, char **argv) makeJsonLexContextIncremental(&lex, PG_UTF8, need_strings); initStringInfo(&json); - if ((json_file = fopen(testfile, "r")) == NULL) + if ((json_file = fopen(testfile, PG_BINARY_R)) == NULL) pg_fatal("error opening input: %m"); if (fstat(fileno(json_file), &statbuf) != 0) diff --git a/src/test/modules/test_json_parser/test_json_parser_perf.c b/src/test/modules/test_json_parser/test_json_parser_perf.c index ea85626cbd..74cc5f3f54 100644 --- a/src/test/modules/test_json_parser/test_json_parser_perf.c +++ b/src/test/modules/test_json_parser/test_json_parser_perf.c @@ -55,7 +55,7 @@ main(int argc, char **argv) sscanf(argv[1], "%d", &iter); - if ((json_file = fopen(argv[2], "r")) == NULL) + if ((json_file = fopen(argv[2], PG_BINARY_R)) == NULL) pg_fatal("Could not open input file '%s': %m", argv[2]); while ((n_read = fread(buff, 1, 6000, json_file)) > 0)
Re: ssl tests fail due to TCP port conflict
On 2024-07-08 Mo 8:00 AM, Alexander Lakhin wrote: Hello, 07.06.2024 17:25, Tom Lane wrote: Andrew Dunstan writes: I still think my patch to force TCP mode for the SSL test makes sense as well. +1 to both things. If that doesn't get the failure rate down to an acceptable level, we can look at the retry idea. I have pushed patches for both of those (i.e. start SSL test nodes in TCP mode and change the range of ports we allocate server ports from). I didn't see this email until after I had pushed them. I'd like to add that the kerberos/001_auth test also suffers from the port conflict, but slightly differently. Look for example at [1]: krb5kdc.log contains: Jul 02 09:29:41 andres-postgres-buildfarm-v5 krb5kdc[471964](info): setting up network... Jul 02 09:29:41 andres-postgres-buildfarm-v5 krb5kdc[471964](Error): Address already in use - Cannot bind server socket on 127.0.0.1.55853 Jul 02 09:29:41 andres-postgres-buildfarm-v5 krb5kdc[471964](Error): Failed setting up a UDP socket (for 127.0.0.1.55853) Jul 02 09:29:41 andres-postgres-buildfarm-v5 krb5kdc[471964](Error): Address already in use - Error setting up network As far as I can see, the port for kdc is chosen by PostgreSQL::Test::Kerberos, via PostgreSQL::Test::Cluster::get_free_port(), which checks only for TCP port availability (with can_bind()), but not for UDP, so this increases the probability of the conflict for this test (a similar failure: [2]). Although we can also find a failure with TCP: [3] (It's not clear to me what processes can use UDP ports while testing, but maybe those buildfarm animals are running on the same logical machine simultaneously?) 
[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rorqual&dt=2024-07-02%2009%3A27%3A15 [2] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mylodon&dt=2024-05-15%2001%3A25%3A07 [3] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=grassquit&dt=2024-07-04%2008%3A28%3A19 Let's see if this persists now that we are testing for free ports in a different range, not the range usually used for ephemeral ports. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
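[Editor's note] For illustration, a get_free_port()-style probe that checks bindability on both protocols might look like the following Python sketch (this is not the actual Cluster.pm code, and the port range here is arbitrary):

```python
import socket

def can_bind(port, kind):
    """Try to bind 127.0.0.1:port with the given socket type (no
    SO_REUSEADDR, so a port that is actually in use reports busy)."""
    s = socket.socket(socket.AF_INET, kind)
    try:
        s.bind(("127.0.0.1", port))
        return True
    except OSError:
        return False
    finally:
        s.close()

def get_free_port(lo=20000, hi=20100):
    """Return a port currently bindable for both TCP and UDP.  A
    TCP-only check, as described above, can hand a kdc a UDP port
    that some other process already holds."""
    for port in range(lo, hi):
        if (can_bind(port, socket.SOCK_STREAM)
                and can_bind(port, socket.SOCK_DGRAM)):
            return port
    raise RuntimeError("no free TCP+UDP port in range")

print(get_free_port())
```

This still leaves a race between the check and the eventual bind by the server, but it removes the UDP blind spot described in the report.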
Re: 010_pg_basebackup.pl vs multiple filesystems
On 2024-07-07 Su 7:28 AM, Andrew Dunstan wrote: On 2024-07-07 Su 3:02 AM, Andres Freund wrote: Hi, While working on [1] I encountered the issue that, on github-actions, 010_pg_basebackup.pl fails on windows. The reason for that is that github actions uses two drives, with TMP/TEMP located on c:, the tested code on d:. This causes the following code to fail: # Create a temporary directory in the system location. my $sys_tempdir = PostgreSQL::Test::Utils::tempdir_short; # On Windows use the short location to avoid path length issues. # Elsewhere use $tempdir to avoid file system boundary issues with moving. my $tmploc = $windows_os ? $sys_tempdir : $tempdir; rename("$pgdata/pg_replslot", "$tmploc/pg_replslot") or BAIL_OUT "could not move $pgdata/pg_replslot"; dir_symlink("$tmploc/pg_replslot", "$pgdata/pg_replslot") or BAIL_OUT "could not symlink to $pgdata/pg_replslot"; It's not possible to move (vs copy) a file between two filesystems, on most operating systems. Which thus leads to: [23:02:03.114](0.288s) Bail out! could not move D:\a\winpgbuild\winpgbuild\postgresql-17beta2\build/testrun/pg_basebackup/010_pg_basebackup\data/t_010_pg_basebackup_main_data/pgdata/pg_replslot This logic was added in commit e213de8e785aac4e2ebc44282b8dc0fcc74834e8 Author: Andrew Dunstan Date: 2023-07-08 11:21:58 -0400 Use shorter location for pg_replslot in pg_basebackup test and revised in commit e9f15bc9db7564a29460d089c0917590bc13fffc Author: Andrew Dunstan Date: 2023-07-08 12:34:25 -0400 Fix tmpdir issues with commit e213de8e78 The latter deals with precisely this issue, for !windows. I've temporarily worked around this by setting TMP/TEMP to something else, but ISTM we need a better solution. I'll be happy to hear of one. I agree it's a mess. Maybe we could test that the temp directory is on the same device on Windows and skip the test if not? You could still get the test to run by setting TMPDIR and/or friends. Maybe we should just not try to rename the directory. 
Looking at the test I'm pretty sure the directory should be empty. Instead of trying to move it, let's just remove it, and recreate it in the tmp location. Something like:

diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index 489dde4adf..c0c334c6fc 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -363,8 +363,8 @@ my $sys_tempdir = PostgreSQL::Test::Utils::tempdir_short;
 # Elsewhere use $tempdir to avoid file system boundary issues with moving.
 my $tmploc = $windows_os ? $sys_tempdir : $tempdir;
-rename("$pgdata/pg_replslot", "$tmploc/pg_replslot")
- or BAIL_OUT "could not move $pgdata/pg_replslot";
+rmtree("$pgdata/pg_replslot");
+mkdir ("$tmploc/pg_replslot", 0700);
 dir_symlink("$tmploc/pg_replslot", "$pgdata/pg_replslot")
  or BAIL_OUT "could not symlink to $pgdata/pg_replslot";

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
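The other idea floated earlier in this message, skipping the test when the temp directory sits on a different device, amounts to comparing the two locations' filesystem identities before attempting a rename. A minimal Python sketch of that probe (`same_filesystem` is a hypothetical helper, not anything in the test suite, and `st_dev` comparisons are not reliable on every Windows filesystem):

```python
import os
import tempfile

def same_filesystem(a: str, b: str) -> bool:
    """True if both existing paths live on the same device/filesystem,
    i.e. a rename() between them would be safe."""
    return os.stat(a).st_dev == os.stat(b).st_dev

# Two directories under the same temp root share one filesystem.
root = tempfile.mkdtemp()
d1 = os.path.join(root, "a")
d2 = os.path.join(root, "b")
os.mkdir(d1)
os.mkdir(d2)
print(same_filesystem(d1, d2))  # True
```

A test harness could call this once up front and emit a skip (rather than a bail-out) when it returns false, while still letting TMPDIR and friends force the test to run.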
Re: 010_pg_basebackup.pl vs multiple filesystems
On 2024-07-07 Su 3:02 AM, Andres Freund wrote:
> Hi,
>
> While working on [1] I encountered the issue that, on github-actions,
> 010_pg_basebackup.pl fails on windows. The reason for that is that github
> actions uses two drives, with TMP/TEMP located on c:, the tested code on d:.
> This causes the following code to fail:
>
>     # Create a temporary directory in the system location.
>     my $sys_tempdir = PostgreSQL::Test::Utils::tempdir_short;
>
>     # On Windows use the short location to avoid path length issues.
>     # Elsewhere use $tempdir to avoid file system boundary issues with moving.
>     my $tmploc = $windows_os ? $sys_tempdir : $tempdir;
>
>     rename("$pgdata/pg_replslot", "$tmploc/pg_replslot")
>       or BAIL_OUT "could not move $pgdata/pg_replslot";
>     dir_symlink("$tmploc/pg_replslot", "$pgdata/pg_replslot")
>       or BAIL_OUT "could not symlink to $pgdata/pg_replslot";
>
> It's not possible to move (vs copy) a file between two filesystems, on most
> operating systems. Which thus leads to:
>
>     [23:02:03.114](0.288s) Bail out!  could not move D:\a\winpgbuild\winpgbuild\postgresql-17beta2\build/testrun/pg_basebackup/010_pg_basebackup\data/t_010_pg_basebackup_main_data/pgdata/pg_replslot
>
> This logic was added in
>
>     commit e213de8e785aac4e2ebc44282b8dc0fcc74834e8
>     Author: Andrew Dunstan
>     Date:   2023-07-08 11:21:58 -0400
>
>         Use shorter location for pg_replslot in pg_basebackup test
>
> and revised in
>
>     commit e9f15bc9db7564a29460d089c0917590bc13fffc
>     Author: Andrew Dunstan
>     Date:   2023-07-08 12:34:25 -0400
>
>         Fix tmpdir issues with commit e213de8e78
>
> The latter deals with precisely this issue, for !windows.
>
> I've temporarily worked around this by setting TMP/TEMP to something else,
> but ISTM we need a better solution. I'll be happy to hear of one.

I agree it's a mess. Maybe we could test that the temp directory is on the same device on Windows and skip the test if not? You could still get the test to run by setting TMPDIR and/or friends.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
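The failure mode Andres describes — rename() refusing to move a directory across filesystems — is the POSIX EXDEV error, and the usual workaround is exactly the "copy, don't move" fallback he contrasts it with. A hedged Python sketch of that pattern (`move_dir` is a hypothetical helper, not code from the test suite):

```python
import errno
import os
import shutil
import tempfile

def move_dir(src: str, dst: str) -> str:
    """Move a directory; rename() is atomic but works only within one
    filesystem, so fall back to copy-and-delete on EXDEV."""
    try:
        os.rename(src, dst)   # cheap path: same filesystem only
        return "rename"
    except OSError as e:
        if e.errno != errno.EXDEV:
            raise
        shutil.move(src, dst)  # copies the tree, then removes src
        return "copy"

# Demo: within a single temp root the cheap rename path is taken;
# across drives (c: vs d:, as on the GitHub runners) it would be "copy".
base = tempfile.mkdtemp()
src = os.path.join(base, "pg_replslot")
os.mkdir(src)
how = move_dir(src, os.path.join(base, "moved"))
print(how)  # "rename"
```

Perl's rename() fails the same way in the test, which is why the BAIL_OUT fires when TMP/TEMP lives on a different drive than the build tree.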
Re: tests fail on windows with default git settings
On 2024-07-07 Su 1:26 AM, Tom Lane wrote:
> Andres Freund writes:
>> Do we want to support checking out with core.autocrlf?
>
> -1. It would be a constant source of breakage, and you could never expect
> that (for example) making a tarball from such a checkout would match
> anyone else's results.

Yeah, totally agree.

>> If we do not want to support that, ISTM we ought to raise an error
>> somewhere?
>
> +1, if we can figure out how.

ISTM the right fix is probably to use PG_BINARY_R mode instead of "r" when opening the files, at least in the case of the test_json_parser tests.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
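The distinction behind the PG_BINARY_R suggestion is text-mode vs binary-mode reads: on Windows, text mode silently translates CRLF to LF, so byte offsets and lengths no longer match what is on disk — which bites a parser that was told the file's size in bytes. Python's file modes make the same distinction portable, so the effect can be shown anywhere (this is only an analogue for illustration; PG_BINARY_R itself is PostgreSQL's C-level open-mode macro):

```python
import os
import tempfile

# Write raw CRLF bytes, as a core.autocrlf checkout would produce.
fd, path = tempfile.mkstemp()
os.write(fd, b'{"key":\r\n "value"}\r\n')
os.close(fd)

with open(path, "r") as f:   # text mode: universal-newline translation
    text = f.read()
with open(path, "rb") as f:  # binary mode: bytes exactly as stored
    raw = f.read()

print("\r" in text)  # False -- CRLFs were translated to \n
print(b"\r" in raw)  # True  -- binary mode preserves them
print(len(text) < len(raw))  # True -- translated text is shorter
```

Opening in binary mode sidesteps the translation entirely, which is why it is the natural fix when the test must see the file byte-for-byte.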
Re: Problem while installing PostgreSQL using make
On 2024-07-04 Th 6:22 AM, Mohab Yaser wrote:
> can you send me a link to download this version on windows as I didn't
> find anything other than the one I already have downloaded
>
> On Thu, Jul 4, 2024 at 1:21 PM Aleksander Alekseev wrote:
>> Hi,
>>
>>> While I was trying to install PostgreSQL from the git repository to
>>> start contributing I faced this issue. When I try to type ./configure
>>> it gives me this error
>>>
>>>     checking for a thread-safe mkdir -p... /usr/bin/mkdir -p
>>>     checking for bison... /c/GnuWin32/bin/bison
>>>     configure: using bison (GNU Bison) 2.4.1
>>>     checking for flex... configure: error:
>>>     *** The installed version of Flex, /c/GnuWin32/bin/flex, is too old to use with PostgreSQL.
>>>     *** Flex version 2.5.35 or later is required, but this is C:\GnuWin32\bin\flex.exe version 2.5.4.
>>>
>>> Look at the last two lines, the error says that the installed version
>>> of flex is too old and is 2.4 which is correct and not too old and
>>> should be valid but actually I can't proceed beyond this point. And I
>>> double checked the version of flex
>>>
>>>     $ flex --version
>>>     C:\GnuWin32\bin\flex.exe version 2.5.4
>>>
>>> and made sure that it is properly included in PATH
>>>
>>>     $ which flex
>>>     /c/GnuWin32/bin/flex
>>
>> Flex 2.5.4 is ancient. Version 2.5.39 was released in 2020 and I didn't
>> look further to figure out the exact release year of 2.5.4. You need
>> something like flex 2.6.4 and bison >= 2.3. That's what I use.

I assume this configure script is running under Msys2? Just install its flex and bison (and remove these ancient versions):

    pacman -S bison flex

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
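The confusion in the original report comes from comparing versions as text: "2.5.4" looks newer than "2.5.35" at a glance, but version components compare numerically, so 4 < 35 and configure correctly rejects it. A small Python illustration of the component-wise comparison (`parse_version` is a hypothetical helper, not configure's actual macro):

```python
def parse_version(s: str) -> tuple:
    """Split a dotted version string into a tuple of ints, so that
    comparisons are component-wise and numeric, not lexicographic."""
    return tuple(int(part) for part in s.split("."))

# The reported flex, against configure's stated minimum of 2.5.35:
print(parse_version("2.5.4") >= parse_version("2.5.35"))  # False: 4 < 35
# A modern flex, such as the one Msys2's pacman installs:
print(parse_version("2.6.4") >= parse_version("2.5.35"))  # True

# Lexicographic string comparison gets this wrong, which is the trap:
print("2.5.4" >= "2.5.35")  # True -- misleading
```

This is why the GnuWin32 flex (2.5.4, from the late 1990s) fails the check even though its version string superficially resembles the 2.5.35 minimum.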