Re: pg_parse_json() should not leak token copies on failure

2024-10-02 Thread Andrew Dunstan



On 2024-10-01 Tu 3:04 PM, Jacob Champion wrote:

(Tom, thank you for the fixup in 75240f65!)

Off-list, Andrew suggested an alternative approach to the one I'd been
taking: let the client choose whether it wants to take token ownership
to begin with. v3 adds a new flag (and associated API) which will
allow libpq to refuse ownership of those tokens. The lexer is then
free to clean everything up during parse failures.

Usually I'm not a fan of "do the right thing" flags, but in this case,
libpq really is the outlier. And it's nice that existing clients
(potentially including extensions) don't have to worry about an API
change.



Yeah.

Generally looks good. Should we have a check in 
setJsonLexContextOwnsTokens() that we haven't started parsing yet, for 
the incremental case?






At the moment, we have a test matrix consisting of "standard frontend"
and "shlib frontend" tests for the incremental parser. I'm planning
for the v4 patch to extend that with an "owned/not owned" dimension;
any objections?



Sounds reasonable.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: msys inet_pton strangeness

2024-09-30 Thread Andrew Dunstan


On 2024-09-30 Mo 11:11 AM, Tom Lane wrote:

Andrew Dunstan  writes:

On 2024-09-30 Mo 10:08 AM, Tom Lane wrote:

Not entirely ... if fairywren had been generating that warning all
along, I would have noticed it long ago, because I periodically
scrape the BF database for compiler warnings.  There has to have
been some recent change in the system include files.

here's what I see on vendikar:

Oh, wait, I forgot this is only about the v15 branch.  I seldom
search for warnings except on HEAD.  Still, I'd have expected to
notice it while v15 was development tip.  Maybe we changed something
since then?

Anyway, it's pretty moot, I see no reason not to push forward
with the proposed fix.





Thanks, done.


cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: msys inet_pton strangeness

2024-09-30 Thread Andrew Dunstan



On 2024-09-30 Mo 10:08 AM, Tom Lane wrote:

Andrew Dunstan  writes:

Ah, so this is because gcc 14.1.0 treats this as an error but gcc 12.2.0
treats it as a warning. Now it makes sense.

Not entirely ... if fairywren had been generating that warning all
along, I would have noticed it long ago, because I periodically
scrape the BF database for compiler warnings.  There has to have
been some recent change in the system include files.



here's what I see on vendikar:


pgbfprod=> select min(snapshot) from build_status_log where log_stage in 
('build.log', 'make.log') and branch = 'REL_15_STABLE' and sysname = 
'fairywren' and snapshot > now() - interval '1500 days' and log_text ~ 
'inet_pton';

         min
---------------------
 2022-06-30 18:04:08
(1 row)


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: msys inet_pton strangeness

2024-09-30 Thread Andrew Dunstan



On 2024-09-29 Su 6:28 PM, Thomas Munro wrote:

Just an idea...

--- a/src/include/port/win32.h
+++ b/src/include/port/win32.h
@@ -16,7 +16,7 @@
   * get support for GetLocaleInfoEx() with locales. For everything else
   * the minimum version is Windows XP (0x0501).
   */
-#if defined(_MSC_VER) && _MSC_VER >= 1900
+#if !defined(_MSC_VER) || _MSC_VER >= 1900
  #define MIN_WINNT 0x0600
  #else
  #define MIN_WINNT 0x0501



This seems reasonable as just about the most minimal change we can make 
work.



cheers


andrew



--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: msys inet_pton strangeness

2024-09-30 Thread Andrew Dunstan



On 2024-09-30 Mo 7:00 AM, Alexander Lakhin wrote:

Hello Andrew and Thomas,

29.09.2024 18:47, Andrew Dunstan writes:


I'm inclined to think we might need to reverse the order of the last 
two. TBH I don't really understand how this has worked up to now.




I've looked at the last successful run [1] and discovered that
fe-secure-common.c didn't compile cleanly too:
ccache gcc ... 
/home/pgrunner/bf/root/REL_15_STABLE/pgsql.build/../pgsql/src/interfaces/libpq/fe-secure-common.c
C:/tools/nmsys64/home/pgrunner/bf/root/REL_15_STABLE/pgsql/src/interfaces/libpq/fe-secure-common.c: 
In function 'pq_verify_peer_name_matches_certificate_ip':
C:/tools/nmsys64/home/pgrunner/bf/root/REL_15_STABLE/pgsql/src/interfaces/libpq/fe-secure-common.c:219:21: 
warning: implicit declaration of function 'inet_pton'; did you mean 
'inet_aton'? [-Wimplicit-function-declaration]

  219 | if (inet_pton(AF_INET6, host, &addr) == 1)
  | ^
  | inet_aton

So it worked just because that missing declaration generated just a
warning, not an error.




Ah, so this is because gcc 14.1.0 treats this as an error but gcc 12.2.0 
treats it as a warning. Now it makes sense.



cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: msys inet_pton strangeness

2024-09-29 Thread Andrew Dunstan


On 2024-09-29 Su 1:00 AM, Alexander Lakhin wrote:

Hello Thomas and Andrew,

28.09.2024 23:52, Thomas Munro wrote:
On Sun, Sep 29, 2024 at 6:26 AM Andrew Dunstan  
wrote:

We should have included ws2tcpip.h, which includes this:

#define InetPtonA inet_pton
WINSOCK_API_LINKAGE INT WSAAPI InetPtonA(INT Family, LPCSTR 
pStringBuf, PVOID pAddr);


It's conditioned on (_WIN32_WINNT >= 0x0600), but that should be true.

Can you print out the value to be sure?  I can't imagine they'd set it
lower themselves or make it go backwards in an upgrade, but perhaps
it's somehow not being set at all, and then we do:

#if defined(_MSC_VER) && _MSC_VER >= 1900
#define MIN_WINNT 0x0600
#else
#define MIN_WINNT 0x0501
#endif

In 16 we don't do that anymore, we just always set it to 0x0A00
(commit 495ed0ef2d72).  And before 15, we didn't want that function
yet (commit c1932e542863).


FWIW, I'm observing the same here.
For a trivial test.c (compiled with the same command line as
fe-secure-common.c) like:
"===_WIN32"
_WIN32;
"===_WIN32_WINNT";
_WIN32_WINNT;

with gcc -E (from mingw-w64-ucrt-x86_64-gcc 14.2.0-1), I get:
"===_WIN32"
1;
"===_WIN32_WINNT";
_WIN32_WINNT;

That is, _WIN32_WINNT is not defined, but with #include  above,
I see:
"===_WIN32_WINNT";
0x603

With #include "postgres_fe.h" (as in fe-secure-common.c) I get:
"===_WIN32_WINNT";
0x0501;






Yeah, src/include/port/win32/sys/socket.h has:

   #include 
   #include 
   #include 

I'm inclined to think we might need to reverse the order of the last 
two. TBH I don't really understand how this has worked up to now.



cheers


andrew


--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: msys inet_pton strangeness

2024-09-28 Thread Andrew Dunstan


On 2024-09-28 Sa 11:49 AM, Tom Lane wrote:

Andrew Dunstan  writes:

It's complaining like this:
C:/tools/xmsys64/home/pgrunner/bf/root/REL_15_STABLE/pgsql/src/interfaces/libpq/fe-secure-common.c:219:21:
 error: implicit declaration of function 'inet_pton'; did you mean 'inet_aton'? 
[-Wimplicit-function-declaration]
219 | if (inet_pton(AF_INET6, host, &addr) == 1)
| ^
configure has determined that we have inet_pton, and I have repeated the
test manually.

configure's test is purely a linker test.  It does not check to see
where/whether the function is declared.  Meanwhile, the compiler is
complaining that it doesn't see a declaration.  So the problem
probably can be fixed by adding an #include, but you'll need to
figure out what.

I see that our other user of inet_pton, fe-secure-openssl.c,
has a rather different #include setup than fe-secure-common.c;
does it compile OK?



I'll try, but this error occurs before we get that far.

We should have included ws2tcpip.h, which includes this:

   #define InetPtonA inet_pton
   WINSOCK_API_LINKAGE INT WSAAPI InetPtonA(INT Family, LPCSTR pStringBuf, 
PVOID pAddr);

It's conditioned on (_WIN32_WINNT >= 0x0600), but that should be true.

So I'm still very confused ;-(


cheers


andrew


--
Andrew Dunstan
EDB:https://www.enterprisedb.com


msys inet_pton strangeness

2024-09-28 Thread Andrew Dunstan


A week or so ago I upgraded the msys2 animal fairywren to the latest 
msys2, and ever since then the build has been failing for Release 15. 
It's complaining like this:


ccache gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement 
-Werror=vla -Wendif-labels -Wmissing-format-attribute -Wimplicit-fallthrough=3 
-Wcast-function-type -Wformat-security -fno-strict-aliasing -fwrapv 
-fexcess-precision=standard -Wno-format-truncation -Wno-stringop-truncation -g -O2  
-DFRONTEND -DUNSAFE_STAT_OK 
-I/home/pgrunner/bf/root/REL_15_STABLE/pgsql.build/../pgsql/src/interfaces/libpq 
-I../../../src/include 
-I/home/pgrunner/bf/root/REL_15_STABLE/pgsql.build/../pgsql/src/include  
-I../pgsql/src/include/port/win32  -I/c/progra~1/openssl-win64/include 
"-I/home/pgrunner/bf/root/REL_15_STABLE/pgsql.build/../pgsql/src/include/port/win32"
 -DWIN32_STACK_RLIMIT=4194304 -I../../../src/port 
-I/home/pgrunner/bf/root/REL_15_STABLE/pgsql.build/../pgsql/src/port -DSO_MAJOR_VERSION=5 
 -c -o fe-secure-common.o 
/home/pgrunner/bf/root/REL_15_STABLE/pgsql.build/../pgsql/src/interfaces/libpq/fe-secure-common.c
C:/tools/xmsys64/home/pgrunner/bf/root/REL_15_STABLE/pgsql/src/interfaces/libpq/fe-secure-common.c:
 In function 'pq_verify_peer_name_matches_certificate_ip':
C:/tools/xmsys64/home/pgrunner/bf/root/REL_15_STABLE/pgsql/src/interfaces/libpq/fe-secure-common.c:219:21:
 error: implicit declaration of function 'inet_pton'; did you mean 'inet_aton'? 
[-Wimplicit-function-declaration]
  219 | if (inet_pton(AF_INET6, host, &addr) == 1)
  | ^
  | inet_aton
make[3]: *** [: fe-secure-common.o] Error 1

configure has determined that we have inet_pton, and I have repeated the 
test manually. It's not a ccache issue - I have cleared the cache and 
the problem persists. The test run by meson on the same animal reports 
not finding the function.


So I'm a bit flummoxed about how to fix this, and would appreciate any 
suggestions.



cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: [PATCH] Add native windows on arm64 support

2024-09-28 Thread Andrew Dunstan


On 2024-09-24 Tu 2:31 PM, Dave Cramer wrote:



On Tue, 13 Feb 2024 at 16:28, Dave Cramer  
wrote:





On Tue, 13 Feb 2024 at 12:52, Andres Freund 
wrote:

Hi,

On 2024-02-13 12:49:33 -0500, Dave Cramer wrote:
> > I think I might have been on to something - if my human emulation of a
> > preprocessor isn't wrong, we'd end up with
> >
> > #define S_UNLOCK(lock)  \
> >         do { _ReadWriteBarrier(); (*(lock)) = 0; } while (0)
> >
> > on msvc + arm. And that's entirely insufficient - _ReadWriteBarrier()
> > just limits *compiler* level reordering, not CPU level reordering.
> > I think it's even insufficient on x86[-64], but it's definitely
> > insufficient on arm.
> >
> In fact ReadWriteBarrier has been deprecated: _ReadWriteBarrier | Microsoft
> Learn
> <https://learn.microsoft.com/en-us/cpp/intrinsics/readwritebarrier?view=msvc-170>

I'd just ignore that, that's just pushing towards more modern stuff that's
more applicable to C++ than C.

> I did try using atomic_thread_fence as per atomic_thread_fence -
> cppreference.com
> <https://en.cppreference.com/w/c/atomic/atomic_thread_fence>

The semantics of atomic_thread_fence are, uh, very odd.  I'd just use
MemoryBarrier().

#define S_UNLOCK(lock) \
    do { MemoryBarrier(); (*(lock)) = 0; } while (0)
#endif

Has no effect.

I have no idea if that is what you meant that I should do ?

Dave



Revisiting this:

Andrew, can you explain the difference between ninja test (which 
passes) and what the buildfarm does? The buildfarm crashes.



The buildfarm client performs these steps:


   meson test -C $pgsql --no-rebuild --suite setup
   meson test -t $meson_test_timeout $jflag -C $pgsql --logbase checklog 
--print-errorlogs --no-rebuild --suite regress --test-args=--no-locale
   meson test -t $meson_test_timeout $jflag -C $pgsql --print-errorlogs 
--no-rebuild --logbase checkworld --no-suite setup --no-suite regress
   foreach tested locale: meson test -t $meson_test_timeout $jflag -v -C $pgsql 
--no-rebuild --print-errorlogs --setup running --suite regress-running 
--logbase regress-installcheck-$locale


$pgsql is the build root, $jflag is setting the number of jobs


IOW, we do test suite setup, run the core regression tests, run all the 
remaining non-install tests, then run the install tests for each locale.


We don't call ninja directly, but I don't see why that should make a 
difference.



cheers


andrew


--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: SQL:2023 JSON simplified accessor support

2024-09-27 Thread Andrew Dunstan



On 2024-09-27 Fr 5:49 AM, David E. Wheeler wrote:

On Sep 26, 2024, at 16:45, Alexandra Wang  wrote:


I didn’t run pgindent earlier, so here’s the updated version with the
correct indentation. Hope this helps!

Oh,  nice! I don’t suppose the standard also has defined an operator equivalent to 
->>, though, has it? I tend to want the text output far more often than a JSON 
scalar.



That would defeat being able to chain these.
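
To spell that out: ->> returns text, so it can only ever be the last
step in a chain, whereas each step of the simplified accessor yields
json(b) and can be chained further. A sketch (the dot syntax assumes
the proposed patch; untested here):

   -- works today: the text-returning step has to come last
   select '{"a": {"b": {"c": 42}}}'::jsonb -> 'a' -> 'b' ->> 'c';
   -- hypothetical chain with the simplified accessor
   select (t.js).a.b.c from t;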


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: SQL:2023 JSON simplified accessor support

2024-09-26 Thread Andrew Dunstan



On 2024-09-26 Th 11:45 AM, Alexandra Wang wrote:

Hi,

I didn’t run pgindent earlier, so here’s the updated version with the
correct indentation. Hope this helps!



This is a really nice feature, and provides a lot of expressive power 
for such a small piece of code.


I notice this doesn't seem to work for domains over json and jsonb.


andrew@~=# create domain json_d as json;
CREATE DOMAIN
andrew@~=# create table test_json_dot(id int, test_json json_d);
CREATE TABLE
andrew@~=# insert into test_json_dot select 1, '{"a": 1, "b": 42}'::json;
INSERT 0 1
andrew@~=# select (test_json_dot.test_json).b, json_query(test_json, 
'lax $.b' WITH CONDITIONAL WRAPPER NULL ON EMPTY NULL ON ERROR) as 
expected from test_json_dot;
ERROR:  column notation .b applied to type json_d, which is not a 
composite type

LINE 1: select (test_json_dot.test_json).b, json_query(test_json, 'l...


I'm not sure that's a terribly important use case, but we should 
probably make it work. If it's a domain we should get the basetype of 
the domain. There's some example code in src/backend/utils/adt/jsonfuncs.c
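
A possible interim workaround, untested and assuming the accessor
accepts a cast expression, would be to cast the domain column back to
its base type first:

   select (test_json::json).b from test_json_dot;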



cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: attndims, typndims still not enforced, but make the value within a sane threshold

2024-09-23 Thread Andrew Dunstan



On 2024-09-20 Fr 12:38 AM, Tom Lane wrote:

Michael Paquier  writes:

On Fri, Sep 20, 2024 at 11:51:49AM +0800, Junwang Zhao wrote:

Should you also bump the catalog version?

No need to worry about that when sending a patch because committers
take care of that when merging a patch into the tree.  Doing that in
each patch submitted just creates more conflicts and work for patch
authors because they'd need to resolve conflicts each time a
catversion bump happens.  And that can happen on a daily basis
sometimes depending on what is committed.

Right.  Sometimes the committer forgets to do that :-(, which is
not great but it's not normally a big problem either.  We've concluded
it's better to err in that direction than impose additional work
on patch submitters.



FWIW, I have a git pre-commit hook that helps avoid that. Essentially it 
checks to see if there are changes in src/include/catalog but not in 
catversion.h. That's not a 100% check, but it probably catches the vast 
majority of changes that would require a catversion bump.



cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: [PATCH] WIP: replace method for jsonpath

2024-09-18 Thread Andrew Dunstan



On 2024-09-18 We 4:23 AM, Peter Eisentraut wrote:

On 17.09.24 21:16, David E. Wheeler wrote:
On Sep 17, 2024, at 15:03, Florents Tselai 
 wrote:


Fallback scenario: make this an extension, but in a first pass I 
didn’t find any convenient hooks.

One has to create a whole new scanner, grammar etc.


Yeah, it got me thinking about the RFC-9535 JSONPath "Function 
Extension" feature[1], which allows users to add functions. Would be 
cool to have a way to register jsonpath functions somehow, but I 
would imagine it’d need quite a bit of specification similar to 
RFC-9535. Wouldn’t surprise me to see something like that appear in a 
future version of the spec, with an interface something like CREATE 
OPERATOR.


Why can't we "just" call any suitable pg_proc-registered function from 
JSON path?  The proposed patch routes the example 
'$.replace("hello","bye")' internally to the internal implementation 
of the SQL function replace(..., 'hello', 'bye'). Why can't we do this 
automatically for any function call in a JSON path expression?






That might work. The thing that bothers me about the original proposal 
is this: what if we add a new non-spec jsonpath method and then a new 
version of the spec adds a method with the same name, but not compatible 
with our method? We'll be in a nasty place. At the very least I think we 
need to try hard to avoid that. Maybe we should prefix non-spec method 
names with "pg_", or maybe use an initial capital letter.



cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: A starter task

2024-09-17 Thread Andrew Dunstan


On 2024-09-15 Su 6:17 PM, sia kc wrote:



About inlining, I'm not sure how it is done with Gmail. Maybe I should 
use another email client.



Click the three dots with the tooltip "Show trimmed content". Then you 
can scroll down and put your reply inline. (Personally I detest the 
Gmail web interface, and use a different MUA, but you can do this even 
with the Gmail web app.)



cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: jsonb_strip_nulls with arrays?

2024-09-17 Thread Andrew Dunstan


On 2024-09-17 Tu 5:26 AM, Florents Tselai wrote:


Currently:


jsonb_strip_nulls ( jsonb ) → jsonb

Deletes all object fields that have null values from the given JSON 
value, recursively. Null values that are not object fields are untouched.



> Null values that are not object fields are untouched.


Can we revisit this and make it work with arrays, too?

Tbh, at first sight that looked like the expected behavior to me.

That is strip nulls from arrays as well.


This has been available since 9.5 and iiuc predates lots of the jsonb 
array work.




I don't think that's a great idea. Removing an object field which has a 
null value shouldn't have any effect on the surrounding data, nor really 
any on other operations (If you try to get the value of the missing 
field it should give you back null). But removing a null array member 
isn't like that at all - unless it's the trailing member of the array it 
will renumber all the succeeding array members.
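
The stock delete operators already show that asymmetry:

   select '{"a": null, "b": 1}'::jsonb - 'a';   -- {"b": 1}; "b" is unaffected
   select '[null, "x"]'::jsonb #- '{0}';        -- ["x"]; "x" shifts from index 1 to 0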


And I don't think we should be changing the behaviour of a function 
that people might have been relying on for the better part of a decade.





In practice, though, whenever jsonb_build_array is used (especially 
with jsonpath), a few nulls do appear in the resulting array most of 
the time. Currently, there’s no expressive way to remove them.


We could also have jsonb_array_strip_nulls(jsonb) as well



We could, if we're going to do anything at all in this area. Another 
possibility would be to provide a second optional parameter for 
json{b}_strip_nulls. That's probably a better way to go.
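
Something like this, say (signature purely illustrative, nothing is
settled):

   -- jsonb_strip_nulls(target jsonb, strip_in_arrays boolean DEFAULT false)
   select jsonb_strip_nulls('[{"a": null}, null]'::jsonb);        -- [{}, null], as now
   select jsonb_strip_nulls('[{"a": null}, null]'::jsonb, true);  -- [{}]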



cheers


andrew


--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: meson vs windows perl

2024-09-14 Thread Andrew Dunstan



On 2024-07-30 Tu 3:47 PM, Andrew Dunstan wrote:


On 2024-07-20 Sa 9:41 AM, Andrew Dunstan wrote:


On 2024-05-28 Tu 6:13 PM, Andres Freund wrote:

Hi,

On 2024-04-05 16:12:12 -0400, Andrew Dunstan wrote:

OK, this has been fixed and checked. The attached is what I propose.

The perl command is pretty hard to read. What about using python's shlex
module instead? Rough draft attached.  Still not very pretty, but seems
easier to read?

It'd be even better if we could just get perl to print out the flags in an
easier to parse way, but I couldn't immediately see a way.




Thanks for looking.

The attached should be easier to read. The perl package similar to 
shlex is Text::ParseWords, which is already a requirement.



It turns out that shellwords eats backslashes, so we would need 
something like this version, which I have tested on Windows. I will 
probably commit this in the next few days unless there's an objection.




pushed


cheers


andrew

--

Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Robocopy might be not robust enough for never-ending testing on Windows

2024-09-14 Thread Andrew Dunstan



On 2024-09-14 Sa 9:00 AM, Alexander Lakhin wrote:

Hello hackers,

While trying to reproduce inexplicable drongo failures (e. g., [1])
seemingly related to BackgroundPsql, I stumbled upon a close, but not
identical, issue. After many (6-8 thousand) iterations of the
015_stream.pl TAP test, psql failed to start with a STATUS_DLL_INIT_FAILED
error, and a corresponding Windows popup preventing a test exit (see ss-1
in the archive attached).

Upon reaching that state of the system, following test runs fail with one
or another error related to memory (not only with psql, but also with the
server processes):
testrun/subscription_13/015_stream/log/015_stream_publisher.log:2024-09-11 
20:01:51.777 PDT [8812] LOG:  server process (PID 11532) was 
terminated by exception 0xC0FD
testrun/subscription_14/015_stream/log/015_stream_subscriber.log:2024-09-11 
20:01:19.806 PDT [2036] LOG:  server process (PID 10548) was 
terminated by exception 0xC0FD
testrun/subscription_16/015_stream/log/015_stream_publisher.log:2024-09-11 
19:59:41.513 PDT [9128] LOG:  server process (PID 14476) was 
terminated by exception 0xC409
testrun/subscription_19/015_stream/log/015_stream_publisher.log:2024-09-11 
20:03:27.801 PDT [10156] LOG:  server process (PID 2236) was 
terminated by exception 0xC409
testrun/subscription_20/015_stream/log/015_stream_publisher.log:2024-09-11 
19:59:41.359 PDT [10656] LOG:  server process (PID 14712) was 
terminated by exception 0xC12D
testrun/subscription_3/015_stream/log/015_stream_publisher.log:2024-09-11 
20:02:23.815 PDT [13704] LOG:  server process (PID 13992) was 
terminated by exception 0xC0FD
testrun/subscription_9/015_stream/log/015_stream_publisher.log:2024-09-11 
19:59:41.360 PDT [9760] LOG:  server process (PID 11608) was 
terminated by exception 0xC142

...

When tests fail, I see Commit Charge reaching 100% (see ss-2 in the
attachment), while Physical Memory isn't all used up. To get OS to a
working state, I had to reboot it — killing processes, logoff/logon didn't
help.

I reproduced this issue again, investigated it and found out that it is
caused by robocopy (used by PostgreSQL::Test::Cluster->init), which is
leaking kernel objects, namely "Event objects" within Non-Paged pool on
each run.

This can be seen with Kernel Pool Monitor, or just with this simple PS 
script:

for ($i = 1; $i -le 100; $i++)
{
echo "iteration $i"
rm -r c:\temp\target
robocopy.exe /E /NJH /NFL /NDL /NP c:\temp\initdb-template c:\temp\target
Get-WmiObject -Class Win32_PerfRawData_PerfOS_Memory | % PoolNonpagedBytes

}

It shows to me:
iteration 1
              Total    Copied   Skipped  Mismatch    FAILED    Extras
   Dirs :        27        27         0         0         0         0
  Files :       968       968         0         0         0         0
  Bytes :   38.29 m   38.29 m         0         0         0         0
  Times :   0:00:00   0:00:00                     0:00:00   0:00:00
...
1226063872
...
iteration 100
              Total    Copied   Skipped  Mismatch    FAILED    Extras
   Dirs :        27        27         0         0         0         0
  Files :       968       968         0         0         0         0
  Bytes :   38.29 m   38.29 m         0         0         0         0
  Times :   0:00:00   0:00:00                     0:00:00   0:00:00
...
1245220864

(That is, 0.1-0.2 MB leaks per one robocopy run.)

I observed this on Windows 10 (Version 10.0.19045.4780), with all updates
installed, but not on Windows Server 2016 (10.0.14393.0). Moreover, using
robocopy v14393 on Windows 10 doesn't affect the issue.

Perhaps this information can be helpful for someone who is running
buildfarm/CI tests on Windows animals...

[1] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=drongo&dt=2024-09-11%2007%3A24%3A53





Interesting, good detective work. Still, wouldn't this mean drongo would 
fail consistently?


I wonder why we're using robocopy instead of our own RecursiveCopy module?


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Mutable foreign key constraints

2024-09-14 Thread Andrew Dunstan



On 2024-09-12 Th 5:33 PM, Tom Lane wrote:

I happened to notice that Postgres will let you do

regression=# create table foo (id timestamp primary key);
CREATE TABLE
regression=# create table bar (ts timestamptz references foo);
CREATE TABLE

This strikes me as a pretty bad idea, because whether a particular
timestamp is equal to a particular timestamptz depends on your
timezone setting.  Thus the constraint could appear to be violated
after a timezone change.

I'm inclined to propose rejecting FK constraints if the comparison
operator is not immutable.  Among the built-in opclasses, the only
instances of non-immutable btree equality operators are

regression=# select amopopr::regoperator from pg_amop join pg_operator o on 
o.oid = amopopr join pg_proc p on p.oid = oprcode where amopmethod=403 and 
amopstrategy=3 and provolatile != 'i';
                         amopopr
----------------------------------------------------------
  =(date,timestamp with time zone)
  =(timestamp without time zone,timestamp with time zone)
  =(timestamp with time zone,date)
  =(timestamp with time zone,timestamp without time zone)
(4 rows)

A possible objection is that if anybody has such a setup and
hasn't noticed a problem because they never change their
timezone setting, they might not appreciate us breaking it.
So I certainly wouldn't propose back-patching this.  But
maybe we should add it as a foot-gun defense going forward.

Thoughts?





Isn't there an upgrade hazard here? People won't thank us if they can't 
now upgrade their clusters. If we can get around that then +1.
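
The hazard itself is easy to demonstrate, since the cross-type
comparison flips with the setting:

   set timezone = 'UTC';
   select '2024-01-01 00:00'::timestamp = '2024-01-01 00:00+00'::timestamptz;  -- true
   set timezone = 'America/New_York';
   select '2024-01-01 00:00'::timestamp = '2024-01-01 00:00+00'::timestamptz;  -- false

so a referencing row that passes the FK check under one setting can
appear to violate it under another.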



cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Jargon and acronyms on this mailing list

2024-09-11 Thread Andrew Dunstan



On 2024-09-11 We 8:34 AM, Robert Haas wrote:

On Mon, Sep 9, 2024 at 3:54 PM Andrew Dunstan  wrote:

There are some serious obstacles to changing it all over, though. I
don't want to rewrite all the history, for example.

Because of the way git works, that really wouldn't be an issue. We'd
just push the tip of the master branch to main and then start
committing to main and delete master. The history wouldn't change at
all, because in git, a branch is really just a movable pointer to a
commit. The commits themselves don't know that they're part of a
branch.

A lot of things would break, naturally. We'd still all have master
branches in our local repositories and somebody might accidentally try
to push one of those branches back to the upstream repository and the
buildfarm and lots of other tooling would get confused and it would
all be a mess for a while, but the history itself would not change.




I think you misunderstood me. I wasn't referring to the git history, but 
the buildfarm history.


Anyway, I think what I have done should suffice. You should no longer 
see the name HEAD on the buildfarm server, although it will continue to 
exist in the database.


Incidentally, I wrote a blog post about changing the client default name 
some years ago: 
<http://adpgtech.blogspot.com/2021/06/buildfarm-adopts-modern-git-naming.html>


I also have scripting to do the git server changes (basically to set its 
default branch), although it's rather github-specific.



cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Jargon and acronyms on this mailing list

2024-09-10 Thread Andrew Dunstan



On 2024-09-09 Mo 3:54 PM, Andrew Dunstan wrote:


On 2024-09-09 Mo 1:19 PM, Robert Haas wrote:
On Mon, Sep 9, 2024 at 1:03 PM Andrew Dunstan  
wrote:
I guess I could try to write code to migrate everything, but it 
would be

somewhat fragile. And what would we do if we ever decided to migrate
"master" to another name like "main"? I do at least have code ready for
that eventuality, but it would (currently) still keep the visible name
of HEAD.

Personally, I think using HEAD to mean master is really confusing. In
git, master is a branch name, and HEAD is the tip of some branch, or
the random commit you've checked out that isn't even a branch. I know
that's not how it worked in CVS, but CVS was a very long time ago.

If we rename master to main or devel or something, we'll have to
adjust the way we speak again, but that's not a reason to keep using
the wrong terminology for the way things are now.



There are some serious obstacles to changing it all over, though. I 
don't want to rewrite all the history, for example.


What we could do relatively simply is change what is seen publicly, 
e.g. we could rewrite the status page to read "Branch: master". We 
could also change URLs we generate to use master instead of HEAD (and 
change it back when processing the URLs). And so on.



I've done this. Nothing in the client or the database has changed, but 
the fact that we refer to "master" as "HEAD" is pretty much hidden now 
from the web app and the emails it sends out. That should help lessen 
any confusion in casual viewers.


Comments welcome. I don't think I have missed anything but it's always 
possible.



cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Jargon and acronyms on this mailing list

2024-09-09 Thread Andrew Dunstan



On 2024-09-09 Mo 1:19 PM, Robert Haas wrote:

On Mon, Sep 9, 2024 at 1:03 PM Andrew Dunstan  wrote:

I guess I could try to write code to migrate everything, but it would be
somewhat fragile. And what would we do if we ever decided to migrate
"master" to another name like "main"? I do at least have code ready for
that eventuality, but it would (currently) still keep the visible name
of HEAD.

Personally, I think using HEAD to mean master is really confusing. In
git, master is a branch name, and HEAD is the tip of some branch, or
the random commit you've checked out that isn't even a branch. I know
that's not how it worked in CVS, but CVS was a very long time ago.

If we rename master to main or devel or something, we'll have to
adjust the way we speak again, but that's not a reason to keep using
the wrong terminology for the way things are now.



There are some serious obstacles to changing it all over, though. I 
don't want to rewrite all the history, for example.


What we could do relatively simply is change what is seen publicly, e.g. 
we could rewrite the status page to read "Branch: master". We could also 
change URLs we generate to use master instead of HEAD (and change it 
back when processing the URLs). And so on.


Changing things on the client side would be far more complicated and 
difficult.



cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Jargon and acronyms on this mailing list

2024-09-09 Thread Andrew Dunstan



On 2024-09-08 Su 11:35 PM, Tom Lane wrote:

David Rowley  writes:

I think HEAD is commonly misused to mean master instead of the latest
commit of the current branch. I see the buildfarm even does that.
Thanks for getting that right in your blog post.

IIRC, HEAD *was* the technically correct term back when we were
using CVS.  Old habits die hard.





Yeah. The reason we kept doing it that way in the buildfarm was that for 
a period we actually had some animals using CVS and some using that 
new-fangled git thing.


I guess I could try to write code to migrate everything, but it would be 
somewhat fragile. And what would we do if we ever decided to migrate 
"master" to another name like "main"? I do at least have code ready for 
that eventuality, but it would (currently) still keep the visible name 
of HEAD.



cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: json_query conditional wrapper bug

2024-09-05 Thread Andrew Dunstan




> On Sep 5, 2024, at 11:51 AM, Peter Eisentraut  wrote:
> 
> On 05.09.24 17:01, Andrew Dunstan wrote:
>>> On 2024-09-04 We 4:10 PM, Andrew Dunstan wrote:
>>> 
>>> On 2024-09-04 We 6:16 AM, Peter Eisentraut wrote:
>>>> On 28.08.24 11:21, Peter Eisentraut wrote:
>>>>> These are ok:
>>>>> 
>>>>> select json_query('{"a": 1, "b": 42}'::jsonb, 'lax $.b' without wrapper);
>>>>>   json_query
>>>>> 
>>>>>   42
>>>>> 
>>>>> select json_query('{"a": 1, "b": 42}'::jsonb, 'lax $.b' with 
>>>>> unconditional wrapper);
>>>>>   json_query
>>>>> 
>>>>>   [42]
>>>>> 
>>>>> But this appears to be wrong:
>>>>> 
>>>>> select json_query('{"a": 1, "b": 42}'::jsonb, 'lax $.b' with conditional 
>>>>> wrapper);
>>>>>   json_query
>>>>> 
>>>>>   [42]
>>>>> 
>>>>> This should return an unwrapped 42.
>>>> 
>>>> If I make the code change illustrated in the attached patch, then I get 
>>>> the correct result here.  And various regression test results change, 
>>>> which, to me, all look more correct after this patch.  I don't know what 
>>>> the code I removed was supposed to accomplish, but it seems to be wrong 
>>>> somehow.  In the current implementation, the WITH CONDITIONAL WRAPPER 
>>>> clause doesn't appear to work correctly in any case I could identify.
>>> 
>>> 
>>> Agree the code definitely looks wrong. If anything the test should probably 
>>> be reversed:
>>> 
>>> wrap = count > 1  || !(
>>> IsAJsonbScalar(singleton) ||
>>> (singleton->type == jbvBinary &&
>>> JsonContainerIsScalar(singleton->val.binary.data)));
>>> 
>>> i.e. in the count = 1 case wrap unless it's a scalar or a binary wrapping a 
>>> scalar. The code could do with a comment about the logic.
>>> 
>>> I know we're very close to release but we should fix this as it's a new 
>>> feature.
>> I thought about this again.
>> I don't know what the spec says,
> 
> Here is the relevant bit:
> 
> a) Case:
> i) If the length of SEQ is 0 (zero), then let WRAPIT be False.
> NOTE 479 — This ensures that the ON EMPTY behavior supersedes the WRAPPER 
> behavior.
> ii) If WRAPPER is WITHOUT ARRAY, then let WRAPIT be False.
> iii) If WRAPPER is WITH UNCONDITIONAL ARRAY, then let WRAPIT be True.
> iv) If WRAPPER is WITH CONDITIONAL ARRAY, then
> Case:
> 1) If SEQ has a single SQL/JSON item, then let WRAPIT be False.
> 2) Otherwise, let WRAPIT be True.
> 
> > but the Oracle docs say:
>
>>    Specify WITH CONDITIONAL WRAPPER to include the array wrapper
>>    only if the path expression matches a single scalar value or
>>    multiple values of any type. If the path expression matches a single
>>    JSON object or JSON array, then the array wrapper is omitted.
>> So I now think the code that's there now is actually correct, and what you 
>> say appears wrong is also correct.
> 
> I tested the above test expressions as well as the regression test case 
> against Oracle and it agrees with my solution.  So it seems to me that this 
> piece of documentation is wrong.

Oh, odd. Then assuming a scalar is an SQL/JSON item your patch appears correct.
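
So per the WRAPIT rules quoted above, the expected behaviour with your
patch should be (a sketch, not re-verified here):

   select json_query('{"a": 1, "b": 42}'::jsonb,
                     'lax $.b' with conditional wrapper);     -- 42 (single item)
   select json_query('{"a": 1, "b": [1, 2]}'::jsonb,
                     'lax $.b[*]' with conditional wrapper);  -- [1, 2] (two items)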

Cheers

Andrew




Re: json_query conditional wrapper bug

2024-09-05 Thread Andrew Dunstan


On 2024-09-04 We 4:10 PM, Andrew Dunstan wrote:


On 2024-09-04 We 6:16 AM, Peter Eisentraut wrote:

On 28.08.24 11:21, Peter Eisentraut wrote:

These are ok:

select json_query('{"a": 1, "b": 42}'::jsonb, 'lax $.b' without 
wrapper);

 json_query
------------
 42

select json_query('{"a": 1, "b": 42}'::jsonb, 'lax $.b' with 
unconditional wrapper);

 json_query
------------
 [42]

But this appears to be wrong:

select json_query('{"a": 1, "b": 42}'::jsonb, 'lax $.b' with 
conditional wrapper);

 json_query
------------
 [42]

This should return an unwrapped 42.


If I make the code change illustrated in the attached patch, then I 
get the correct result here.  And various regression test results 
change, which, to me, all look more correct after this patch.  I 
don't know what the code I removed was supposed to accomplish, but it 
seems to be wrong somehow.  In the current implementation, the WITH 
CONDITIONAL WRAPPER clause doesn't appear to work correctly in any 
case I could identify.



Agree the code definitely looks wrong. If anything the test should 
probably be reversed:


    wrap = count > 1 || !(
        IsAJsonbScalar(singleton) ||
        (singleton->type == jbvBinary &&
         JsonContainerIsScalar(singleton->val.binary.data)));

i.e. in the count = 1 case wrap unless it's a scalar or a binary 
wrapping a scalar. The code could do with a comment about the logic.


I know we're very close to release but we should fix this as it's a 
new feature.



I thought about this again.

I don't know what the spec says, but the Oracle docs say:

   Specify WITH CONDITIONAL WRAPPER to include the array wrapper
   only if the path expression matches a single scalar value or
   multiple values of any type. If the path expression matches a single
   JSON object or JSON array, then the array wrapper is omitted.

So I now think the code that's there now is actually correct, and what 
you say appears wrong is also correct.




cheers


andrew


--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: json_query conditional wrapper bug

2024-09-04 Thread Andrew Dunstan



On 2024-09-04 We 6:16 AM, Peter Eisentraut wrote:

On 28.08.24 11:21, Peter Eisentraut wrote:

These are ok:

select json_query('{"a": 1, "b": 42}'::jsonb, 'lax $.b' without 
wrapper);

 json_query
------------
 42

select json_query('{"a": 1, "b": 42}'::jsonb, 'lax $.b' with 
unconditional wrapper);

 json_query
------------
 [42]

But this appears to be wrong:

select json_query('{"a": 1, "b": 42}'::jsonb, 'lax $.b' with 
conditional wrapper);

 json_query
------------
 [42]

This should return an unwrapped 42.


If I make the code change illustrated in the attached patch, then I 
get the correct result here.  And various regression test results 
change, which, to me, all look more correct after this patch.  I don't 
know what the code I removed was supposed to accomplish, but it seems 
to be wrong somehow.  In the current implementation, the WITH 
CONDITIONAL WRAPPER clause doesn't appear to work correctly in any 
case I could identify.



Agree the code definitely looks wrong. If anything the test should 
probably be reversed:


    wrap = count > 1 || !(
        IsAJsonbScalar(singleton) ||
        (singleton->type == jbvBinary &&
         JsonContainerIsScalar(singleton->val.binary.data)));

i.e. in the count = 1 case wrap unless it's a scalar or a binary 
wrapping a scalar. The code could do with a comment about the logic.


I know we're very close to release but we should fix this as it's a new 
feature.



cheers


andrew



--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Improving tracking/processing of buildfarm test failures

2024-09-02 Thread Andrew Dunstan


On 2024-09-01 Su 2:46 PM, sia kc wrote:

Hello everyone

I am a developer interested in this project. I had a little involvement 
with MariaDB and now I'd like to work on Postgres. I've never worked 
with mailing lists, so I am not sure if this is the way I should 
interact. I'd like to be pointed to some tasks and documents to get 
started.



Do you mean you want to be involved with $subject, or that you just want 
to be involved in Postgres development generally? If the latter, then 
replying to a specific email thread is not the way to go, and the first 
thing to do is look at this wiki page 
<https://wiki.postgresql.org/wiki/Developer_FAQ>



cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: pl/pgperl Patch for adding $_FN detail just like triggers have for $_TD

2024-08-30 Thread Andrew Dunstan



On 2024-08-30 Fr 3:49 PM, Andrew Dunstan wrote:





Sorry for empty reply. Errant finger hit send.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible

2024-08-30 Thread Andrew Dunstan


On 2024-08-29 Th 4:44 PM, Jacob Champion wrote:

As for the other patches, I'll ping Andrew about 0001,



Patch 0001 looks sane to me.


cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: pl/pgperl Patch for adding $_FN detail just like triggers have for $_TD

2024-08-30 Thread Andrew Dunstan



On 2024-08-29 Th 5:50 PM, Mark Murawski wrote:



On 8/29/24 16:54, Andrew Dunstan wrote:


On 2024-08-29 Th 1:01 PM, Mark Murawski wrote:



On 8/29/24 11:56, Andrew Dunstan wrote:


On 2024-08-28 We 5:53 PM, Mark Murawski wrote:

Hi Hackers!

This would be version v1 of this feature

Basically, the subject says it all: pl/pgperl Patch for being able 
to tell which function you're in.
This is a hashref so it will be possible to populate new and 
exciting other details in the future as the need arises


This also greatly improves logging capabilities for things like 
catching warnings,  Because as it stands right now, there's no 
information that can assist with locating the source of a warning 
like this:


# tail -f /var/log/postgresql.log
*** GOT A WARNING - Use of uninitialized value $prefix in 
concatenation (.) or string at (eval 531) line 48.


Now, with $_FN you can do this:


CREATE OR REPLACE FUNCTION throw_warning() RETURNS text LANGUAGE 
plperlu AS $function$


use warnings;
use strict;
use Data::Dumper;

$SIG{__WARN__} = sub {
  elog(NOTICE, Dumper($_FN));

  print STDERR "In Function: $_FN->{name}: $_[0]\n";
};

my $a;
print "$a"; # uninit!

return undef;

$function$
;

This patch is against 12 which is still our production branch. 
This could easily be also patched against newer releases as well.


I've been using this code in production now for about 3 years, 
it's greatly helped track down issues.  And there shouldn't be 
anything platform-specific here, it's all regular perl API


I'm not sure about adding testing.  This is my first postgres 
patch, so any guidance on adding regression testing would be 
appreciated.


The rationale for this has come from the need to know the source 
function name, and we've typically resorted to things like this in 
the past:


CREATE OR REPLACE FUNCTION throw_warning() RETURNS text LANGUAGE 
plperlu AS $function$

my $function_name = 'throw_warning';
$SIG{__WARN__} = sub { print STDERR "In Function: $function_name: 
$_[0]\n"; }

$function$
;

We've literally had to copy/paste this all over and it's something 
that postgres should just 'give you' since it knows the name 
already, just like when triggers pass you $_TD with all the 
pertinent information


A wishlist item would be for postgres plperl to automatically 
prepend the function name and schema when throwing perl warnings 
so you don't have to do your own __WARN__ handler, but this is the 
next best thing.




I'm not necessarily opposed to this, but the analogy to $_TD isn't 
really apt.  You can't know the trigger data at compile time, 
whereas you can know the function's name at compile time, using 
just the mechanism you find irksome.


And if we're going to do it for plperl, shouldn't we do it for 
other PLs?






Hi Andrew,


Thanks for the feedback.


1) Why is this not similar to _TD?  It literally operates 
identically. At run-time it passes you $_TD for triggers. Same here 
for functions.  This is all run-time.  What exactly is the issue 
you're trying to point out?



It's not the same as the trigger data case because the function name 
is knowable at compile time, as in fact you have demonstrated. You 
just find it a bit inconvenient to have to code for that knowledge. 
By contrast, trigger data is ONLY knowable at run time.


But I don't see that it is such a heavy burden to have to write

  my $funcname = "foo";

or similar in your function.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Understood, regarding knowability.  Trigger data is definitely going 
to be very dynamic in that regard.


No, It's not a heavy burden to hard code the function name.  But what 
my ideal goal would be is this:


CREATE OR REPLACE FUNCTION throw_warning() RETURNS text LANGUAGE 
plperlu AS $function$
use 'PostgresWarnHandler'; # <--- imagine this registers a WARN 
handler and outputs $_FN->{name} for you as part of the final warning


my $a;
print $a;

 etc


and then there's nothing 'hard coded' regarding the name of the 
function, anywhere.  It just seems nonsensical that postgres plperl 
can't send you the name of your registered function and there's 
absolutely no way to get it other than hard coding the name 
(redundantly, considering you already know the name when you're 
defining the function in the first place).


But even better would be catching the warn at the plperl level, 
prepending the function name to the warn message, and then you only need:


CREATE OR REPLACE FUNCTION throw_warning() RETURNS text LANGUAGE 
plperlu AS $function$


my $a;
print $a;

 etc

And then in this hypothetical situation -- magic ensues, and you're 
left with this:

# tail -f /var/log/postgresql.log
*** GOT A WARNING - Use of uninitialized value $a in concatenation 
(.) or string in function public.throw_warning() line 1









--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: pl/pgperl Patch for adding $_FN detail just like triggers have for $_TD

2024-08-29 Thread Andrew Dunstan



On 2024-08-29 Th 1:01 PM, Mark Murawski wrote:



On 8/29/24 11:56, Andrew Dunstan wrote:


On 2024-08-28 We 5:53 PM, Mark Murawski wrote:

Hi Hackers!

This would be version v1 of this feature

Basically, the subject says it all: pl/pgperl Patch for being able 
to tell which function you're in.
This is a hashref so it will be possible to populate new and 
exciting other details in the future as the need arises


This also greatly improves logging capabilities for things like 
catching warnings,  Because as it stands right now, there's no 
information that can assist with locating the source of a warning 
like this:


# tail -f /var/log/postgresql.log
*** GOT A WARNING - Use of uninitialized value $prefix in 
concatenation (.) or string at (eval 531) line 48.


Now, with $_FN you can do this:


CREATE OR REPLACE FUNCTION throw_warning() RETURNS text LANGUAGE 
plperlu AS $function$


use warnings;
use strict;
use Data::Dumper;

$SIG{__WARN__} = sub {
  elog(NOTICE, Dumper($_FN));

  print STDERR "In Function: $_FN->{name}: $_[0]\n";
};

my $a;
print "$a"; # uninit!

return undef;

$function$
;

This patch is against 12 which is still our production branch. This 
could easily be also patched against newer releases as well.


I've been using this code in production now for about 3 years, it's 
greatly helped track down issues.  And there shouldn't be anything 
platform-specific here, it's all regular perl API


I'm not sure about adding testing.  This is my first postgres patch, 
so any guidance on adding regression testing would be appreciated.


The rationale for this has come from the need to know the source 
function name, and we've typically resorted to things like this in 
the past:


CREATE OR REPLACE FUNCTION throw_warning() RETURNS text LANGUAGE 
plperlu AS $function$

my $function_name = 'throw_warning';
$SIG{__WARN__} = sub { print STDERR "In Function: $function_name: 
$_[0]\n"; }

$function$
;

We've literally had to copy/paste this all over and it's something 
that postgres should just 'give you' since it knows the name 
already, just like when triggers pass you $_TD with all the 
pertinent information


A wishlist item would be for postgres plperl to automatically 
prepend the function name and schema when throwing perl warnings so 
you don't have to do your own __WARN__ handler, but this is the next 
best thing.




I'm not necessarily opposed to this, but the analogy to $_TD isn't 
really apt.  You can't know the trigger data at compile time, whereas 
you can know the function's name at compile time, using just the 
mechanism you find irksome.


And if we're going to do it for plperl, shouldn't we do it for other 
PLs?






Hi Andrew,


Thanks for the feedback.


1) Why is this not similar to _TD?  It literally operates identically. 
At run-time it passes you $_TD for triggers.  Same here for 
functions.  This is all run-time.  What exactly is the issue you're 
trying to point out?



It's not the same as the trigger data case because the function name is 
knowable at compile time, as in fact you have demonstrated. You just 
find it a bit inconvenient to have to code for that knowledge. By 
contrast, trigger data is ONLY knowable at run time.


But I don't see that it is such a heavy burden to have to write

  my $funcname = "foo";

or similar in your function.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: pl/pgperl Patch for adding $_FN detail just like triggers have for $_TD

2024-08-29 Thread Andrew Dunstan



On 2024-08-28 We 5:53 PM, Mark Murawski wrote:

Hi Hackers!

This would be version v1 of this feature

Basically, the subject says it all: pl/pgperl Patch for being able to 
tell which function you're in.
This is a hashref so it will be possible to populate new and exciting 
other details in the future as the need arises


This also greatly improves logging capabilities for things like 
catching warnings,  Because as it stands right now, there's no 
information that can assist with locating the source of a warning like 
this:


# tail -f /var/log/postgresql.log
*** GOT A WARNING - Use of uninitialized value $prefix in 
concatenation (.) or string at (eval 531) line 48.


Now, with $_FN you can do this:


CREATE OR REPLACE FUNCTION throw_warning() RETURNS text LANGUAGE 
plperlu AS $function$


use warnings;
use strict;
use Data::Dumper;

$SIG{__WARN__} = sub {
  elog(NOTICE, Dumper($_FN));

  print STDERR "In Function: $_FN->{name}: $_[0]\n";
};

my $a;
print "$a"; # uninit!

return undef;

$function$
;

This patch is against 12 which is still our production branch. This 
could easily be also patched against newer releases as well.


I've been using this code in production now for about 3 years, it's 
greatly helped track down issues.  And there shouldn't be anything 
platform-specific here, it's all regular perl API


I'm not sure about adding testing.  This is my first postgres patch, 
so any guidance on adding regression testing would be appreciated.


The rationale for this has come from the need to know the source 
function name, and we've typically resorted to things like this in the 
past:


CREATE OR REPLACE FUNCTION throw_warning() RETURNS text LANGUAGE 
plperlu AS $function$

my $function_name = 'throw_warning';
$SIG{__WARN__} = sub { print STDERR "In Function: $function_name: 
$_[0]\n"; }

$function$
;

We've literally had to copy/paste this all over and it's something 
that postgres should just 'give you' since it knows the name already, 
just like when triggers pass you $_TD with all the pertinent information


A wishlist item would be for postgres plperl to automatically prepend 
the function name and schema when throwing perl warnings so you don't 
have to do your own __WARN__ handler, but this is the next best thing.




I'm not necessarily opposed to this, but the analogy to $_TD isn't 
really apt.  You can't know the trigger data at compile time, whereas 
you can know the function's name at compile time, using just the 
mechanism you find irksome.


And if we're going to do it for plperl, shouldn't we do it for other PLs?


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Index AM API cleanup

2024-08-26 Thread Andrew Dunstan


On 2024-08-22 Th 2:28 PM, Mark Dilger wrote:



On Aug 22, 2024, at 1:36 AM, Alexandra Wang  
wrote:

I had to make some changes to the first two patches in order to run
"make check" and compile the treeb code on my machine. I’ve attached
my changes.

Thank you for the review, and the patches!



"make installcheck" for treeb is causing issues on my end. I can
investigate further if it’s not a problem for others.

The test module index AMs are not intended for use in any installed database, 
so 'make installcheck' is unnecessary.  A mere 'make check' should suffice.  
However, if you want to run it, you can install the modules, edit 
postgresql.conf to add 'treeb' to shared_preload_libraries, restart the server, 
and run 'make installcheck'.   This is necessary for 'treeb' because it 
requests shared memory, and that needs to be done at startup.

The v18 patch set includes the changes your patches suggest, though I modified 
the approach a bit.  Specifically, rather than standardizing on '1.0.0' for the 
module versions, as your patches do, I went with '1.0', as is standard in other 
modules in neighboring directories.  The '1.0.0' numbering was something I had 
been using in versions 1..16 of this patch, and I only partially converted to 
'1.0' before posting v17, so sorry about that.  The v18 patch also has some 
whitespace fixes.

To address your comments about the noise in the test failures, v18 modifies the clone_tests.pl script to do a 
little more work translating the expected output to expect the module's AM name ("xash", 
"xtree", "treeb", or whatnot) beyond what that script did in v17.




Small addition to your clone script, taking into account the existence 
of alternative result files:



diff --git a/src/tools/clone_tests.pl b/src/tools/clone_tests.pl
index c1c50ad90b..d041f93f87 100755
--- a/src/tools/clone_tests.pl
+++ b/src/tools/clone_tests.pl
@@ -183,6 +183,12 @@ foreach my $regress (@regress)
    push (@dst_regress, "$dst_sql/$regress.sql");
    push (@src_expected, "$src_expected/$regress.out");
    push (@dst_expected, "$dst_expected/$regress.out");
+   foreach my $alt (1,2)
+   {
+   next unless -f "$src_expected/${regress}_$alt.out";
+   push (@src_expected, "$src_expected/${regress}_$alt.out");
+   push (@dst_expected, "$dst_expected/${regress}_$alt.out");
+   }
 }
 my @isolation = grep /\w+/, split(/\s+/, $isolation);
 foreach my $isolation (@isolation)


cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: Index AM API cleanup

2024-08-21 Thread Andrew Dunstan


On 2024-08-21 We 4:09 PM, Mark Dilger wrote:



On Aug 21, 2024, at 12:34 PM, Kirill Reshke  wrote:

Hi! Why is the patch attached as .tar.bz2? Usually raw patches are sent here...

I worried the patch set, being greater than 1 MB, might bounce or be held up in 
moderation.



Yes, it would have required moderation AIUI. It is not at all 
unprecedented to send a compressed tar of patches, and is explicitly 
provided for by the cfbot: see 
<https://wiki.postgresql.org/wiki/Cfbot#Which_attachments_are_considered_to_be_patches.3F>



cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: why is pg_upgrade's regression run so slow?

2024-08-20 Thread Andrew Dunstan



On 2024-08-19 Mo 8:00 AM, Alexander Lakhin wrote:

Hello Andrew,

29.07.2024 13:54, Andrew Dunstan wrote:


On 2024-07-29 Mo 4:00 AM, Alexander Lakhin wrote:


And another interesting fact is that TEMP_CONFIG is apparently ignored by
`meson test regress/regress`. For example, with temp.config containing
invalid settings, `meson test pg_upgrade/002_pg_upgrade` fails, but
`meson test regress/regress` passes just fine.




Well, that last fact explains the discrepancy I originally complained 
about. How the heck did that happen? It looks like we just ignored 
its use in Makefile.global.in :-(


Please look at the attached patch (partially based on ideas from [1]) for
meson.build, that aligns it with `make` in regard to use of TEMP_CONFIG.

Maybe it could be implemented via a separate meson option, but that would
also require modifying at least the buildfarm client...

[1] 
https://www.postgresql.org/message-id/CAN55FZ304Kp%2B510-iU5-Nx6hh32ny9jgGn%2BOB5uqPExEMK1gQQ%40mail.gmail.com





I think this is the way to go. The patch LGTM.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: The xversion-upgrade test fails to stop server

2024-08-20 Thread Andrew Dunstan



On 2024-08-18 Su 12:00 PM, Alexander Lakhin wrote:

Hello Andrew,

04.06.2024 14:56, Andrew Dunstan wrote:




but I can't see ../snapshot_too_old/output_iso/regression.diff and
.../snapshot_too_old/output_iso/log/postmaster.log in the log.




will do.



I've discovered a couple of other failures where the interesting log files
are not saved.

[1] has only inst/logfile saved and that's not enough for TAP tests in
src/test/modules.

I've emulated the failure (not trying to guess the real cause) with:
--- a/src/test/modules/test_custom_rmgrs/Makefile
+++ b/src/test/modules/test_custom_rmgrs/Makefile
@@ -14,0 +15,4 @@ TAP_TESTS = 1
+remove:
+    make uninstall
+install: remove

and saw mostly the same with the buildfarm client REL_17.

I've tried the following addition to run_build.pl:
@@ -2194,6 +2194,8 @@ sub make_testmodules_install_check
    my @logs = glob("$pgsql/src/test/modules/*/regression.diffs");
    push(@logs, "inst/logfile");
    $log->add_log($_) foreach (@logs);
+   @logs = glob("$pgsql/src/test/modules/*/tmp_check/log/*");
+   $log->add_log($_) foreach (@logs);
and it works as expected for me.

The output of another failure ([2]) contains only:
▶ 1/1 + partition_prune  3736 ms FAIL
but no regression.diffs
(Fortunately, in this particular case I found "FATAL:  incorrect checksum
in control file" in inst/logfile, so that can explain the failure.)

Probably, run_build.pl needs something like:
@@ -1848,7 +1848,7 @@ sub make_install_check
    @checklog = run_log("cd $pgsql/src/test/regress && 
$make $chktarget");

    }
    my $status = $? >> 8;
-   my @logfiles = ("$pgsql/src/test/regress/regression.diffs", 
"inst/logfile");
+   my @logfiles = ("$pgsql/src/test/regress/regression.diffs", 
"$pgsql/testrun/regress-running/regress/regression.diffs", 
"inst/logfile");



Thanks, I have added these tweaks.






A similar failure have occurred today:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=opaleye&dt=2024-06-08%2001%3A41%3A41 



So maybe it would make sense to increase default PGCTLTIMEOUT for
buildfarm clients, say, to 180 seconds?



Sure. For now I have added it to the config on crake, but we can make 
it a default.


By the way, opaleye failed once again (with 120 seconds timeout):
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=opaleye&dt=2024-08-13%2002%3A04%3A07 



[1] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gull&dt=2024-08-06%2014%3A56%3A39
[2] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=culicidae&dt=2024-08-17%2001%3A12%3A50





Yeah. In the next release the default will increase to 180.
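In the meantime animal owners can set it themselves. Roughly this in the
animal's build-farm.conf is what I did on crake (a sketch only; build_env
is the hash the client uses for extra environment settings):

build_env => {
    # give pg_ctl more time to stop/start the server on slow animals
    PGCTLTIMEOUT => '180',
},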


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Recent 027_streaming_regress.pl hangs

2024-08-12 Thread Andrew Dunstan


On 2024-08-11 Su 8:32 PM, Tom Lane wrote:

Andrew Dunstan  writes:

We'll see. I have switched crake from --run-parallel mode to --run-all
mode i.e. the runs are serialized. Maybe that will be enough to stop the
errors. I'm still annoyed that this test is susceptible to load, if that
is indeed the issue.

crake is still timing out intermittently on 027_streaming_regress.pl,
so that wasn't it.  I think we need more data.  We know that the
wait_for_catchup query is never getting to true:

SELECT '$target_lsn' <= ${mode}_lsn AND state = 'streaming'

but we don't know if the LSN condition or the state condition is
what is failing.  And if it is the LSN condition, it'd be good
to see the actual last LSN, so we can look for patterns like
whether there is a page boundary crossing involved.  So I suggest
adding something like the attached.

If we do this, I'd be inclined to instrument wait_for_slot_catchup
and wait_for_subscription_sync similarly, but I thought I'd check
for contrary opinions first.

        



Seems reasonable.
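Something along these lines is what I imagine, i.e. dumping the
walsender's view on timeout (just a sketch, untested; variable names
assumed from Cluster.pm):

# in wait_for_catchup's timeout path:
my $status = $self->safe_psql('postgres',
	qq[SELECT application_name, state, ${mode}_lsn
	   FROM pg_catalog.pg_stat_replication]);
diag "pg_stat_replication at timeout:\n$status";

That would tell us whether it's the LSN condition or the state condition
that never becomes true.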


cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: PG_TEST_EXTRA and meson

2024-08-11 Thread Andrew Dunstan


On 2024-08-09 Fr 5:26 AM, Ashutosh Bapat wrote:

I think the patch lacks the ability to override PG_TEST_EXTRA at run time.

AFAIU, following was expected behaviour from both meson and make.
Please correct if I am wrong.
1. If PG_TEST_EXTRA is set at the setup/configuration time, it is not
required to be set at run time.
2. Runtime PG_TEST_EXTRA always overrides configure time PG_TEST_EXTRA.



Yes, that's my understanding of the requirement.
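So with the patch applied, something like this should hold (illustrative
only; the option spelling here is per the patch):

# configure-time default, recorded at setup:
meson setup build -DPG_TEST_EXTRA="kerberos ldap"

# uses the configure-time value:
meson test -C build

# runtime value wins:
PG_TEST_EXTRA="xid_wraparound" meson test -C build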

cheers

andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: Windows default locale vs initdb

2024-08-08 Thread Andrew Dunstan


On 2024-08-08 Th 4:08 AM, Ertan Küçükoglu wrote:


I already installed Visual Studio 2022 with C++ support as
suggested in
https://www.postgresql.org/docs/current/install-windows-full.html
I cloned the code onto the system.
But, I cannot find any "src/tools/msvc" directory. It is missing.
The document states I need everything in there:
"The tools for building using Visual C++ or Platform SDK are in
the src\tools\msvc directory."
It seems I will need help setting up the build environment.


I am willing to be a tester for Windows if I can get help setting 
up the build environment.
It also feels like the documentation needs an update, as I failed to find 
the necessary files.



If you're trying to build the master branch, those documents no longer 
apply. You will need to build using meson, as documented here: 
<https://www.postgresql.org/docs/17/install-meson.html>
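The flow is roughly this (a sketch; the prefix path is just an example,
and see the linked page for prerequisites and options):

meson setup build --prefix=c:\postgres
meson compile -C build
meson test -C build
meson install -C build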



cheers


andrew


--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: Add trim_trailing_whitespace to editorconfig file

2024-08-08 Thread Andrew Dunstan


On 2024-08-07 We 4:42 PM, Jelte Fennema-Nio wrote:

On Wed, 7 Aug 2024 at 21:09, Andrew Dunstan  wrote:

You're not meant to use our pg_bsd_indent on its own without the
appropriate flags, namely (from src/tools/pgindent/pgindent):

Ah sorry, I wasn't clear in what I meant then. I meant that if you
look at the sources of pg_bsd_indent (such as
src/tools/pg_bsd_indent/io.c) then you'll realize that comments are
aligned using tabs of width 8, not tabs of width 4. And right now
.editorconfig configures editors to show all .c files with a tab_width
of 4, because we use that for Postgres source files. The bottom
.gitattributes line now results in an editorconfig rule that sets a
tab_width of 8 for just the c and h files in src/tools/pg_bsd_indent
directory.
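Concretely, I'd expect the generated rule to come out something like this
(a sketch, not the literal output):

[src/tools/pg_bsd_indent/*.{c,h}]
tab_width = 8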



Ah, OK. Yeah, that makes sense.





Also, why are you proposing to unset indent-style for .pl and .pm files?
That's not in accordance with our perltidy settings
(src/tools/pgindent/perltidyrc), unless I'm misunderstanding.

All the way at the bottom of the .editorconfig file those "indent_style
= unset" lines are overridden to be "tab" for .pl and .pm files.
There's a comment there explaining why it's done that way.

# We want editors to use tabs for indenting Perl files, but we cannot add
# such a rule to .gitattributes, because certain lines are still indented with
# spaces (e.g. SYNOPSIS blocks).
[*.{pl,pm}]
indent_style = tab

But now thinking about this again after your comment, I realize it's
just as easy and effective to change the script slightly to hardcode
the indent_style for "*.pl" and "*.pm" so that the resulting
.editorconfig looks less confusing. Attached is a patch that does
that.



OK, good, thanks.




I also added a .gitattributes rule for .py files, and changed the
default tab_width to unset, because I realized the resulting
.editorconfig was using tab_width 8 for python files when editing
src/tools/generate_editorconfig.py



sounds good.

cheers

andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: Add trim_trailing_whitespace to editorconfig file

2024-08-07 Thread Andrew Dunstan



On 2024-08-07 We 1:09 PM, Jelte Fennema-Nio wrote:

On Tue, 9 Apr 2024 at 12:42, Jelte Fennema-Nio  wrote:

Okay, I spent the time to add a script to generate the editorconfig
based on .gitattributes after all. So attached is a patch that adds
that.

I would love to see this patch merged (or at least some feedback on
the latest version). I think it's pretty trivial and really low risk
of breaking anyone's workflow, and it would *significantly* improve my
own workflow.

Matthias mentioned on Discord that our vendored in pg_bsd_indent uses
a tabwidth of 8 and that was showing up ugly in his editor. I updated
the patch to include a fix for that too.



You're not meant to use our pg_bsd_indent on its own without the 
appropriate flags, namely (from src/tools/pgindent/pgindent):


"-bad -bap -bbb -bc -bl -cli1 -cp33 -cdb -nce -d0 -di12 -nfc1 -i4 -l79 
-lp -lpl -nip -npro -sac -tpg -ts4"


If that's inconvenient, you can create a .indent.pro with the settings.
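i.e. put something like this in a .indent.pro (an untested sketch: the
same flags, minus -npro, which just tells indent not to read profile
files and so is pointless inside one):

-bad -bap -bbb -bc -bl -cli1 -cp33 -cdb -nce -d0 -di12
-nfc1 -i4 -l79 -lp -lpl -nip -sac -tpg -ts4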


Also, why are you proposing to unset indent-style for .pl and .pm files? 
That's not in accordance with our perltidy settings 
(src/tools/pgindent/perltidyrc), unless I'm misunderstanding.



cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Official devcontainer config

2024-08-04 Thread Andrew Dunstan



On 2024-08-03 Sa 10:13 PM, Junwang Zhao wrote:

On Sat, Aug 3, 2024 at 7:30 PM Andrew Dunstan  wrote:


On 2024-08-02 Fr 2:45 PM, Peter Eisentraut wrote:

On 01.08.24 23:38, Andrew Dunstan wrote:

Not totally opposed, and I will probably give it a try very soon, but
I'm wondering if this really needs to go in the core repo. We've
generally shied away from doing much in the way of editor / devenv
support, trying to be fairly agnostic. It's true we carry
.dir-locals.el and .editorconfig, so that's not entirely true, but
those are really just about supporting our indentation etc. standards.

Yeah, the editor support in the tree ought to be minimal and factual,
based on coding standards and widely recognized best practices, not a
collection of one person's favorite aliases and scripts.  If the
scripts are good, let's look at them and maybe put them under
src/tools/ for everyone to use.  But a lot of this looks like it will
require active maintenance if output formats or node formats or build
targets etc. change.  And other things require specific local paths.
That's fine for a local script or something, but not for a mainline
tool that the community will need to maintain.

I suggest to start with a very minimal configuration. What are the
settings that absolute everyone will need, maybe to set indentation
style or something.


I believe you can get VS Code to support editorconfig, so from that POV
maybe we don't need to do anything.

I did try yesterday with the code from the OP's patch symlinked into my
repo, but got an error with the Docker build, which kinda reinforces
your point.

The reason symlink does not work is that configure_vscode needs to copy
launch.json and tasks.json into .vscode, so it has to be in the
WORKDIR/.devcontainer.



That's kind of awful. Anyway, I think we don't need to do anything about 
ignoring those. The user should simply add entries for them to 
.git/info/exclude or their local global exclude file (I have 
core.excludesfile = /home/andrew/.gitignore set.)
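For example (a sketch):

# .git/info/exclude
/.vscode/launch.json
/.vscode/tasks.json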


I was eventually able to get it to work without using a symlink.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Official devcontainer config

2024-08-03 Thread Andrew Dunstan



On 2024-08-02 Fr 2:45 PM, Peter Eisentraut wrote:

On 01.08.24 23:38, Andrew Dunstan wrote:
Not totally opposed, and I will probably give it a try very soon, but 
I'm wondering if this really needs to go in the core repo. We've 
generally shied away from doing much in the way of editor / devenv 
support, trying to be fairly agnostic. It's true we carry 
.dir-locals.el and .editorconfig, so that's not entirely true, but 
those are really just about supporting our indentation etc. standards.


Yeah, the editor support in the tree ought to be minimal and factual, 
based on coding standards and widely recognized best practices, not a 
collection of one person's favorite aliases and scripts.  If the 
scripts are good, let's look at them and maybe put them under 
src/tools/ for everyone to use.  But a lot of this looks like it will 
require active maintenance if output formats or node formats or build 
targets etc. change.  And other things require specific local paths.  
That's fine for a local script or something, but not for a mainline 
tool that the community will need to maintain.


I suggest to start with a very minimal configuration. What are the 
settings that absolute everyone will need, maybe to set indentation 
style or something.




I believe you can get VS Code to support editorconfig, so from that POV 
maybe we don't need to do anything.


I did try yesterday with the code from the OP's patch symlinked into my 
repo, but got an error with the Docker build, which kinda reinforces 
your point.


Your point about "one person's preferences" is well taken - some of the 
git aliases supplied clash with mine.



cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Official devcontainer config

2024-08-01 Thread Andrew Dunstan



On 2024-08-01 Th 10:56 AM, Junwang Zhao wrote:

Stack Overflow 2024 developer survey[1] said VSCode
is the most used development environment.

In a PostgreSQL Hacker Mentoring discussion, we talked
about how to use vscode to debug and run postgres,
Andrey(ccd) has tons of tips for new developers, and
I post my daily used devcontainer config[2] , Jelte(ccd)
suggested that it might be a good idea we integrate the
config into postgres repo so that the barrier to entry for
new developers will be much lower.

**Note**

This is not intended to change the workflow of experienced
hackers; it just hopes to make life easier for
beginner developers.

**How to use**

Open VSCode Command Palette(cmd/ctrl + shift + p),
search devcontainer, then choose something like
`Dev containers: Rebuild and Reopen in Container`, you are
good to go.

**About the patch**

devcontainer.json:

The .devcontainer/devcontainer.json is the entry point for
VSCode to *open folder to develop in a container*; it will build
the docker image the first time you open in the container,
which will take some time.

There are some parameters (runArgs) for running the container;
we need some settings and privileges to run perf or generate
core dumps.

It has a mount point mapping the host's $HOME/freedom
to the container's /opt/freedom; I chose the name *freedom*
because the container is kind of a jail.

It also installs some vscode extensions and does some
customizations.

After diving into the container, the postCreateCommand.sh
will be automatically called; it will configure things
like git, perf, .vscode, core_pattern, etc. It also downloads
michaelpq's pg_plugins and FlameGraph.
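To give a flavor of the format, the overall shape is something like this
(illustrative only - the field names are from the devcontainer spec, the
values here are abbreviated and not the actual patch contents):

{
    "name": "PostgreSQL Dev",
    "build": { "dockerfile": "Dockerfile" },
    "runArgs": ["--cap-add=SYS_PTRACE"],
    "mounts": ["source=${localEnv:HOME}/freedom,target=/opt/freedom,type=bind"],
    "postCreateCommand": ".devcontainer/postCreateCommand.sh"
}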

Dockerfile:

It is based on Debian bookworm; it installs the dependencies needed
to build postgres, plus IPC::Run to run TAP tests, I guess.

It also has a .gdbinit to break elog.c:errfinish for elevel 21,
which should make it easier to see where an error is raised when debugging.

gdbpg.py is adapted from https://github.com/tvondra/gdbpg,
I think putting it here will help it evolve over time.

tasks.json:

This is kind of a bookkeeping file for developers; it has
the following commands:

- meson debug setup
- meson release setup
- ninja build
- regression tests
- ninja install
- init cluster
- start cluster
- stop cluster
- install pg_bsd_indent
- pgindent
- apply patch
- generate flamegraph

launch.json:

It has one configuration that makes it possible to attach
to a process (e.g. a postgres backend) and debug
with vscode.

PFA and please give it a try if you are a VSCode user.

[1]: 
https://survey.stackoverflow.co/2024/technology#1-integrated-development-environment
[2]: https://github.com/atomicdb/devcontainer/tree/main/postgres



Not totally opposed, and I will probably give it a try very soon, but 
I'm wondering if this really needs to go in the core repo. We've 
generally shied away from doing much in the way of editor / devenv 
support, trying to be fairly agnostic. It's true we carry .dir-locals.el 
and .editorconfig, so that's not entirely true, but those are really 
just about supporting our indentation etc. standards.


Also, might it not be better for this to be carried in a separate repo 
maintained by people using vscode? I don't know how many committers do. 
Maybe lots do, and I'm just a dinosaur. If vscode really needs 
.devcontainer to live in the code root, maybe it could be symlinked. 
Another reason not carry the code ourselves is that it will make it less 
convenient for people who want to customize it.


Without having tried it, looks like a nice effort though. Well done.


cheers


andrew



--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Improving tracking/processing of buildfarm test failures

2024-08-01 Thread Andrew Dunstan



On 2024-08-01 Th 5:00 AM, Alexander Lakhin wrote:



I also wrote a simple script (see attached) to check for unknown 
buildfarm

failures using "HTML API", to make sure no failures missed. Surely, it
could be improved in many ways, but I find it rather useful as-is.




I think we can improve on that. Scraping HTML is not a terribly 
efficient way of doing it. I'd very much like to improve the reporting 
side of the server.



cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: can we mark upper/lower/textlike functions leakproof?

2024-07-31 Thread Andrew Dunstan



On 2024-07-31 We 2:43 PM, Joe Conway wrote:

On 7/31/24 14:03, Tom Lane wrote:

Robert Haas  writes:

If there are some inputs that cause upper() and lower() to fail and
others that do not, the functions aren't leakproof, because an
attacker can extract information about values that they can't see by
feeding those values into these functions and seeing whether they get
a failure or not.



[ rather exhaustive analysis redacted ]



First, thanks you very much, Robert for the analysis.





So in summary, I think upper() is ... pretty close to leakproof. But
if ICU sometimes fails on certain strings, then it isn't. And if the
multi-byte libc path can be made to fail reliably either with really
long strings or with certain choices of the LC_CTYPE locale, then it
isn't.


The problem here is that marking these functions leakproof is a
promise about a *whole bunch* of code, much of it not under our
control; worse, there's no reason to think all that code is stable.
A large fraction of it didn't even exist a few versions ago.

Even if we could convince ourselves that the possible issues Robert
mentions aren't real at the moment, I think marking these leakproof
is mighty risky.  It's unlikely we'd remember to revisit the marking
the next time someone drops a bunch of new code in here.



I still maintain that there is a whole host of users who would accept 
the risk of side-channel attacks via the presence or absence of an error, if 
they could only be sure nothing sensitive leaks directly into the logs 
or to the clients. We should give them that choice.




As I meant to say in my previous empty reply, I think your suggestions 
make lots of sense.



cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Recent 027_streaming_regress.pl hangs

2024-07-31 Thread Andrew Dunstan



On 2024-07-31 We 12:05 PM, Tom Lane wrote:

Andrew Dunstan  writes:

There seem to be a bunch of recent failures, and not just on crake, and
not just on HEAD:
<https://buildfarm.postgresql.org/cgi-bin/show_failures.pl?max_days=14&member=&stage=recoveryCheck&filter=Submit>

There was a batch of recovery-stage failures ending about 9 days ago
caused by instability of the new 043_vacuum_horizon_floor.pl test.
Once you take those out it doesn't look quite so bad.




We'll see. I have switched crake from --run-parallel mode to --run-all 
mode i.e. the runs are serialized. Maybe that will be enough to stop the 
errors. I'm still annoyed that this test is susceptible to load, if that 
is indeed the issue.



cheers


andrew


--

Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Recent 027_streaming_regress.pl hangs

2024-07-31 Thread Andrew Dunstan


On 2024-07-25 Th 6:33 PM, Andrew Dunstan wrote:



On 2024-07-25 Th 5:14 PM, Tom Lane wrote:

I wrote:

I'm confused by crake's buildfarm logs.  AFAICS it is not running
recovery-check at all in most of the runs; at least there is no
mention of that step, for example here:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake&dt=2024-07-25%2013%3A27%3A02

Oh, I see it: the log file that is called recovery-check in a
failing run is called misc-check if successful.  That seems
mighty bizarre, and it's not how my own animals behave.
Something weird about the meson code path, perhaps?



Yes, it was discussed some time ago. As suggested by Andres, we run 
the meson test suite more or less all together (in effect like "make 
checkworld" but without the main regression suite, which is run on its 
own first), rather than in the separate (and serialized) way we do 
with the configure/make tests. That results in significant speedup. If 
the tests fail we report the failure as happening at the first failure 
we encounter, which is possibly less than ideal, but I haven't got a 
better idea.




Anyway, in this successful run:

https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=crake&dt=2024-07-25%2018%3A57%3A02&stg=misc-check

here are some salient test timings:

   1/297 postgresql:pg_upgrade / pg_upgrade/001_basic   
 OK0.18s   9 subtests passed
   2/297 postgresql:pg_upgrade / pg_upgrade/003_logical_slots   
 OK   15.95s   12 subtests passed
   3/297 postgresql:pg_upgrade / pg_upgrade/004_subscription
 OK   16.29s   14 subtests passed
  17/297 postgresql:isolation / isolation/isolation 
 OK   71.60s   119 subtests passed
  41/297 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade  
 OK  169.13s   18 subtests passed
140/297 postgresql:initdb / initdb/001_initdb   
OK   41.34s   52 subtests passed
170/297 postgresql:recovery / recovery/027_stream_regress   
OK  469.49s   9 subtests passed

while in the next, failing run

https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=crake&dt=2024-07-25%2020%3A18%3A05&stg=recovery-check

the same tests took:

   1/297 postgresql:pg_upgrade / pg_upgrade/001_basic   
 OK0.22s   9 subtests passed
   2/297 postgresql:pg_upgrade / pg_upgrade/003_logical_slots   
 OK   56.62s   12 subtests passed
   3/297 postgresql:pg_upgrade / pg_upgrade/004_subscription
 OK   71.92s   14 subtests passed
  21/297 postgresql:isolation / isolation/isolation 
 OK  299.12s   119 subtests passed
  31/297 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade  
 OK  344.42s   18 subtests passed
159/297 postgresql:initdb / initdb/001_initdb   
OK  344.46s   52 subtests passed
162/297 postgresql:recovery / recovery/027_stream_regress   
ERROR   840.84s   exit status 29

Based on this, it seems fairly likely that crake is simply timing out
as a consequence of intermittent heavy background activity.




The latest failure is this:


Waiting for replication conn standby_1's replay_lsn to pass 2/88E4E260 on 
primary
[16:40:29.481](208.545s) # poll_query_until timed out executing this query:
# SELECT '2/88E4E260' <= replay_lsn AND state = 'streaming'
#  FROM pg_catalog.pg_stat_replication
#  WHERE application_name IN ('standby_1', 'walreceiver')
# expecting this output:
# t
# last actual query output:
# f
# with stderr:
timed out waiting for catchup at 
/home/andrew/bf/root/HEAD/pgsql/src/test/recovery/t/027_stream_regress.pl line 
103.


Maybe it's a case where the system is overloaded, I dunno. I wouldn't bet my 
house on it. Pretty much nothing else runs on this machine.

I have added a mild throttle to the buildfarm config so it doesn't try 
to run every branch at once. Maybe I also need to bring down the 
number of meson jobs too? But I suspect there's something deeper. 
Prior to the failure of this test 10 days ago it hadn't failed in a 
very long time. The OS was upgraded a month ago. Eight or so days ago 
I changed PG_TEST_EXTRA. I can't think of anything else.







There seem to be a bunch of recent failures, and not just on crake, and 
not just on HEAD: 
<https://buildfarm.postgresql.org/cgi-bin/show_failures.pl?max_days=14&member=&stage=recoveryCheck&filter=Submit>



cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: can we mark upper/lower/textlike functions leakproof?

2024-07-31 Thread Andrew Dunstan



On 2024-07-31 We 9:14 AM, Joe Conway wrote:

On 7/31/24 05:47, Andrew Dunstan wrote:

On 2024-07-30 Tu 6:51 PM, David Rowley wrote:
On Wed, 31 Jul 2024 at 09:35, Andrew Dunstan 
wrote:
Fast forward to now. The customer has found no observable ill 
effects of

marking these functions leakproof. They would like to know if there is
any reason why we can't mark them leakproof, so that they don't 
have to

do this in every database of every cluster they use.

Thoughts?

According to [1], it's just not been done yet due to concerns about
risk to reward ratios.  Nobody mentioned any reason why it couldn't
be, but there were some fears that future code changes could yield new
failure paths.

David

[1]https://postgr.es/m/02BDFCCF-BDBB-4658-9717-4D95F9A91561%40thebuild.com 



Hmm, somehow I missed that thread in searching, and clearly I'd 
forgotten it.


Still, I'm not terribly convinced by arguments along the lines you're 
suggesting. "Sufficient unto the day is the evil thereof." Maybe we 
need a test to make sure we don't make changes along those lines, 
although I have no idea what such a test would look like.



I think I have expressed this opinion before (which was shot down), 
and I will grant that it is hand-wavy, but I will give it another try.


In my opinion, for this use case and others, it should be possible to 
redact the values substituted into log messages based on some 
criteria. One of those criteria could be "I am in a leakproof call 
right now". In fact in a similar fashion, an extension ought to be 
able to mutate the log message based on the entire string, e.g. when 
"ALTER ROLE...PASSWORD..." is spotted I would like to be able to 
redact everything after "PASSWORD".


Yes it might render the error message unhelpful, but I know of users 
that would accept that tradeoff in order to get better performance and 
security on their production workloads. Or in some cases (e.g. 
PASSWORD) just better security.



--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: can we mark upper/lower/textlike functions leakproof?

2024-07-31 Thread Andrew Dunstan


On 2024-07-30 Tu 6:51 PM, David Rowley wrote:

On Wed, 31 Jul 2024 at 09:35, Andrew Dunstan  wrote:

Fast forward to now. The customer has found no observable ill effects of
marking these functions leakproof. They would like to know if there is
any reason why we can't mark them leakproof, so that they don't have to
do this in every database of every cluster they use.

Thoughts?

According to [1], it's just not been done yet due to concerns about
risk to reward ratios.  Nobody mentioned any reason why it couldn't
be, but there were some fears that future code changes could yield new
failure paths.

David

[1]https://postgr.es/m/02BDFCCF-BDBB-4658-9717-4D95F9A91561%40thebuild.com



Hmm, somehow I missed that thread in searching, and clearly I'd 
forgotten it.


Still, I'm not terribly convinced by arguments along the lines you're 
suggesting. "Sufficient unto the day is the evil thereof." Maybe we need 
a test to make sure we don't make changes along those lines, although I 
have no idea what such a test would look like.



cheers

andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


can we mark upper/lower/textlike functions leakproof?

2024-07-30 Thread Andrew Dunstan



Several years ago, an EDB customer complained that if they used a 
functional index involving upper(), lower(), or textlike(), where RLS 
was involved, the indexes were not used because these functions are not 
marked leakproof. We presented the customer with several options for 
addressing the problem, the simplest of which was simply to mark the 
functions as leakproof, and this was the solution they adopted.
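For the record, the change amounts to something like the following, run 
as superuser in each database (the exact set of signatures may vary):

ALTER FUNCTION pg_catalog.upper(text) LEAKPROOF;
ALTER FUNCTION pg_catalog.lower(text) LEAKPROOF;
ALTER FUNCTION pg_catalog.textlike(text, text) LEAKPROOF;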


The consensus of discussion at the time among our senior developers was 
that there was probably no reason why at least upper() and lower() 
should not be marked leakproof, and quite possibly initcap() and 
textlike() also. It was suggested that we had not been terribly rigorous 
in assessing whether or not functions can be considered leakproof.


At the time we should have asked the community about it, but we didn't.

Fast forward to now. The customer has found no observable ill effects of 
marking these functions leakproof. They would like to know if there is 
any reason why we can't mark them leakproof, so that they don't have to 
do this in every database of every cluster they use.


Thoughts?


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: meson vs windows perl

2024-07-30 Thread Andrew Dunstan


On 2024-07-20 Sa 9:41 AM, Andrew Dunstan wrote:


On 2024-05-28 Tu 6:13 PM, Andres Freund wrote:

Hi,

On 2024-04-05 16:12:12 -0400, Andrew Dunstan wrote:

OK, this has been fixed and checked. The attached is what I propose.

The perl command is pretty hard to read. What about using python's shlex
module instead? Rough draft attached.  Still not very pretty, but 
seems easier

to read?

It'd be even better if we could just get perl to print out the flags 
in an

easier to parse way, but I couldn't immediately see a way.




Thanks for looking.

The attached should be easier to read. The perl package similar to 
shlex is Text::ParseWords, which is already a requirement.



It turns out that shellwords eats backslashes, so we would need 
something like this version, which I have tested on Windows. I will 
probably commit this in the next few days unless there's an objection.



cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com
diff --git a/meson.build b/meson.build
index 7de0371226..dfcfcfc15e 100644
--- a/meson.build
+++ b/meson.build
@@ -1065,20 +1065,19 @@ if not perlopt.disabled()
 # Config's ccdlflags and ldflags.  (Those are the choices of those who
 # built the Perl installation, which are not necessarily appropriate
 # for building PostgreSQL.)
-ldopts = run_command(perl, '-MExtUtils::Embed', '-e', 'ldopts', check: true).stdout().strip()
-undesired = run_command(perl_conf_cmd, 'ccdlflags', check: true).stdout().split()
-undesired += run_command(perl_conf_cmd, 'ldflags', check: true).stdout().split()
+perl_ldopts = run_command(perl, '-e', '''
+use ExtUtils::Embed;
+use Text::ParseWords;
+# tell perl to suppress including these in ldopts
+*ExtUtils::Embed::_ldflags = *ExtUtils::Embed::_ccdlflags = sub { return ""; };
+# adding an argument to ldopts makes it return a value instead of printing
+# print one of these per line so splitting will preserve spaces in file names.
+# shellwords eats backslashes, so we need to escape them.
+(my $opts = ldopts(undef)) =~ s!\\!\\\\!g;
+print "$_\n" foreach shellwords($opts);
+''',
+ check: true).stdout().strip().split('\n')
 
-perl_ldopts = []
-foreach ldopt : ldopts.split(' ')
-  if ldopt == '' or ldopt in undesired
-continue
-  endif
-
-  perl_ldopts += ldopt.strip('"')
-endforeach
-
-message('LDFLAGS recommended by perl: "@0@"'.format(ldopts))
 message('LDFLAGS for embedding perl: "@0@"'.format(' '.join(perl_ldopts)))
 
 perl_dep_int = declare_dependency(


Re: jsonpath: Inconsistency of timestamp_tz() Output

2024-07-30 Thread Andrew Dunstan


On 2024-07-22 Mo 3:12 AM, Jeevan Chalke wrote:



On Fri, Jul 19, 2024 at 7:35 PM David E. Wheeler 
 wrote:


On Jul 10, 2024, at 11:19, David E. Wheeler
 wrote:

> Oh, and the time and date were wrong, too, because I blindly
used the same conversion for dates as for timestamps. Fixed in v2.
>
> PR: https://github.com/theory/postgres/pull/7
> CF: https://commitfest.postgresql.org/49/5119/

Rebase on 5784a49. No other changes. I would consider this a bug
in features added for 17.


I agree with David that we need to set the tz explicitly as the 
JsonbValue struct maintains that separately.


However, in the attached version, I have added some comments and also, 
fixed some indentation.




I have pushed this.
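For anyone following along, the method in question can be exercised like
this (illustrative values; the output depends on the TimeZone setting):

SELECT jsonb_path_query_tz('"2024-08-15"', '$.timestamp_tz()');
SELECT jsonb_path_query('"2024-08-15 12:34:56+05:30"', '$.timestamp_tz()');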


cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: xid_wraparound tests intermittent failure.

2024-07-30 Thread Andrew Dunstan


On 2024-07-29 Mo 5:25 PM, Masahiko Sawada wrote:

I've attached the patch. Could you please test if the patch fixes the
instability you observed?

Since we turn off autovacuum on all three tests and we wait for
autovacuum to complete processing databases, these tests potentially
have a similar (but lower) risk. So I modified these tests to turn it
on so we can ensure the autovacuum runs periodically.


I assume you actually meant to remove the "autovacuum = off" in 
003_wraparound.pl. With that change in your patch I retried my test, but on iteration 100 
out of 100 it failed on test 002_limits.pl.


I think we need to remove the "autovacuum = off' also in 002_limits.pl
as it waits for autovacuum to process both template0 and template1
databases. Just to be clear, the failure happened even without
"autovacuum = off"?


The attached patch, a slight modification of yours, removes "autovacuum
= off" for all three tests, and given that a set of 200 runs was clean
for me.

Oh I missed that I left "autovacuum = off' for some reason in 002
test. Thank you for attaching the patch, it looks good to me.



Thanks, pushed.


cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: why is pg_upgrade's regression run so slow?

2024-07-29 Thread Andrew Dunstan



On 2024-07-29 Mo 4:00 AM, Alexander Lakhin wrote:

Hello Andrew,

28.07.2024 22:43, Andrew Dunstan wrote:


Maybe, IDK. Meanwhile, I disabled "debug_parallel_query = regress" on 
HEAD for fairywren and drongo - fairywren has just gone green, and I 
expect drongo will when it reports in a few hours.


I'm at a loss for an explanation.



I'm observing the same here (using a Windows 10 VM).

With no TEMP_CONFIG set, `meson test` gives me these numbers:
test: postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade
duration: 72.34s

test: postgresql:regress / regress/regress
duration: 41.98s

But with debug_parallel_query=regress in TEMP_CONFIG, I see:
test: postgresql:regress / regress/regress
duration: 50.08s

test: postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade
duration: 398.45s

With log_min_messages=DEBUG2 added to TEMP_CONFIG, I can see how many
parallel workers were started during the test:
...\postgresql\build>grep "starting background worker process" -r 
testrun/pg_upgrade | wc -l

17532

And another interesting fact is that TEMP_CONFIG is apparently ignored by
`meson test regress/regress`. For example, with temp.config containing
invalid settings, `meson test pg_upgrade/002_pg_upgrade` fails, but
`meson test regress/regress` passes just fine.




Well, that last fact explains the discrepancy I originally complained 
about. How the heck did that happen? It looks like we just ignored its 
use in Makefile.global.in :-(



cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: why is pg_upgrade's regression run so slow?

2024-07-28 Thread Andrew Dunstan



On 2024-07-27 Sa 6:48 PM, Tom Lane wrote:

Andrew Dunstan  writes:

On 2024-07-27 Sa 10:20 AM, Tom Lane wrote:

Just to add some more fuel to the fire: I do *not* observe such an
effect on my own animals.

The culprit appears to be meson. When I tested running crake with
"using_meson => 0" I got results in line with yours.

Interesting.  Maybe meson is over-aggressively trying to run these
test suites in parallel?




Maybe, IDK. Meanwhile, I disabled "debug_parallel_query = regress" on 
HEAD for fairywren and drongo - fairywren has just gone green, and I 
expect drongo will when it reports in a few hours.


I'm at a loss for an explanation.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: why is pg_upgrade's regression run so slow?

2024-07-27 Thread Andrew Dunstan



On 2024-07-27 Sa 10:20 AM, Tom Lane wrote:

Andrew Dunstan  writes:

As part of its 002_pgupgrade.pl, the pg_upgrade tests do a rerun of the
normal regression tests. That in itself is sad, but what concerns me
here is why it's so much slower than the regular run? This is apparent
everywhere (e.g. on crake the standard run takes about 30 to 90 s, but
pg_upgrade's run takes 5 minutes or more).

Just to add some more fuel to the fire: I do *not* observe such an
effect on my own animals.  For instance, sifaka's last run shows
00:09 for install-check-C and the same (within rounding error)
for the regression test step in 002_pgupgrade; on the much slower
mamba, the last run took 07:33 for install-check-C and 07:40 for the
002_pgupgrade regression test step.

I'm still using the makefile-based build, and I'm not using
debug_parallel_query, but it doesn't make a lot of sense to me
that either of those things should affect this.

Looking at crake's last run, there are other oddities: why does
the "check" step take 00:24 while "install-check-en_US.utf8" (which
should be doing strictly less work) takes 01:00?  Again, there are
not similar discrepancies on my animals.  Are you sure there's not
background load on the machine?





Quite sure. Running crake and koel is all it does. It's Fedora 40 running 
on bare metal, a Lenovo Y70 with an Intel Core i7-4720HQ CPU @ 2.60GHz 
and a Samsung SSD.


The culprit appears to be meson. When I tested running crake with 
"using_meson => 0" I got results in line with yours. The regression test 
times were consistent, including the installcheck tests.  That's 
especially ugly on Windows as we don't have any alternative way of 
running, and the results are so much more catastrophic. 
"debug_parallel_query" didn't seem to matter.



cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: xid_wraparound tests intermittent failure.

2024-07-27 Thread Andrew Dunstan


On 2024-07-26 Fr 1:46 PM, Masahiko Sawada wrote:

On Thu, Jul 25, 2024 at 6:52 PM Andrew Dunstan  wrote:


On 2024-07-25 Th 3:40 PM, Masahiko Sawada wrote:

On Thu, Jul 25, 2024 at 11:06 AM Masahiko Sawada  wrote:

On Thu, Jul 25, 2024 at 10:56 AM Andrew Dunstan  wrote:

On 2024-07-23 Tu 6:59 PM, Masahiko Sawada wrote:

See 
<https://bitbucket.org/adunstan/rotfang-fdw/downloads/xid-wraparound-result.tar.bz2>


The failure logs are from a run where both tests 1 and 2 failed.

Thank you for sharing the logs.

I think that the problem seems to match what Alexander Lakhin
mentioned[1]. Probably we can fix such a race condition somehow but
I'm not sure it's worth it as setting autovacuum = off and
autovacuum_max_workers = 1 (or a low number) is an extremely rare
case. I think it would be better to stabilize these tests. One idea is
to turn the autovacuum GUC parameter on while setting
autovacuum_enabled = off for each table. That way, we can ensure that
autovacuum workers are launched. And I think it aligns with real
use cases.

Regards,

[1] 
https://www.postgresql.org/message-id/02373ec3-50c6-df5a-0d65-5b9b1c0c86d6%40gmail.com


OK, do you want to propose a patch?

Yes, I'll prepare and share it soon.

I've attached the patch. Could you please test if the patch fixes the
instability you observed?

Since we turn off autovacuum on all three tests and we wait for
autovacuum to complete processing databases, these tests potentially
have a similar (but lower) risk. So I modified these tests to turn it
on so we can ensure the autovacuum runs periodically.


I assume you actually meant to remove the "autovacuum = off" in 
003_wraparound.pl. With that change in your patch I retried my test, but on iteration 100 
out of 100 it failed on test 002_limits.pl.


I think we need to remove the "autovacuum = off' also in 002_limits.pl
as it waits for autovacuum to process both template0 and template1
databases. Just to be clear, the failure happened even without
"autovacuum = off"?



The attached patch, a slight modification of yours, removes "autovacuum 
= off" for all three tests, and given that a set of 200 runs was clean 
for me.



cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
diff --git a/src/test/modules/xid_wraparound/t/001_emergency_vacuum.pl b/src/test/modules/xid_wraparound/t/001_emergency_vacuum.pl
index 37550b67a4..2692b35f34 100644
--- a/src/test/modules/xid_wraparound/t/001_emergency_vacuum.pl
+++ b/src/test/modules/xid_wraparound/t/001_emergency_vacuum.pl
@@ -18,7 +18,6 @@ my $node = PostgreSQL::Test::Cluster->new('main');
 $node->init;
 $node->append_conf(
 	'postgresql.conf', qq[
-autovacuum = off # run autovacuum only when to anti wraparound
 autovacuum_naptime = 1s
 # so it's easier to verify the order of operations
 autovacuum_max_workers = 1
@@ -27,23 +26,25 @@ log_autovacuum_min_duration = 0
 $node->start;
 $node->safe_psql('postgres', 'CREATE EXTENSION xid_wraparound');
 
-# Create tables for a few different test scenarios
+# Create tables for a few different test scenarios. We disable autovacuum
+# on these tables to run it only to prevent wraparound.
 $node->safe_psql(
 	'postgres', qq[
-CREATE TABLE large(id serial primary key, data text, filler text default repeat(random()::text, 10));
+CREATE TABLE large(id serial primary key, data text, filler text default repeat(random()::text, 10))
+   WITH (autovacuum_enabled = off);
 INSERT INTO large(data) SELECT generate_series(1,30000);
 
-CREATE TABLE large_trunc(id serial primary key, data text, filler text default repeat(random()::text, 10));
+CREATE TABLE large_trunc(id serial primary key, data text, filler text default repeat(random()::text, 10))
+   WITH (autovacuum_enabled = off);
 INSERT INTO large_trunc(data) SELECT generate_series(1,30000);
 
-CREATE TABLE small(id serial primary key, data text, filler text default repeat(random()::text, 10));
+CREATE TABLE small(id serial primary key, data text, filler text default repeat(random()::text, 10))
+   WITH (autovacuum_enabled = off);
 INSERT INTO small(data) SELECT generate_series(1,15000);
 
-CREATE TABLE small_trunc(id serial primary key, data text, filler text default repeat(random()::text, 10));
+CREATE TABLE small_trunc(id serial primary key, data text, filler text default repeat(random()::text, 10))
+   WITH (autovacuum_enabled = off);
 INSERT INTO small_trunc(data) SELECT generate_series(1,15000);
-
-CREATE TABLE autovacuum_disabled(id serial primary key, data text) WITH (autovacuum_enabled=false);
-INSERT INTO autovacuum_disabled(data) SELECT generate_series(1,1000);
 ]);
 
 # Bump the query timeout to avoid false negatives on slow test systems.
@@ -63,7 +64,6 @@ $background_psql->query_safe(
	DELETE FROM large_trunc WHERE id > 10000;
 	DELETE FROM small WHERE id % 2 = 0;
 	DELETE FROM small_trunc WHERE id > 1000;

why is pg_upgrade's regression run so slow?

2024-07-27 Thread Andrew Dunstan



As part of its 002_pgupgrade.pl, the pg_upgrade tests do a rerun of the 
normal regression tests. That in itself is sad, but what concerns me 
here is why it's so much slower than the regular run? This is apparent 
everywhere (e.g. on crake the standard run takes about 30 to 90 s, but 
pg_upgrade's run takes 5 minutes or more). On Windows, it's 
catastrophic, and only hasn't been noticed because the buildfarm client 
wasn't counting a timeout as a failure. That was an error on my part and 
I have switched a few of my machines to code that checks more robustly 
for failure of meson tests - specifically by looking for the absence of 
test.success rather than the presence of test.fail. That means that 
drongo and fairywren are getting timeout errors. e.g. on the latest run 
on fairywren, the regular regression run took 226s, but pg_upgrade's run 
of what should be the same set of tests took 2418s. What the heck is 
going on here? Is it because of the concurrent tests running? 
That doesn't seem enough to make the tests run more than 10 times as long.
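For the record, the check now effectively does something like this over
meson's testrun directory (a simplified sketch, not the client's actual
code):

# a test that never wrote test.success (e.g. a timeout) counts as failed
my @failures;
foreach my $dir (glob "$pgsql/testrun/*/*")
{
	next unless -d $dir;
	push(@failures, $dir) unless -e "$dir/test.success";
}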


I have a strong suspicion this is exacerbated by "debug_parallel_query = 
regress", especially since the tests run much faster on REL_17_STABLE 
where I am not setting that, but that can't be the whole explanation, 
since that setting should apply to both sets of tests.



cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Extension using Meson as build system

2024-07-26 Thread Andrew Dunstan



On 2024-07-26 Fr 11:06 AM, Peter Eisentraut wrote:

On 30.06.24 15:17, Junwang Zhao wrote:

Is there any extension that uses meson as build systems?
I'm starting a extension project that written in c++, cmake is my
initial choice as the build system, but since PostgreSQL has adopted
Meson, so I'm wondering if there is any extension that also uses
meson that I can reference.


I wrote a little demo:

https://github.com/petere/plsh/blob/meson/meson.build




nifty!
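For those skimming, such a meson.build boils down to roughly this shape
(my paraphrase, not the actual contents of the linked file):

project('plsh', 'c')

pg_config = find_program('pg_config')
incdir = run_command(pg_config, '--includedir-server', check: true).stdout().strip()
pkglibdir = run_command(pg_config, '--pkglibdir', check: true).stdout().strip()
sharedir = run_command(pg_config, '--sharedir', check: true).stdout().strip()

shared_module('plsh', 'plsh.c',
  name_prefix: '',
  c_args: ['-I' + incdir],
  install: true,
  install_dir: pkglibdir)

install_data('plsh.control', install_dir: sharedir / 'extension')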


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: fairywren timeout failures on the pg_amcheck/005_opclass_damage test

2024-07-26 Thread Andrew Dunstan



On 2024-07-26 Fr 7:00 AM, Alexander Lakhin wrote:

25.07.2024 15:00, Alexander Lakhin wrote:



The other question is: why is 005_opclass_damage taking so much time 
there?

...
$ make -s check -C src/bin/pg_amcheck/ PROVE_TESTS="t/005*" 
PROVE_FLAGS="--timer"
[11:11:53] t/005_opclass_damage.pl .. ok 1370 ms ( 0.00 usr 0.00 
sys +  0.10 cusr  0.07 csys =  0.17 CPU)


$ echo "debug_parallel_query = regress" >/tmp/extra.config
$ TEMP_CONFIG=/tmp/extra.config make -s check -C src/bin/pg_amcheck/ 
PROVE_TESTS="t/005*" PROVE_FLAGS="--timer"
[11:12:46] t/005_opclass_damage.pl .. ok    40854 ms ( 0.00 usr 0.00 
sys +  0.10 cusr  0.10 csys =  0.20 CPU)


...
So maybe at least this test should be improved for testing with
debug_parallel_query enabled, if such active use of parallel workers by
pg_amcheck can't be an issue to end users?



When running this test with "log_min_messages = DEBUG2" in my 
extra.config,

I see thousands of the following messages in the test log:
2024-07-26 09:32:54.544 UTC [2572189:46] DEBUG:  postmaster received 
pmsignal signal
2024-07-26 09:32:54.544 UTC [2572189:47] DEBUG:  registering 
background worker "parallel worker for PID 2572197"
2024-07-26 09:32:54.544 UTC [2572189:48] DEBUG:  starting background 
worker process "parallel worker for PID 2572197"
2024-07-26 09:32:54.547 UTC [2572189:49] DEBUG:  unregistering 
background worker "parallel worker for PID 2572197"
2024-07-26 09:32:54.547 UTC [2572189:50] DEBUG:  background worker 
"parallel worker" (PID 2572205) exited with exit code 0
2024-07-26 09:32:54.547 UTC [2572189:51] DEBUG:  postmaster received 
pmsignal signal
2024-07-26 09:32:54.547 UTC [2572189:52] DEBUG:  registering 
background worker "parallel worker for PID 2572197"
2024-07-26 09:32:54.547 UTC [2572189:53] DEBUG:  starting background 
worker process "parallel worker for PID 2572197"
2024-07-26 09:32:54.549 UTC [2572189:54] DEBUG:  unregistering 
background worker "parallel worker for PID 2572197"
2024-07-26 09:32:54.549 UTC [2572189:55] DEBUG:  background worker 
"parallel worker" (PID 2572206) exited with exit code 0

...

grep ' registering background worker' \
 src/bin/pg_amcheck/tmp_check/log/005_opclass_damage_test.log | wc -l
15669

So this test launches more than 15000 processes when debug_parallel_query
is enabled.

As far as I can see, this is happening because the "PARALLEL UNSAFE"
marking is ignored when the function is called by CREATE INDEX/amcheck.

Namely, with a function defined as:
    CREATE FUNCTION int4_asc_cmp (a int4, b int4) RETURNS int LANGUAGE sql AS $$
        SELECT CASE WHEN $1 = $2 THEN 0 WHEN $1 > $2 THEN 1 ELSE -1 END; $$;


SELECT int4_asc_cmp(1, 2);
is executed without parallel workers, while when it's used by an index:
CREATE OPERATOR CLASS int4_fickle_ops FOR TYPE int4 USING btree AS
...
OPERATOR 5 > (int4, int4), FUNCTION 1 int4_asc_cmp(int4, int4);

INSERT INTO int4tbl (SELECT * FROM generate_series(1,1000) gs);

CREATE INDEX fickleidx ON int4tbl USING btree (i int4_fickle_ops);
launches 1000 parallel workers.

(This is reminiscent of bug #18314.)

One way to work around this is to disable debug_parallel_query in the test,
and another is to set max_parallel_workers = 0.
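In TAP terms that would be something like (a sketch):

$node->append_conf('postgresql.conf', 'max_parallel_workers = 0');
$node->restart;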




But wouldn't either of those just be masking the problem?


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: xid_wraparound tests intermittent failure.

2024-07-25 Thread Andrew Dunstan


On 2024-07-25 Th 3:40 PM, Masahiko Sawada wrote:

On Thu, Jul 25, 2024 at 11:06 AM Masahiko Sawada  wrote:

On Thu, Jul 25, 2024 at 10:56 AM Andrew Dunstan  wrote:


On 2024-07-23 Tu 6:59 PM, Masahiko Sawada wrote:

See<https://bitbucket.org/adunstan/rotfang-fdw/downloads/xid-wraparound-result.tar.bz2>


The failure logs are from a run where both tests 1 and 2 failed.

Thank you for sharing the logs.

I think that the problem seems to match what Alexander Lakhin
mentioned[1]. Probably we can fix such a race condition somehow but
I'm not sure it's worth it as setting autovacuum = off and
autovacuum_max_workers = 1 (or a low number) is an extremely rare
case. I think it would be better to stabilize these tests. One idea is
to turn the autovacuum GUC parameter on while setting
autovacuum_enabled = off for each table. That way, we can ensure that
autovacuum workers are launched. And I think it aligns with real
use cases.

Regards,

[1]https://www.postgresql.org/message-id/02373ec3-50c6-df5a-0d65-5b9b1c0c86d6%40gmail.com


OK, do you want to propose a patch?


Yes, I'll prepare and share it soon.


I've attached the patch. Could you please test if the patch fixes the
instability you observed?

Since we turn off autovacuum on all three tests and we wait for
autovacuum to complete processing databases, these tests potentially
have a similar (but lower) risk. So I modified these tests to turn it
on so we can ensure the autovacuum runs periodically.



I assume you actually meant to remove the "autovacuum = off" in 
003_wraparound.pl. With that change in your patch I retried my test, but 
on iteration 100 out of 100 it failed on test 002_limits.pl.


You can see the logs at 
<https://f001.backblazeb2.com/file/net-dunslane-public/002_limits-failure-log.tar.bz2>


cheers

andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: Recent 027_streaming_regress.pl hangs

2024-07-25 Thread Andrew Dunstan


On 2024-07-25 Th 5:14 PM, Tom Lane wrote:

I wrote:

I'm confused by crake's buildfarm logs.  AFAICS it is not running
recovery-check at all in most of the runs; at least there is no
mention of that step, for example here:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake&dt=2024-07-25%2013%3A27%3A02

Oh, I see it: the log file that is called recovery-check in a
failing run is called misc-check if successful.  That seems
mighty bizarre, and it's not how my own animals behave.
Something weird about the meson code path, perhaps?



Yes, it was discussed some time ago. As suggested by Andres, we run the 
meson test suite more or less all together (in effect like "make 
checkworld" but without the main regression suite, which is run on its 
own first), rather than in the separate (and serialized) way we do with 
the configure/make tests. That results in significant speedup. If the 
tests fail we report the failure as happening at the first failure we 
encounter, which is possibly less than ideal, but I haven't got a better 
idea.





Anyway, in this successful run:

https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=crake&dt=2024-07-25%2018%3A57%3A02&stg=misc-check

here are some salient test timings:

   1/297 postgresql:pg_upgrade / pg_upgrade/001_basic   
 OK0.18s   9 subtests passed
   2/297 postgresql:pg_upgrade / pg_upgrade/003_logical_slots   
 OK   15.95s   12 subtests passed
   3/297 postgresql:pg_upgrade / pg_upgrade/004_subscription
 OK   16.29s   14 subtests passed
  17/297 postgresql:isolation / isolation/isolation 
 OK   71.60s   119 subtests passed
  41/297 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade  
 OK  169.13s   18 subtests passed
140/297 postgresql:initdb / initdb/001_initdb   
OK   41.34s   52 subtests passed
170/297 postgresql:recovery / recovery/027_stream_regress   
OK  469.49s   9 subtests passed

while in the next, failing run

https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=crake&dt=2024-07-25%2020%3A18%3A05&stg=recovery-check

the same tests took:

   1/297 postgresql:pg_upgrade / pg_upgrade/001_basic   
 OK0.22s   9 subtests passed
   2/297 postgresql:pg_upgrade / pg_upgrade/003_logical_slots   
 OK   56.62s   12 subtests passed
   3/297 postgresql:pg_upgrade / pg_upgrade/004_subscription
 OK   71.92s   14 subtests passed
  21/297 postgresql:isolation / isolation/isolation 
 OK  299.12s   119 subtests passed
  31/297 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade  
 OK  344.42s   18 subtests passed
159/297 postgresql:initdb / initdb/001_initdb   
OK  344.46s   52 subtests passed
162/297 postgresql:recovery / recovery/027_stream_regress   
ERROR   840.84s   exit status 29

Based on this, it seems fairly likely that crake is simply timing out
as a consequence of intermittent heavy background activity.




The latest failure is this:


Waiting for replication conn standby_1's replay_lsn to pass 2/88E4E260 on 
primary
[16:40:29.481](208.545s) # poll_query_until timed out executing this query:
# SELECT '2/88E4E260' <= replay_lsn AND state = 'streaming'
#  FROM pg_catalog.pg_stat_replication
#  WHERE application_name IN ('standby_1', 'walreceiver')
# expecting this output:
# t
# last actual query output:
# f
# with stderr:
timed out waiting for catchup at 
/home/andrew/bf/root/HEAD/pgsql/src/test/recovery/t/027_stream_regress.pl line 
103.


Maybe it's a case where the system is overloaded, I dunno. I wouldn't bet my 
house on it. Pretty much nothing else runs on this machine.

I have added a mild throttle to the buildfarm config so it doesn't try 
to run every branch at once. Maybe I also need to bring down the number 
of meson jobs too? But I suspect there's something deeper. Prior to the 
failure of this test 10 days ago it hadn't failed in a very long time. 
The OS was upgraded a month ago. Eight or so days ago I changed 
PG_TEST_EXTRA. I can't think of anything else.



cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: Recent 027_streaming_regress.pl hangs

2024-07-25 Thread Andrew Dunstan



On 2024-07-25 Th 12:00 AM, Alexander Lakhin wrote:

Hello Andrew,

04.06.2024 13:00, Alexander Lakhin wrote:

Also, 027_stream_regress still fails due to the same reason:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=serinus&dt=2024-05-22%2021%3A55%3A03 

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=flaviventris&dt=2024-05-22%2021%3A54%3A50 


(It's remarkable that these two animals failed at the same time.)



It looks like crake is failing now due to other reasons (not just
concurrency) — it produced 10+ failures of the
027_stream_regress test starting from July, 9.

The first such failure on REL_16_STABLE was [1], and that was the 
first run

with 'PG_TEST_EXTRA' => '... wal_consistency_checking'.

There is one known issue related to wal_consistency_checking [2], but I
see no "incorrect resource manager data checksum" in the failure log...

Moreover, the first such failure on REL_17_STABLE was [3], but that run
was performed without wal_consistency_checking, as far as I can see.

Can that failure be also related to the OS upgrade (I see that back in
June crake was running on Fedora 39, but now it's running on Fedora 40)?

So maybe we have two factors combined or there is another one?

[1] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake&dt=2024-07-17%2014%3A56%3A09
[2] 
https://www.postgresql.org/message-id/055bb33c-bccc-bc1d-c2f8-859853444...@gmail.com
[3] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake&dt=2024-07-09%2021%3A37%3A04




Unlikely. The change in OS version was on June 17, more than a month ago.

But yes we do seem to have seen a lot of recovery_check failures on 
crake in the last 8 days, which is roughly when I changed PG_TEST_EXTRA 
to get more coverage.



cheers


andrew



--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: xid_wraparound tests intermittent failure.

2024-07-25 Thread Andrew Dunstan


On 2024-07-23 Tu 6:59 PM, Masahiko Sawada wrote:

See<https://bitbucket.org/adunstan/rotfang-fdw/downloads/xid-wraparound-result.tar.bz2>


The failure logs are from a run where both tests 1 and 2 failed.


Thank you for sharing the logs.

I think that the problem seems to match what Alexander Lakhin
mentioned[1]. Probably we can fix such a race condition somehow but
I'm not sure it's worth it as setting autovacuum = off and
autovacuum_max_workers = 1 (or a low number) is an extremely rare
case. I think it would be better to stabilize these tests. One idea is
to turn the autovacuum GUC parameter on while setting
autovacuum_enabled = off for each table. That way, we can ensure that
autovacuum workers are launched. And I think it aligns with real
use cases.

Regards,

[1]https://www.postgresql.org/message-id/02373ec3-50c6-df5a-0d65-5b9b1c0c86d6%40gmail.com



OK, do you want to propose a patch?

cheers

andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: Sporadic connection-setup-related test failures on Cygwin in v15-

2024-07-25 Thread Andrew Dunstan


On 2024-07-24 We 4:58 PM, Thomas Munro wrote:

On Thu, Jul 25, 2024 at 1:00 AM Alexander Lakhin  wrote:

The important fact here is that this failure is not reproduced after
7389aad63 (in v16), so it seems that it's somehow related to signal
processing. Given that, I'm inclined to stop here, without digging deeper,
at least until there are plans to backport that fix or something...

+1.  I'm not planning to back-patch that work.  Perhaps lorikeet
could stop testing releases < 16?  They don't work and it's not our
bug[1].  We decided not to drop Cygwin support[2], but I don't think
we're learning anything from investigating that noise in the
known-broken branches.



Sure, it can. I've made that change.


cheers

andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: fairywren timeout failures on the pg_amcheck/005_opclass_damage test

2024-07-25 Thread Andrew Dunstan



On 2024-07-25 Th 8:00 AM, Alexander Lakhin wrote:

Hello hackers,

Please take a look at today's failure [1], which raises several questions
at once:
117/244 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade 
TIMEOUT    3001.48s   exit status 1
180/244 postgresql:pg_amcheck / pg_amcheck/005_opclass_damage 
TIMEOUT    3001.43s   exit status 1


Ok: 227
Expected Fail:  0
Fail:   0
Unexpected Pass:    0
Skipped:    15
Timeout:    2

But looking at the previous test run [2], marked 'OK', I can see almost
the same:
115/244 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade 
TIMEOUT    3001.54s   exit status 1
176/244 postgresql:pg_amcheck / pg_amcheck/005_opclass_damage 
TIMEOUT    3001.26s   exit status 1


Ok: 227
Expected Fail:  0
Fail:   0
Unexpected Pass:    0
Skipped:    15
Timeout:    2

So it's not clear to me why the latter test run is considered failed,
unlike the former?



Yesterday I changed the way we detect failure in the buildfarm client, 
and pushed that to fairywren (and a few more of my animals). Previously 
it did not consider a timeout to be a failure, but now it does. I'm 
going to release that soon.



cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: xid_wraparound tests intermittent failure.

2024-07-23 Thread Andrew Dunstan



On 2024-07-22 Mo 10:11 PM, Tom Lane wrote:

Andrew Dunstan  writes:

On 2024-07-22 Mo 12:46 PM, Tom Lane wrote:

Masahiko Sawada  writes:

Looking at dodo's failures, it seems that while it passes
module-xid_wraparound-check, all failures happened only during
testmodules-install-check-C. Can we check the server logs written
during xid_wraparound test in testmodules-install-check-C?

Oooh, that is indeed an interesting observation.  There are enough
examples now that it's hard to dismiss it as chance, but why would
the two runs be different?

It's not deterministic.

Perhaps.  I tried "make check" on mamba's host and got exactly the
same failures as with "make installcheck", which counts in favor of
dodo's results being just luck.  Still, dodo has now shown 11 failures
in "make installcheck" and zero in "make check", so it's getting hard
to credit that there's no difference.





Yeah, I agree that's perplexing. That step doesn't run with "make -j 
nn", so it's a bit hard to see why it should get different results from 
one run rather than the other.  The only thing that's different is that 
there's another postgres instance running. Maybe that's just enough to 
slow the test down? After all, this is an RPi.



cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: xid_wraparound tests intermittent failure.

2024-07-23 Thread Andrew Dunstan


On 2024-07-22 Mo 9:29 PM, Masahiko Sawada wrote:

On Mon, Jul 22, 2024 at 12:53 PM Andrew Dunstan  wrote:


On 2024-07-22 Mo 12:46 PM, Tom Lane wrote:

Masahiko Sawada  writes:

Looking at dodo's failures, it seems that while it passes
module-xid_wraparound-check, all failures happened only during
testmodules-install-check-C. Can we check the server logs written
during xid_wraparound test in testmodules-install-check-C?

Oooh, that is indeed an interesting observation.  There are enough
examples now that it's hard to dismiss it as chance, but why would
the two runs be different?


It's not deterministic.

I tested the theory that it was some other concurrent tests causing the issue, 
but that didn't wash. Here's what I did:

 for f in `seq 1 100`
   do echo iteration = $f
   meson test --suite xid_wraparound || break
 done

It took until iteration 6 to get an error. I don't think my Ubuntu instance is especially 
slow. e.g. "meson compile" normally takes a handful of seconds. Maybe 
concurrent tests make it more likely, but they can't be the only cause.

Could you provide server logs in both OK and NG tests? I want to see
if there's a difference in the rate at which tables are vacuumed.



See 
<https://bitbucket.org/adunstan/rotfang-fdw/downloads/xid-wraparound-result.tar.bz2>



The failure logs are from a run where both tests 1 and 2 failed.


cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: xid_wraparound tests intermittent failure.

2024-07-22 Thread Andrew Dunstan


On 2024-07-22 Mo 12:46 PM, Tom Lane wrote:

Masahiko Sawada  writes:

Looking at dodo's failures, it seems that while it passes
module-xid_wraparound-check, all failures happened only during
testmodules-install-check-C. Can we check the server logs written
during xid_wraparound test in testmodules-install-check-C?

Oooh, that is indeed an interesting observation.  There are enough
examples now that it's hard to dismiss it as chance, but why would
the two runs be different?



It's not deterministic.

I tested the theory that it was some other concurrent tests causing the 
issue, but that didn't wash. Here's what I did:


    for f in `seq 1 100`
      do echo iteration = $f
      meson test --suite xid_wraparound || break
    done

It took until iteration 6 to get an error. I don't think my Ubuntu 
instance is especially slow. e.g. "meson compile" normally takes a 
handful of seconds. Maybe concurrent tests make it more likely, but they 
can't be the only cause.



cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: Enhance pg_dump multi-threaded streaming (WAS: Re: filesystem full during vacuum - space recovery issues)

2024-07-22 Thread Andrew Dunstan



On 2024-07-19 Fr 9:46 AM, Thomas Simpson wrote:


Hi Scott,

I realize some of the background was snipped on what I sent to the 
hacker list, I'll try to fill in the details.


Short background: a very large database ran out of space during vacuum 
full, taking down the server.  There is a replica which was applying 
the WALs, and so it too ran out of space.  On restart after clearing 
some space, the database came back up but left behind the in-progress 
rebuild files.  I've cleared that replica and am using it as my 
rebuild target just now.


Trying to identify the 'orphan' files and move them away always led to 
the database noticing the supposedly unused files were gone and 
refusing to start, so I had no successful way to clean up and get 
space back.


Last resort after discussion is pg_dumpall & reload.  I'm doing this 
via a network pipe (netcat) as I do not have the vast amount of 
storage necessary for the dump file to be stored (in any format).
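
Roughly, the pipe is (a sketch; host name, port, and exact nc flags vary
by implementation):

    # on the destination server
    nc -l 9000 | psql postgres
    # on the source server
    pg_dumpall | nc dest.example.com 9000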


On 19-Jul-2024 09:26, Scott Ribe wrote:

Do you actually have 100G networking between the nodes? Because if not, a 
single CPU should be able to saturate 10G.
Servers connect via 10G WAN; sending is not the issue, it's 
application of the incoming stream on the destination which is 
bottlenecked.

Likewise the receiving end would need disk capable of keeping up. Which brings 
up the question: why not write to disk, but directly on the destination, rather 
than writing locally and then copying?

In this case, it's not a local write, it's piped via netcat.

Do you require dump-reload because of suspected corruption? That's a tough one. 
But if not, if the goal is just to get up and running on a new server, why not 
pg_basebackup, streaming replica, promote? That depends on the level of data 
modification activity being low enough that pg_basebackup can keep up with WAL 
as it's generated and apply it faster than new WAL comes in, but given that 
your server is currently keeping up with writing that much WAL and flushing 
that many changes, seems likely it would keep up as long as the network 
connection is fast enough. Anyway, in that scenario, you don't need to care how 
long pg_basebackup takes.

If you do need a dump/reload because of suspected corruption, the only thing I 
can think of is something like doing it a table at a time--partitioning would 
help here, if practical.


The basebackup is, to the best of my understanding, essentially just 
copying the database files.  Since the failed vacuum has left extra 
files, my expectation is these too would be copied, leaving me in the 
same position I started in.  If I'm wrong, please tell me as that 
would be vastly quicker - it is how I originally set up the replica 
and it took only a few hours on the 10G link.


The inability to get a clean start if I move any files out of the way 
leads me to be concerned about some underlying corruption/issue, and the 
recommendation earlier in the discussion was to opt for dump/reload as 
the fail-safe.


Resigned to my fate, my thoughts were to see if there is a way to 
improve the dump-reload approach for the future.  Since dump-reload is 
the ultimate upgrade suggestion in the documentation, it seems 
worthwhile to see if there is a way to improve the performance of that 
especially as very large databases like mine are a thing with 
PostgreSQL.  From a quick review of pg_dump.c (I'm no expert on it 
obviously), it feels like it's already doing most of what needs to be done, 
and the addition is some sort of multi-thread coordination with a 
restore client to ensure each thread can successfully complete each 
task it has before accepting more work.  I realize that's actually 
difficult to implement.





There is a plan for a non-text mode for pg_dumpall. I have started work 
on it, and hope to have a WIP patch in a month or so. It's not my 
intention to parallelize it for the first cut, but it could definitely 
be parallelizable in future. However, it will require writing to disk 
somewhere, albeit that the data will be compressed. It's well nigh 
impossible to parallelize text format dumps.


Restoration of custom and directory format dumps has long been 
parallelized. Parallel dumps require directory format, and so will 
non-text pg_dumpall.
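
For comparison, the already-parallel single-database path looks roughly
like this (paths and job counts invented):

    pg_dump -Fd -j 8 -f /backup/mydb.dir mydb
    pg_restore -j 8 -d mydb_new /backup/mydb.dir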



cheers


andrew



--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Windows default locale vs initdb

2024-07-22 Thread Andrew Dunstan



On 2024-07-21 Su 10:51 PM, Thomas Munro wrote:

Ertan Küçükoglu offered to try to review and test this, so here's a rebase.

Some notes:

* it turned out that the Turkish i/I test problem I mentioned earlier
in this thread[1] was just always broken on Windows, we just didn't
ever test with UTF-8 before Meson took over; it's skipped now, see
commit cff4e5a3[2]

* it seems that you can't actually put encodings like .1252 on the end
(.UTF-8 must be a special case); I don't know if we should look into a
better UTF-8 mode for modern Windows, but that'd be a separate project

* this patch only benefits people who run initdb.exe without
explicitly specifying a locale; probably a good number of real systems
in the wild actually use EDB's graphical installer which initialises a
cluster and has its own way of choosing the locale, as discussed in
Ertan's thread[3]

[1] 
https://www.postgresql.org/message-id/flat/CA%2BhUKGJZskvCh%3DQm75UkHrY6c1QZUuC92Po9rponj1BbLmcMEA%40mail.gmail.com#3a00c08214a4285d2f3c4297b0ac2be2
[2] https://github.com/postgres/postgres/commit/cff4e5a3
[3] 
https://www.postgresql.org/message-id/flat/CAH2i4ydECHZPxEBB7gtRG3vROv7a0d3tqAFXzcJWQ9hRsc1znQ%40mail.gmail.com



I have an environment I can use for testing. But what exactly am I 
testing? :-) Install a few "problem" language/region settings, switch 
the system and ensure initdb runs ok?


Other than Turkish, which locales should I install?


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: xid_wraparound tests intermittent failure.

2024-07-22 Thread Andrew Dunstan



On 2024-07-21 Su 4:08 PM, Alexander Lakhin wrote:

Hello,

21.07.2024 20:34, Tom Lane wrote:

Andrew Dunstan  writes:

I noticed this when working on the PostgreSQL::Test::Session project I
have in hand. All the tests pass except occasionally the xid_wraparound
tests fail. It's not always the same test script that fails either. I
tried everything but couldn't make the failure stop. So then I switched
out my patch so it's running on plain master and set things running in a
loop. Lo and behold it can be relied on to fail after only a few
iterations.

I have been noticing xid_wraparound failures in the buildfarm too.
They seemed quite infrequent, but it wasn't till just now that
I realized that xid_wraparound is not run by default.  (You have to
put "xid_wraparound" in PG_TEST_EXTRA to enable it.)  AFAICS the
only buildfarm animals that have enabled it are dodo and perentie.
dodo is failing this test fairly often:

https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=dodo&br=HEAD


I think this failure is counted at [1]. Please look at the linked message
[2], where I described what makes the test fail.

[1] 
https://wiki.postgresql.org/wiki/Known_Buildfarm_Test_Failures#001_emergency_vacuum.pl_fails_to_wait_for_datfrozenxid_advancing
[2] 
https://www.postgresql.org/message-id/5811175c-1a31-4869-032f-7af5e3e45...@gmail.com



It's sad nothing has happened about this for 2 months.

There's no point in having unreliable tests. What's not 100% clear to me 
is whether this failure indicates a badly formulated test or the test is 
correct and has identified an underlying bug.


Regarding the point in [2] about the test being run twice in buildfarm 
clients, I think we should mark the module as NO_INSTALLCHECK in the 
Makefile.
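
That should be a one-line change, something like:

    # in src/test/modules/xid_wraparound/Makefile
    NO_INSTALLCHECK = 1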



cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: xid_wraparound tests intermittent failure.

2024-07-21 Thread Andrew Dunstan



On 2024-07-21 Su 1:34 PM, Tom Lane wrote:

Andrew Dunstan  writes:

I noticed this when working on the PostgreSQL::Test::Session project I
have in hand. All the tests pass except occasionally the xid_wraparound
tests fail. It's not always the same test script that fails either. I
tried everything but couldn't make the failure stop. So then I switched
out my patch so it's running on plain master and set things running in a
loop. Lo and behold it can be relied on to fail after only a few
iterations.

I have been noticing xid_wraparound failures in the buildfarm too.
They seemed quite infrequent, but it wasn't till just now that
I realized that xid_wraparound is not run by default.  (You have to
put "xid_wraparound" in PG_TEST_EXTRA to enable it.)  AFAICS the
only buildfarm animals that have enabled it are dodo and perentie.
dodo is failing this test fairly often:

https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=dodo&br=HEAD

perentie doesn't seem to be having a problem, but I will bet that
part of the reason is it's running with cranked-up timeouts:

'build_env' => {
 'PG_TEST_EXTRA' => 'xid_wraparound',
 'PG_TEST_TIMEOUT_DEFAULT' => '360'
   },

One thing that seems quite interesting is that the test seems to
take about 10 minutes when successful on dodo, but when it fails
it's twice that.  Why the instability?  (Perhaps dodo has highly
variable background load, and the thing simply times out in some
runs but not others?)

Locally, I've not managed to reproduce the failure yet; so perhaps
there is some platform dependency.  What are you testing on?



Linux ub22arm 5.15.0-116-generic #126-Ubuntu SMP Mon Jul 1 10:08:40 UTC 
2024 aarch64 aarch64 aarch64 GNU/Linux


It's a VM running on UTM/Apple Silicon


cheers


andrew



--
Andrew Dunstan
EDB: https://www.enterprisedb.com





xid_wraparound tests intermittent failure.

2024-07-21 Thread Andrew Dunstan



I noticed this when working on the PostgreSQL::Test::Session project I 
have in hand. All the tests pass except occasionally the xid_wraparound 
tests fail. It's not always the same test script that fails either. I 
tried everything but couldn't make the failure stop. So then I switched 
out my patch so it's running on plain master and set things running in a 
loop. Lo and behold it can be relied on to fail after only a few 
iterations.


In the latest iteration the failure looks like this


stderr:
# poll_query_until timed out executing this query:
#
# SELECT NOT EXISTS (
#   SELECT *
#   FROM pg_database
#   WHERE age(datfrozenxid) > 
current_setting('autovacuum_freeze_max_age')::int)

#
# expecting this output:
# t
# last actual query output:
# f
# with stderr:
# Tests were run but no plan was declared and done_testing() was not seen.
# Looks like your test exited with 29 just after 1.

(test program exited with status code 29)
――


Summary of Failures:

295/295 postgresql:xid_wraparound / xid_wraparound/001_emergency_vacuum 
ERROR   211.76s   exit status 29




cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: meson vs windows perl

2024-07-20 Thread Andrew Dunstan


On 2024-05-28 Tu 6:13 PM, Andres Freund wrote:

Hi,

On 2024-04-05 16:12:12 -0400, Andrew Dunstan wrote:

OK, this has been fixed and checked. The attached is what I propose.

The perl command is pretty hard to read. What about using python's shlex
module instead? Rough draft attached.  Still not very pretty, but seems easier
to read?

It'd be even better if we could just get perl to print out the flags in an
easier to parse way, but I couldn't immediately see a way.




Thanks for looking.

The attached should be easier to read. The Perl package similar to shlex 
is Text::ParseWords, which is already a requirement.
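
For example, shellwords() splits like a POSIX shell, preserving quoted
spaces (a quick illustration; the flags are invented):

    use Text::ParseWords;
    my @flags = shellwords(q{-L"C:/Program Files/Perl/lib" -lperl540});
    # @flags = ('-LC:/Program Files/Perl/lib', '-lperl540')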



cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
diff --git a/meson.build b/meson.build
index 5387bb6d5f..e244cbac92 100644
--- a/meson.build
+++ b/meson.build
@@ -993,20 +993,17 @@ if not perlopt.disabled()
 # Config's ccdlflags and ldflags.  (Those are the choices of those who
 # built the Perl installation, which are not necessarily appropriate
 # for building PostgreSQL.)
-ldopts = run_command(perl, '-MExtUtils::Embed', '-e', 'ldopts', check: true).stdout().strip()
-undesired = run_command(perl_conf_cmd, 'ccdlflags', check: true).stdout().split()
-undesired += run_command(perl_conf_cmd, 'ldflags', check: true).stdout().split()
+perl_ldopts = run_command(perl, '-e', '''
+use ExtUtils::Embed;
+use Text::ParseWords;
+# tell perl to suppress including these in ldopts
+*ExtUtils::Embed::_ldflags = *ExtUtils::Embed::_ccdlflags = sub { return ""; };
+# adding an argument to ldopts makes it return a value instead of printing
+# print one of these per line so splitting will preserve spaces in file names.
+print "$_\n" foreach shellwords(ldopts(undef));
+''',
+ check: true).stdout().strip().split('\n')
 
-perl_ldopts = []
-foreach ldopt : ldopts.split(' ')
-  if ldopt == '' or ldopt in undesired
-continue
-  endif
-
-  perl_ldopts += ldopt.strip('"')
-endforeach
-
-message('LDFLAGS recommended by perl: "@0@"'.format(ldopts))
 message('LDFLAGS for embedding perl: "@0@"'.format(' '.join(perl_ldopts)))
 
 perl_dep_int = declare_dependency(


Re: Using LibPq in TAP tests via FFI

2024-07-19 Thread Andrew Dunstan


On 2024-07-18 Th 6:51 PM, Thomas Munro wrote:

On Wed, Jul 17, 2024 at 2:27 AM Andrew Dunstan  wrote:

Here's the latest version of this patch. It removes all use of
background_psql(). Instead it uses libpq's async interface, which seems
to me far more robust. There is one remaining use of interactive_psql(),
but that's reasonable as it's used for testing psql itself.

This looks really nice!  Works on my local FBSD machine.



cool




I pushed it to CI, and mostly saw environmental problems unrelated to
the patch, but you might be interested in the ASAN failure visible in
the cores section:

https://cirrus-ci.com/task/6607915962859520

Unfortunately I can't see the interesting log messages, because it
detected that the logs were still being appended to and declined to
upload them.  I think that means there must be subprocesses not being
waited for somewhere?



I couldn't see anything obvious either.





I spent yesterday creating an XS wrapper for just the 19 libpq functions
used in Session.pm. It's pretty simple. I have it passing a very basic
test, but haven't tried plugging it into Session.pm yet.

Neat.  I guess the libpq FFI/XS piece looks the same to the rest of
the test framework outside that module.



Yeah, that's the idea.



  It does sound pretty
convenient if the patch just works™ on CI/BF without any environment
changes, which I assume must be doable because we already build XS
stuff in src/pl/plperl.  Looking forward to trying that version.



Still working on it. Meanwhile, here's a new version. It has some 
cleanup and also tries to use Session objects instead of psql in simple 
cases for safe_psql().


cheers

andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com
diff --git a/contrib/amcheck/t/001_verify_heapam.pl b/contrib/amcheck/t/001_verify_heapam.pl
index 481e4dbe4f..f821f9 100644
--- a/contrib/amcheck/t/001_verify_heapam.pl
+++ b/contrib/amcheck/t/001_verify_heapam.pl
@@ -5,6 +5,7 @@ use strict;
 use warnings FATAL => 'all';
 
 use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Session;
 use PostgreSQL::Test::Utils;
 
 use Test::More;
@@ -18,7 +19,9 @@ $node = PostgreSQL::Test::Cluster->new('test');
 $node->init;
 $node->append_conf('postgresql.conf', 'autovacuum=off');
 $node->start;
-$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+my $session = PostgreSQL::Test::Session->new(node => $node);
+
+$session->do(q(CREATE EXTENSION amcheck));
 
 #
 # Check a table with data loaded but no corruption, freezing, etc.
@@ -49,7 +52,7 @@ detects_heap_corruption(
 # Check a corrupt table with all-frozen data
 #
 fresh_test_table('test');
-$node->safe_psql('postgres', q(VACUUM (FREEZE, DISABLE_PAGE_SKIPPING) test));
+$session->do(q(VACUUM (FREEZE, DISABLE_PAGE_SKIPPING) test));
 detects_no_corruption("verify_heapam('test')",
 	"all-frozen not corrupted table");
 corrupt_first_page('test');
@@ -81,7 +84,7 @@ sub relation_filepath
 	my ($relname) = @_;
 
 	my $pgdata = $node->data_dir;
-	my $rel = $node->safe_psql('postgres',
+	my $rel = $session->query_oneval(
 		qq(SELECT pg_relation_filepath('$relname')));
 	die "path not found for relation $relname" unless defined $rel;
 	return "$pgdata/$rel";
@@ -92,8 +95,8 @@ sub fresh_test_table
 {
 	my ($relname) = @_;
 
-	return $node->safe_psql(
-		'postgres', qq(
+	return $session->do(
+		qq(
 		DROP TABLE IF EXISTS $relname CASCADE;
 		CREATE TABLE $relname (a integer, b text);
 		ALTER TABLE $relname SET (autovacuum_enabled=false);
@@ -117,8 +120,8 @@ sub fresh_test_sequence
 {
 	my ($seqname) = @_;
 
-	return $node->safe_psql(
-		'postgres', qq(
+	return $session->do(
+		qq(
 		DROP SEQUENCE IF EXISTS $seqname CASCADE;
 		CREATE SEQUENCE $seqname
 			INCREMENT BY 13
@@ -134,8 +137,8 @@ sub advance_test_sequence
 {
 	my ($seqname) = @_;
 
-	return $node->safe_psql(
-		'postgres', qq(
+	return $session->query_oneval(
+		qq(
 		SELECT nextval('$seqname');
 	));
 }
@@ -145,10 +148,7 @@ sub set_test_sequence
 {
 	my ($seqname) = @_;
 
-	return $node->safe_psql(
-		'postgres', qq(
-		SELECT setval('$seqname', 102);
-	));
+	return $session->query_oneval(qq(SELECT setval('$seqname', 102)));
 }
 
 # Call SQL functions to reset the sequence
@@ -156,8 +156,8 @@ sub reset_test_sequence
 {
 	my ($seqname) = @_;
 
-	return $node->safe_psql(
-		'postgres', qq(
+	return $session->do(
+		qq(
 		ALTER SEQUENCE $seqname RESTART WITH 51
 	));
 }
@@ -169,6 +169,7 @@ sub corrupt_first_page
 	my ($relname) = @_;
 	my $relpath = relation_filepath($relname);
 
+	$session->close;
 	$node->stop;
 
 	my $fh;
@@ -191,6 +192,7 @@ sub corrupt_first_page
 	  or BAIL_OUT("close failed: $!")

Re: documentation structure

2024-07-18 Thread Andrew Dunstan


On 2024-07-18 Th 4:16 PM, Corey Huinker wrote:



looking back.
The patch is big. No convenient way to review/validate it.


Perhaps we can break up the patches as follows:

1. create the filelist.sgml entries, and create new files as you 
detailed, empty, with func.sgml still managing the sections, but each 
section now has its corresponding &func-something;. The files are 
included, but they're completely empty.


2 to 999+. One commit per function moved from func.sgml to its 
corresponding func-something.sgml.


It'll be a ton of commits, but each one will be very easy to review.

Alternately, if we put each function in its own file, there would be a 
series of commits, one per function like such:


1. create the new func-my-function.sgml, copying the definition of the 
same name

2. delete the definition in func.sgml, replacing it with the &func-include;
3. new entry in the filelist.

This approach looks (and IS) tedious, but it has several key advantages:

1. Each one is very easy to review.
2. Big reduction in future merge conflicts on func.sgml.
3. location of a given function's docs is now trivial.
4. separation of concerns with regard to content of function def vs 
placement of same.

5. Easy to ensure that all functions have an anchor.
6. The effort can stall and be resumed at our own pace.

Perhaps your python script can be adapted to this approach? I'm 
willing to review, or collaborate, or both.



I'm opposed to having a separate file for every function. I think 
breaking up func.sgml into one piece per sect1 is about right. If that 
proves cumbersome still we can look at breaking it up further, but let's 
start with that.


More concretely, sometimes the bits that relate to a particular function 
are not contiguous. e.g. you might have an entry in a table for a 
function and then, at the end, Notes relating to that function. That makes 
doing a file per function not very practical.


Also, being able to view the context for your function documentation is 
useful when editing.
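
To make the mechanics concrete, a per-sect1 split would look roughly like
this (entity and file names invented for illustration):

    <!-- doc/src/sgml/filelist.sgml -->
    <!ENTITY func-datetime SYSTEM "func-datetime.sgml">

    <!-- doc/src/sgml/func.sgml, where the sect1 body used to be -->
    &func-datetime;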



cheers


andrew


--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: PG_TEST_EXTRA and meson

2024-07-17 Thread Andrew Dunstan


On 2024-07-17 We 11:01 AM, Tom Lane wrote:

Jacob Champion  writes:

On Wed, Jul 17, 2024 at 3:34 AM Nazir Bilal Yavuz  wrote:

Sorry, the previous reply was wrong; I misunderstood what you said.
Yes, that is how the patch was coded and I agree that getting rid of
config-time PG_TEST_EXTRA could be a better alternative.

Personally I use the config-time PG_TEST_EXTRA extensively. I'd be sad
to see it go, especially if developers are no longer forced to use it.

The existing and documented expectation is that PG_TEST_EXTRA is an
environment variable, ie it's a runtime option not a configure option.
Making it be the latter seems like a significant loss of flexibility
to me.





AIUI the only reason we have it as a configure option at all is that 
meson is *very* dogmatic about not using environment variables. I get 
their POV when it comes to building, but that should not extend to 
testing. That said, I don't mind if this is a configure option as long 
as it can be overridden at run time without having to run "meson 
configure".



cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: Windows perl/tcl requirement documentation

2024-07-17 Thread Andrew Dunstan



On 2024-07-16 Tu 7:21 PM, Michael Paquier wrote:

On Mon, Jul 01, 2024 at 11:27:26AM -0400, Andrew Dunstan wrote:

Our docs currently state this regarding the perl requirement for building on
Windows:


ActiveState Perl

ActiveState Perl is required to run the build generation scripts.
MinGW or Cygwin Perl will not work. It must also be present in the
PATH. Binaries can be downloaded from https://www.activestate.com
(Note: version 5.14 or later is required, the free Standard
Distribution is sufficient).


This really hasn't been a true requirement for quite some time. I stopped
using AS perl quite a few years ago due to possible licensing issues, and
have been building with MSVC using Strawberry Perl ever since. Andres
recently complained that Strawberry was somewhat out of date, but that's no
longer really the case - it's on 5.38.2, which is the latest in that series,
and perl 5.40.0 was only released a few weeks ago.

This is an area where I have proposed a set of changes in the last
commit fest of March, but it led nowhere as, at least it seems to me,
there was no strong consensus about what to mention as main
references:
https://commitfest.postgresql.org/47/4745/
https://www.postgresql.org/message-id/flat/zzegb7nqbkzm0...@paquier.xyz

Not everything should be gone, but I was wondering back on this thread
if it would make the most sense to replace all these references to point
to the popular packaging systems used in the buildfarm.



At the very least we should stop recommending things we don't use or 
test with. Our default position of doing nothing is getting increasingly 
untenable here. We're actively misleading people IMNSHO.



cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: recovery test error

2024-07-17 Thread Andrew Dunstan



On 2024-07-16 Tu 7:45 PM, Michael Paquier wrote:

On Tue, Jul 16, 2024 at 03:04:13PM -0400, Andrew Dunstan wrote:

This was called by poll_query_until(), which is changed by the patch to use
a libpq session rather than constantly forking psql. ISTM we should be
passing true as a second parameter so we keep going if the file doesn't
exist.

Thoughts?

Sounds like a good idea to me as this call could return ENOENT
depending on the timing of the archiver pushing the new history file,
as writeTimeLineHistory() at the end of recovery notifies the archiver
but does not wait for the fact to happen (history files are
prioritized, still there is a delay).



Thanks. Done.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





recovery test error

2024-07-16 Thread Andrew Dunstan



As I was trying out the libpq perl wrapper on Windows, I encountered a 
failure in recovery test 002_archiving.pl from this query:


SELECT size IS NOT NULL FROM 
pg_stat_file('c:/prog/postgresql/build/testrun/recovery/002_archiving/data/t_002_archiving_primary_data/archives/0002.history')


The test errored out because the file didn't exist.

This was called by poll_query_until(), which is changed by the patch to 
use a libpq session rather than constantly forking psql. ISTM we should 
be passing true as a second parameter so we keep going if the file 
doesn't exist.
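
That is, the probe would become something like (path elided):

    -- the second argument is pg_stat_file()'s missing_ok flag
    SELECT size IS NOT NULL
      FROM pg_stat_file('.../archives/0002.history', true);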


Thoughts?


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Using LibPq in TAP tests via FFI

2024-07-16 Thread Andrew Dunstan


On 2024-06-17 Mo 10:01 AM, Andrew Dunstan wrote:





I agree with you that falling back on BackgroundPsql is not a terribly
satisfactory solution.

I'm somewhat doubtful we'll just agree on making FFI::Platypus a hard
dependency, but if we agree to do so...




Maybe not. If so, your other suggestion of a small XS wrapper might 
make sense.



Here's the latest version of this patch. It removes all use of 
background_psql(). Instead it uses libpq's async interface, which seems 
to me far more robust. There is one remaining use of interactive_psql(), 
but that's reasonable as it's used for testing psql itself.
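
For reference, the shape of that async flow is the standard libpq one,
roughly (a sketch; real code would poll PQsocket() instead of spinning,
and would check errors properly):

    #include <libpq-fe.h>

    PGresult *
    run_async(PGconn *conn, const char *sql)
    {
        PGresult *last = NULL, *res;

        if (!PQsendQuery(conn, sql))
            return NULL;
        while (PQisBusy(conn))          /* wait for the reply to arrive */
        {
            if (!PQconsumeInput(conn))
                return NULL;
        }
        /* PQgetResult must be drained until it returns NULL */
        while ((res = PQgetResult(conn)) != NULL)
        {
            if (last)
                PQclear(last);
            last = res;                 /* keep the last result */
        }
        return last;
    }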


I spent yesterday creating an XS wrapper for just the 19 libpq functions 
used in Session.pm. It's pretty simple. I have it passing a very basic 
test, but haven't tried plugging it into Session.pm yet.
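
The XS side can stay very thin. A fragment might look like this (module
name invented; it assumes a typemap entry mapping PGconn * to T_PTROBJ):

    #include "EXTERN.h"
    #include "perl.h"
    #include "XSUB.h"
    #include <libpq-fe.h>

    MODULE = PostgreSQL::Test::Libpq   PACKAGE = PostgreSQL::Test::Libpq

    PGconn *
    PQconnectdb(conninfo)
        const char *conninfo

    int
    PQstatus(conn)
        PGconn *conn

    void
    PQfinish(conn)
        PGconn *conn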



cheers


andrew



--
Andrew Dunstan
EDB: https://www.enterprisedb.com
diff --git a/contrib/amcheck/t/001_verify_heapam.pl b/contrib/amcheck/t/001_verify_heapam.pl
index 481e4dbe4f..f821f9 100644
--- a/contrib/amcheck/t/001_verify_heapam.pl
+++ b/contrib/amcheck/t/001_verify_heapam.pl
@@ -5,6 +5,7 @@ use strict;
 use warnings FATAL => 'all';
 
 use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Session;
 use PostgreSQL::Test::Utils;
 
 use Test::More;
@@ -18,7 +19,9 @@ $node = PostgreSQL::Test::Cluster->new('test');
 $node->init;
 $node->append_conf('postgresql.conf', 'autovacuum=off');
 $node->start;
-$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+my $session = PostgreSQL::Test::Session->new(node => $node);
+
+$session->do(q(CREATE EXTENSION amcheck));
 
 #
 # Check a table with data loaded but no corruption, freezing, etc.
@@ -49,7 +52,7 @@ detects_heap_corruption(
 # Check a corrupt table with all-frozen data
 #
 fresh_test_table('test');
-$node->safe_psql('postgres', q(VACUUM (FREEZE, DISABLE_PAGE_SKIPPING) test));
+$session->do(q(VACUUM (FREEZE, DISABLE_PAGE_SKIPPING) test));
 detects_no_corruption("verify_heapam('test')",
 	"all-frozen not corrupted table");
 corrupt_first_page('test');
@@ -81,7 +84,7 @@ sub relation_filepath
 	my ($relname) = @_;
 
 	my $pgdata = $node->data_dir;
-	my $rel = $node->safe_psql('postgres',
+	my $rel = $session->query_oneval(
 		qq(SELECT pg_relation_filepath('$relname')));
 	die "path not found for relation $relname" unless defined $rel;
 	return "$pgdata/$rel";
@@ -92,8 +95,8 @@ sub fresh_test_table
 {
 	my ($relname) = @_;
 
-	return $node->safe_psql(
-		'postgres', qq(
+	return $session->do(
+		qq(
 		DROP TABLE IF EXISTS $relname CASCADE;
 		CREATE TABLE $relname (a integer, b text);
 		ALTER TABLE $relname SET (autovacuum_enabled=false);
@@ -117,8 +120,8 @@ sub fresh_test_sequence
 {
 	my ($seqname) = @_;
 
-	return $node->safe_psql(
-		'postgres', qq(
+	return $session->do(
+		qq(
 		DROP SEQUENCE IF EXISTS $seqname CASCADE;
 		CREATE SEQUENCE $seqname
 			INCREMENT BY 13
@@ -134,8 +137,8 @@ sub advance_test_sequence
 {
 	my ($seqname) = @_;
 
-	return $node->safe_psql(
-		'postgres', qq(
+	return $session->query_oneval(
+		qq(
 		SELECT nextval('$seqname');
 	));
 }
@@ -145,10 +148,7 @@ sub set_test_sequence
 {
 	my ($seqname) = @_;
 
-	return $node->safe_psql(
-		'postgres', qq(
-		SELECT setval('$seqname', 102);
-	));
+	return $session->query_oneval(qq(SELECT setval('$seqname', 102)));
 }
 
 # Call SQL functions to reset the sequence
@@ -156,8 +156,8 @@ sub reset_test_sequence
 {
 	my ($seqname) = @_;
 
-	return $node->safe_psql(
-		'postgres', qq(
+	return $session->do(
+		qq(
 		ALTER SEQUENCE $seqname RESTART WITH 51
 	));
 }
@@ -169,6 +169,7 @@ sub corrupt_first_page
 	my ($relname) = @_;
 	my $relpath = relation_filepath($relname);
 
+	$session->close;
 	$node->stop;
 
 	my $fh;
@@ -191,6 +192,7 @@ sub corrupt_first_page
 	  or BAIL_OUT("close failed: $!");
 
 	$node->start;
+	$session->reconnect;
 }
 
 sub detects_heap_corruption
@@ -216,7 +218,7 @@ sub detects_corruption
 
 	my ($function, $testname, @re) = @_;
 
-	my $result = $node->safe_psql('postgres', qq(SELECT * FROM $function));
+	my $result = $session->query_tuples(qq(SELECT * FROM $function));
 	like($result, $_, $testname) for (@re);
 }
 
@@ -226,7 +228,7 @@ sub detects_no_corruption
 
 	my ($function, $testname) = @_;
 
-	my $result = $node->safe_psql('postgres', qq(SELECT * FROM $function));
+	my $result = $session->query_tuples(qq(SELECT * FROM $function));
 	is($result, '', $testname);
 }
 
diff --git a/contrib/amcheck/t/003_cic_2pc.pl b/contrib/amcheck/t/003_cic_2pc.pl
index fc314b8524..ff345f36ac 100644
--- a/contrib/amcheck/t/003_cic_2pc.pl
+++ b/cont

Re: ActiveState Perl is not valid anymore to build PG17 on the Windows 10/11 platforms, So Documentation still suggesting it should be updated

2024-07-16 Thread Andrew Dunstan


On 2024-07-16 Tu 7:46 AM, Yasir wrote:

Hi Hackers,

Recently, I compiled PG17 on Windows. Up to PG16, "ActiveState 
Perl", as instructed in the documentation 
<https://www.postgresql.org/docs/16/install-windows-full.html#INSTALL-WINDOWS-FULL-REQUIREMENTS>, 
could be used successfully on Windows 10/11 to compile PG.
However, it looks like "ActiveState Perl" no longer works to 
compile PG17 on Windows 10/11, but the documentation 
<https://www.postgresql.org/docs/17/installation-platform-notes.html#WINDOWS-REQUIREMENTS> still 
suggests it. Therefore, I think the documentation needs to be updated.
Moreover, I had to install Strawberry Perl in order to compile 
PG17 on Windows 10/11. Please check out the thread "errors building on 
windows using meson 
<https://www.postgresql.org/message-id/flat/CADK3HHLQ1MNmfXqEvQi36D_MQrheOZPcXv2H3s6otMbSmfwjzg%40mail.gmail.com>" 
highlighting the issue.




See https://postgr.es/m/4acddcd4-1c08-44e7-ba60-cab102259...@dunslane.net

I agree we should fix the docco.


cheers


andrew


--

Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: CFbot failed on Windows platform

2024-07-11 Thread Andrew Dunstan



On 2024-07-11 Th 9:50 AM, Tom Lane wrote:

Tatsuo Ishii  writes:

I think it is related to the '628c1d1f2c' commit. This commit changed
the output of the regress test on Windows.

Yeah, it seems that explains it. I see a few buildfarm Windows animals
complaining too.

I think that the contents of
src/test/regress/expected/collate.windows.win1252.out are actually
wrong, and we'd not noticed because it was only checked with diff -w.
psql does put an extra trailing space in some lines of table output,
but that space isn't there in collate.windows.win1252.out.





Yeah, makes sense. I will produce an updated output file and then 
re-enable --strip-trailing-cr (after a full test run).



cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: tests fail on windows with default git settings

2024-07-11 Thread Andrew Dunstan



On 2024-07-11 Th 7:29 AM, Andrew Dunstan wrote:


On 2024-07-11 Th 4:59 AM, Nazir Bilal Yavuz wrote:

Hi,

On Wed, 10 Jul 2024 at 17:04, Andrew Dunstan  
wrote:


On 2024-07-10 We 9:25 AM, Tom Lane wrote:

Dave Page  writes:
On Wed, 10 Jul 2024 at 12:12, Andrew Dunstan  
wrote:
As I was looking at this I wondered if there might be anywhere else that
needed adjustment. One thing that occurred to me was that maybe we
should replace the use of "-w" in pg_regress.c with this rather less
dangerous flag, so instead of ignoring any white space difference we would
only ignore line end differences. The use of "-w" apparently dates back to
2009.

That seems like a good improvement to me.

+1




OK, done.

It looks like Postgres CI did not like this change. 'Windows - Server
2019, VS 2019 - Meson & ninja' [1] task started to fail after this
commit, there is one extra space at the end of line in regress test's
output.

[1] https://cirrus-ci.com/task/6753781205958656



Oh, that's annoying. Will investigate. Thanks for the heads up.





I have reverted the pg_regress.c portion of the patch. I will 
investigate non-line-end differences on Windows further.



cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: CFbot failed on Windows platform

2024-07-11 Thread Andrew Dunstan



On 2024-07-11 Th 5:34 AM, Tatsuo Ishii wrote:

The differences are that the result has an extra space at the end of a
line.  I am not familiar with Windows and have no idea why this could
happen. (I haven't changed the patch set since May 24; the last
successful test was on July 9 14:58:44, though I am not sure about the time zone.)
Also I see exactly the same test failures in some other tests (for
example http://cfbot.cputube.org/highlights/all.html#4337)

Any idea?

I think it is related to the '628c1d1f2c' commit. This commit changed
the output of the regress test on Windows.

Yeah, it seems that explains it. I see a few buildfarm Windows animals
complaining too.



I have partially reverted that patch. Thanks for the report.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: tests fail on windows with default git settings

2024-07-11 Thread Andrew Dunstan



On 2024-07-11 Th 4:59 AM, Nazir Bilal Yavuz wrote:

Hi,

On Wed, 10 Jul 2024 at 17:04, Andrew Dunstan  wrote:


On 2024-07-10 We 9:25 AM, Tom Lane wrote:

Dave Page  writes:

On Wed, 10 Jul 2024 at 12:12, Andrew Dunstan  wrote:

As I was looking at this I wondered if there might be anywhere else that
needed adjustment. One thing that occurred to me was that maybe we
should replace the use of "-w" in pg_regress.c with this rather less
dangerous flag, so instead of ignoring any white space difference we would
only ignore line end differences. The use of "-w" apparently dates back to
2009.

That seems like a good improvement to me.

+1




OK, done.

It looks like Postgres CI did not like this change. 'Windows - Server
2019, VS 2019 - Meson & ninja' [1] task started to fail after this
commit, there is one extra space at the end of line in regress test's
output.

[1] https://cirrus-ci.com/task/6753781205958656



Oh, that's annoying. Will investigate. Thanks for the heads up.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: tests fail on windows with default git settings

2024-07-10 Thread Andrew Dunstan



On 2024-07-10 We 9:25 AM, Tom Lane wrote:

Dave Page  writes:

On Wed, 10 Jul 2024 at 12:12, Andrew Dunstan  wrote:

As I was looking at this I wondered if there might be anywhere else that
needed adjustment. One thing that occurred to me was that maybe we
should replace the use of "-w" in pg_regress.c with this rather less
dangerous flag, so instead of ignoring any white space difference we would
only ignore line end differences. The use of "-w" apparently dates back to
2009.

That seems like a good improvement to me.

+1





OK, done.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Should we work around msvc failing to compile tab-complete.c?

2024-07-10 Thread Andrew Dunstan


On 2024-07-10 We 5:55 AM, Dave Page wrote:




What is more relevant is that as far as I can see, we've never 
actually supported either libedit or readline in MSVC++ builds - which 
kinda makes sense because Windows terminals are very different from 
traditional *nix ones. Windows isn't supported by either libedit or 
readline as far as I can see, except through a couple of forks.


I imagine that for mingw/cygwin builds, readline would only actually work 
properly in their respective terminals.




configure.ac explicitly forces --without-readline on win32, so no for 
mingw. configure.ac claims there are issues especially with non-US code 
pages.


One of the reasons I've continued to support building with Cygwin is 
that its readline does seem to work, making its psql the best I know of 
on Windows.



cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: tests fail on windows with default git settings

2024-07-10 Thread Andrew Dunstan


On 2024-07-09 Tu 11:34 AM, Andrew Dunstan wrote:



On 2024-07-09 Tu 9:52 AM, Dave Page wrote:



> What I suggest (see attached) is we run the diff command with
> --strip-trailing-cr on Windows. Then we just won't care if the
> expected file and/or the output file has CRs.

I was wondering about that too, but I wasn't sure we can rely on
that flag being supported...


I have 4 different diff.exe's on my ~6 week old build VM (not 
counting shims), all of which seem to support --strip-trailing-cr. 
Those builds came with:


- git
- VC++
- diffutils (installed by chocolatey)
- vcpkg

I think it's reasonable to assume it'll be supported.



Ok, cool. So I propose to patch the test_json_parser and pg_bsd_indent 
tests to use it on Windows, later today unless there's some objection.






As I was looking at this I wondered if there might be anywhere else that 
needed adjustment. One thing that occurred to me was that maybe we 
should replace the use of "-w" in pg_regress.c with this rather less 
dangerous flag, so instead of ignoring any white space difference we 
would only ignore line end differences. The use of "-w" apparently dates 
back to 2009.



Thoughts?


cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: tests fail on windows with default git settings

2024-07-09 Thread Andrew Dunstan


On 2024-07-09 Tu 9:52 AM, Dave Page wrote:



> What I suggest (see attached) is we run the diff command with
> --strip-trailing-cr on Windows. Then we just won't care if the
> expected file and/or the output file has CRs.

I was wondering about that too, but I wasn't sure we can rely on
that flag being supported...


I have 4 different diff.exe's on my ~6 week old build VM (not counting 
shims), all of which seem to support --strip-trailing-cr. Those builds 
came with:


- git
- VC++
- diffutils (installed by chocolatey)
- vcpkg

I think it's reasonable to assume it'll be supported.



Ok, cool. So I propose to patch the test_json_parser and pg_bsd_indent 
tests to use it on Windows, later today unless there's some objection.



cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: tests fail on windows with default git settings

2024-07-09 Thread Andrew Dunstan


On 2024-07-08 Mo 5:44 PM, Andres Freund wrote:

What I suggest (see attached) is we run the diff command with
--strip-trailing-cr on Windows. Then we just won't care if the expected file
and/or the output file has CRs.

I was wondering about that too, but I wasn't sure we can rely on that flag
being supported...



Well, my suggestion was to use it only on Windows. I'm using the 
diffutils from chocolatey, which has it, as does Msys2 diff. Not sure 
what you have in the CI setup.




Not sure what the issue is with pg_bsd_indent, though.

I think it's purely that we *read* with fopen("r") and write with
fopen("wb"). Which means that any \r\n in the input will be converted to \n in
the output. That's not a problem if the repo has been cloned without autocrlf,
as there are no crlf in the expected files, but if autocrlf has been used, the
expected files don't match.

It doesn't look like it'd be trivial to make indent remember what was used in
the input. So I think for now the best path is to just use .gitattributes to
exclude the expected files from crlf conversion.  If we don't want to do so
repo wide, we can do so just for these files.



either that or we could use the --strip-trailing-cr gadget here too.


cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: 010_pg_basebackup.pl vs multiple filesystems

2024-07-08 Thread Andrew Dunstan



On 2024-07-08 Mo 4:31 PM, Andres Freund wrote:

Hi,

On 2024-07-07 09:10:48 -0400, Andrew Dunstan wrote:

On 2024-07-07 Su 7:28 AM, Andrew Dunstan wrote:

I'll be happy to hear of one. I agree it's a mess.  Maybe we could test
that the temp directory is on the same device on Windows and skip the
test if not? You could still get the test to run by setting TMPDIR
and/or friends.

Maybe we should just not try to rename the directory. Looking at the test
I'm pretty sure the directory should be empty. Instead of trying to move it,
let's just remove it, and recreate it in the tmp location.

Good catch, yes, that'd be much better!



diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index 489dde4adf..c0c334c6fc 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -363,8 +363,8 @@ my $sys_tempdir = PostgreSQL::Test::Utils::tempdir_short;
  # Elsewhere use $tempdir to avoid file system boundary issues with moving.
  my $tmploc = $windows_os ? $sys_tempdir : $tempdir;

The comment would need a bit of editing, I guess. I think we should consider
just getting rid of the os-dependent switch now, it shouldn't be needed
anymore?



-rename("$pgdata/pg_replslot", "$tmploc/pg_replslot")
-  or BAIL_OUT "could not move $pgdata/pg_replslot";
+rmtree("$pgdata/pg_replslot");

Should this perhaps be an rmdir, to ensure that we're not removing something
we don't want (e.g. somebody adding an earlier test for slots that then gets
broken by the rmtree)?




OK, done like this.
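
i.e. roughly this (paraphrasing, not quoting, the committed change):

    rmdir("$pgdata/pg_replslot")
      or BAIL_OUT "could not remove $pgdata/pg_replslot: $!";
    mkdir("$tmploc/pg_replslot", 0700)
      or BAIL_OUT "could not create $tmploc/pg_replslot: $!";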


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: 010_pg_basebackup.pl vs multiple filesystems

2024-07-08 Thread Andrew Dunstan



On 2024-07-08 Mo 4:45 PM, Robert Haas wrote:

On Sun, Jul 7, 2024 at 3:02 AM Andres Freund  wrote:

While working on [1] I encountered the issue that, on github-actions,
010_pg_basebackup.pl fails on windows.

The reason for that is that github actions uses two drives, with TMP/TEMP
located on c:, the tested code on d:.  This causes the following code to fail:

   # Create a temporary directory in the system location.
   my $sys_tempdir = PostgreSQL::Test::Utils::tempdir_short;

Whatever we end up doing about this, it would be a good idea to check
the other places that use tempdir_short and see if they also need
adjustment.



I don't think it's a problem. There are lots of systems that have 
tempdir on a different device. That's why we previously saw lots of 
errors from this code, resulting in the present incomplete workaround. 
The fact that we haven't seen such errors from other tests means we 
should be ok.



cheers


andrew

--

Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: tests fail on windows with default git settings

2024-07-08 Thread Andrew Dunstan


On 2024-07-08 Mo 4:16 PM, Andres Freund wrote:

On 2024-07-07 06:30:57 -0400, Andrew Dunstan wrote:

On 2024-07-07 Su 1:26 AM, Tom Lane wrote:

Andres Freund  writes:

Do we want to support checking out with core.autocrlf?

-1.  It would be a constant source of breakage, and you could never
expect that (for example) making a tarball from such a checkout
would match anyone else's results.

Yeah, totally agree.



If we do not want to support that, ISTM we ought to raise an error somewhere?

+1, if we can figure out how.





ISTM the right fix is probably to use PG_BINARY_R mode instead of "r" when
opening the files, at least in the case of the test_json_parser tests.

That does seem like it'd fix this issue, assuming the parser can cope with
\r\n.



Yes, the parser can handle \r\n. Note that they can only be white space 
in JSON - they can only be present in string values via escapes.
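
That is, a raw CRLF is legal only between tokens; inside a string it has
to be spelled with escapes, e.g.:

    {"a": 1,
     "b": "line1\r\nline2"}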





I'm actually mildly surprised that the tests don't fail when *not* using
autocrlf, because afaict test_json_parser_incremental.c doesn't set stdout to
binary and thus we presumably end up with \r\n in the output? Except that that
can't be true, because the test does pass on repos without autocrlf...


That approach does seem to mildly conflict with Tom and your preference for
fixing this by disallowing core.autocrlf? If we do so, the test never ought to
see a crlf?



IDK. I normally use core.autocrlf=false core.eol=lf on Windows. The 
editors I use are reasonably well behaved ;-)


What I suggest (see attached) is we run the diff command with 
--strip-trailing-cr on Windows. Then we just won't care if the expected 
file and/or the output file has CRs.


Not sure what the issue is with pg_bsd_indent, though.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com
diff --git a/src/test/modules/test_json_parser/t/003_test_semantic.pl b/src/test/modules/test_json_parser/t/003_test_semantic.pl
index 74e0fa5bb1..2a430ad0c4 100644
--- a/src/test/modules/test_json_parser/t/003_test_semantic.pl
+++ b/src/test/modules/test_json_parser/t/003_test_semantic.pl
@@ -29,7 +29,9 @@ print $fh $stdout, "\n";
 
 close($fh);
 
-($stdout, $stderr) = run_command([ "diff", "-u", $fname, $test_out ]);
+my @diffopts = ("-u");
+push(@diffopts, "--strip-trailing-cr") if $windows_os;
+($stdout, $stderr) = run_command([ "diff", @diffopts, $fname, $test_out ]);
 
 is($stdout, "", "no output diff");
 is($stderr, "", "no diff error");
diff --git a/src/test/modules/test_json_parser/test_json_parser_incremental.c b/src/test/modules/test_json_parser/test_json_parser_incremental.c
index f4c442ac36..47040e1e42 100644
--- a/src/test/modules/test_json_parser/test_json_parser_incremental.c
+++ b/src/test/modules/test_json_parser/test_json_parser_incremental.c
@@ -124,7 +124,7 @@ main(int argc, char **argv)
 	makeJsonLexContextIncremental(&lex, PG_UTF8, need_strings);
 	initStringInfo(&json);
 
-	if ((json_file = fopen(testfile, "r")) == NULL)
+	if ((json_file = fopen(testfile, PG_BINARY_R)) == NULL)
 		pg_fatal("error opening input: %m");
 
 	if (fstat(fileno(json_file), &statbuf) != 0)
diff --git a/src/test/modules/test_json_parser/test_json_parser_perf.c b/src/test/modules/test_json_parser/test_json_parser_perf.c
index ea85626cbd..74cc5f3f54 100644
--- a/src/test/modules/test_json_parser/test_json_parser_perf.c
+++ b/src/test/modules/test_json_parser/test_json_parser_perf.c
@@ -55,7 +55,7 @@ main(int argc, char **argv)
 
 	sscanf(argv[1], "%d", &iter);
 
-	if ((json_file = fopen(argv[2], "r")) == NULL)
+	if ((json_file = fopen(argv[2], PG_BINARY_R)) == NULL)
 		pg_fatal("Could not open input file '%s': %m", argv[2]);
 
 	while ((n_read = fread(buff, 1, 6000, json_file)) > 0)


Re: ssl tests fail due to TCP port conflict

2024-07-08 Thread Andrew Dunstan



On 2024-07-08 Mo 8:00 AM, Alexander Lakhin wrote:

Hello,

07.06.2024 17:25, Tom Lane wrote:

Andrew Dunstan  writes:
I still think my patch to force TCP mode for the SSL test makes sense as
well.

+1 to both things.  If that doesn't get the failure rate down to an
acceptable level, we can look at the retry idea.



I have pushed patches for both of those (i.e. starting SSL test nodes in TCP 
mode and changing the range of ports we allocate server ports from).


I didn't see this email until after I had pushed them.




I'd like to add that the kerberos/001_auth test also suffers from the port
conflict, but slightly differently. Look for example at [1]:
krb5kdc.log contains:
Jul 02 09:29:41 andres-postgres-buildfarm-v5 krb5kdc[471964](info): 
setting up network...
Jul 02 09:29:41 andres-postgres-buildfarm-v5 krb5kdc[471964](Error): 
Address already in use - Cannot bind server socket on 127.0.0.1.55853
Jul 02 09:29:41 andres-postgres-buildfarm-v5 krb5kdc[471964](Error): 
Failed setting up a UDP socket (for 127.0.0.1.55853)
Jul 02 09:29:41 andres-postgres-buildfarm-v5 krb5kdc[471964](Error): 
Address already in use - Error setting up network


As far as I can see, the port for kdc is chosen by
PostgreSQL::Test::Kerberos, via
PostgreSQL::Test::Cluster::get_free_port(), which checks only for TCP
port availability (with can_bind()), but not for UDP, so this increases
the probability of the conflict for this test (a similar failure: [2]).
Although we can also find a failure with TCP: [3]
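
A UDP-side check could mirror can_bind() closely, something like this
(a sketch, not existing code):

    use IO::Socket::INET;

    sub can_bind_udp
    {
        my ($host, $port) = @_;
        my $sock = IO::Socket::INET->new(
            LocalAddr => $host,
            LocalPort => $port,
            Proto => 'udp');
        return 0 unless defined $sock;
        $sock->close();
        return 1;
    }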

(It's not clear to me what processes can use UDP ports while testing,
but maybe those buildfarm animals are running on the same logical
machine simultaneously?)

[1] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rorqual&dt=2024-07-02%2009%3A27%3A15
[2] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mylodon&dt=2024-05-15%2001%3A25%3A07
[3] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=grassquit&dt=2024-07-04%2008%3A28%3A19





Let's see if this persists now we are testing for free ports in a 
different range, not the range usually used for ephemeral ports.



cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: 010_pg_basebackup.pl vs multiple filesystems

2024-07-07 Thread Andrew Dunstan



On 2024-07-07 Su 7:28 AM, Andrew Dunstan wrote:


On 2024-07-07 Su 3:02 AM, Andres Freund wrote:

Hi,

While working on [1] I encountered the issue that, on github-actions,
010_pg_basebackup.pl fails on windows.

The reason for that is that github actions uses two drives, with 
TMP/TEMP
located on c:, the tested code on d:.  This causes the following code 
to fail:


   # Create a temporary directory in the system location.
   my $sys_tempdir = PostgreSQL::Test::Utils::tempdir_short;

   # On Windows use the short location to avoid path length issues.
   # Elsewhere use $tempdir to avoid file system boundary issues with moving.
   my $tmploc = $windows_os ? $sys_tempdir : $tempdir;

   rename("$pgdata/pg_replslot", "$tmploc/pg_replslot")
 or BAIL_OUT "could not move $pgdata/pg_replslot";
   dir_symlink("$tmploc/pg_replslot", "$pgdata/pg_replslot")
 or BAIL_OUT "could not symlink to $pgdata/pg_replslot";

It's not possible to move (vs copy) a file between two filesystems, on most
operating systems. Which thus leads to:

[23:02:03.114](0.288s) Bail out!  could not move 
D:\a\winpgbuild\winpgbuild\postgresql-17beta2\build/testrun/pg_basebackup/010_pg_basebackup\data/t_010_pg_basebackup_main_data/pgdata/pg_replslot



This logic was added in

commit e213de8e785aac4e2ebc44282b8dc0fcc74834e8
Author: Andrew Dunstan 
Date:   2023-07-08 11:21:58 -0400

 Use shorter location for pg_replslot in pg_basebackup test

and revised in

commit e9f15bc9db7564a29460d089c0917590bc13fffc
Author: Andrew Dunstan 
Date:   2023-07-08 12:34:25 -0400

 Fix tmpdir issues with commit e213de8e78

The latter deals with precisely this issue, for !windows.


I've temporarily worked around this by setting TMP/TEMP to something 
else, but ISTM we need a better solution.



I'll be happy to hear of one. I agree it's a mess.  Maybe we could 
test that the temp directory is on the same device on Windows and skip 
the test if not? You could still get the test to run by setting TMPDIR 
and/or friends.






Maybe we should just not try to rename the directory. Looking at the 
test I'm pretty sure the directory should be empty. Instead of trying to 
move it, let's just remove it, and recreate it in the tmp location.


Something like


diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index 489dde4adf..c0c334c6fc 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -363,8 +363,8 @@ my $sys_tempdir = PostgreSQL::Test::Utils::tempdir_short;
 # Elsewhere use $tempdir to avoid file system boundary issues with moving.
 my $tmploc = $windows_os ? $sys_tempdir : $tempdir;
 
-rename("$pgdata/pg_replslot", "$tmploc/pg_replslot")
-  or BAIL_OUT "could not move $pgdata/pg_replslot";
+rmtree("$pgdata/pg_replslot");
+mkdir ("$tmploc/pg_replslot", 0700);
 dir_symlink("$tmploc/pg_replslot", "$pgdata/pg_replslot")
   or BAIL_OUT "could not symlink to $pgdata/pg_replslot";



cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: 010_pg_basebackup.pl vs multiple filesystems

2024-07-07 Thread Andrew Dunstan



On 2024-07-07 Su 3:02 AM, Andres Freund wrote:

Hi,

While working on [1] I encountered the issue that, on github-actions,
010_pg_basebackup.pl fails on windows.

The reason for that is that github actions uses two drives, with TMP/TEMP
located on c:, the tested code on d:.  This causes the following code to fail:

   # Create a temporary directory in the system location.
   my $sys_tempdir = PostgreSQL::Test::Utils::tempdir_short;

   # On Windows use the short location to avoid path length issues.
   # Elsewhere use $tempdir to avoid file system boundary issues with moving.
   my $tmploc = $windows_os ? $sys_tempdir : $tempdir;

   rename("$pgdata/pg_replslot", "$tmploc/pg_replslot")
 or BAIL_OUT "could not move $pgdata/pg_replslot";
   dir_symlink("$tmploc/pg_replslot", "$pgdata/pg_replslot")
 or BAIL_OUT "could not symlink to $pgdata/pg_replslot";

It's not possible to move (vs copy) a file between two filesystems, on most
operating systems. Which thus leads to:

[23:02:03.114](0.288s) Bail out!  could not move 
D:\a\winpgbuild\winpgbuild\postgresql-17beta2\build/testrun/pg_basebackup/010_pg_basebackup\data/t_010_pg_basebackup_main_data/pgdata/pg_replslot


This logic was added in

commit e213de8e785aac4e2ebc44282b8dc0fcc74834e8
Author: Andrew Dunstan 
Date:   2023-07-08 11:21:58 -0400

 Use shorter location for pg_replslot in pg_basebackup test

and revised in

commit e9f15bc9db7564a29460d089c0917590bc13fffc
Author: Andrew Dunstan 
Date:   2023-07-08 12:34:25 -0400

 Fix tmpdir issues with commit e213de8e78

The latter deals with precisely this issue, for !windows.


I've temporarily worked around this by setting TMP/TEMP to something else, but
ISTM we need a better solution.



I'll be happy to hear of one. I agree it's a mess.  Maybe we could test 
that the temp directory is on the same device on Windows and skip the 
test if not? You could still get the test to run by setting TMPDIR 
and/or friends.
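
Something along these lines is what I have in mind; just a rough,
untested sketch (reusing the test's $sys_tempdir, $pgdata and
$windows_os variables, with a hypothetical skip message):

    # Compare drive letters of the two locations; only meaningful on Windows.
    use File::Spec;

    my ($tmp_drive)  = File::Spec->rel2abs($sys_tempdir) =~ /^([a-z]):/i;
    my ($data_drive) = File::Spec->rel2abs($pgdata)      =~ /^([a-z]):/i;

    SKIP:
    {
        skip "temp directory not on the same drive as the data directory", 1
          if $windows_os && lc($tmp_drive // '') ne lc($data_drive // '');

        # ... move pg_replslot and run the test that depends on it ...
    }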



cheers


andrew




--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: tests fail on windows with default git settings

2024-07-07 Thread Andrew Dunstan



On 2024-07-07 Su 1:26 AM, Tom Lane wrote:

Andres Freund  writes:

Do we want to support checking out with core.autocrlf?

-1.  It would be a constant source of breakage, and you could never
expect that (for example) making a tarball from such a checkout
would match anyone else's results.



Yeah, totally agree.



If we do not want to support that, ISTM we ought to raise an error somewhere?

+1, if we can figure out how.






ISTM the right fix is probably to use PG_BINARY_R mode instead of "r" 
when opening the files, at least in the case of the test_json_parser tests.
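
(PG_BINARY_R comes from c.h: it is "rb" on Windows and plain "r"
elsewhere, so the change is mechanical.) A minimal sketch of the idea,
with a hypothetical file name and error handling rather than the actual
test_json_parser code:

    /*
     * Open the test input in binary mode so that CRLF line endings from a
     * core.autocrlf checkout are read back verbatim instead of being
     * translated to LF on Windows.
     */
    FILE   *fp = fopen(filename, PG_BINARY_R);  /* was: fopen(filename, "r") */

    if (fp == NULL)
    {
        fprintf(stderr, "could not open \"%s\": %s\n",
                filename, strerror(errno));
        exit(1);
    }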



cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Problem while installing PostgreSQL using make

2024-07-05 Thread Andrew Dunstan


On 2024-07-04 Th 6:22 AM, Mohab Yaser wrote:
Can you send me a link to download this version on Windows? I didn't 
find anything other than the one I already have downloaded.


On Thu, Jul 4, 2024 at 1:21 PM Aleksander Alekseev wrote:


Hi,

> While I was trying to install PostgreSQL from the git repository to
> start contributing I faced this issue. When I try to type ./configure
> it gives me this error
>
> checking for a thread-safe mkdir -p... /usr/bin/mkdir -p
> checking for bison... /c/GnuWin32/bin/bison
> configure: using bison (GNU Bison) 2.4.1
> checking for flex... configure: error:
> *** The installed version of Flex, /c/GnuWin32/bin/flex, is too old to
> use with PostgreSQL.
> *** Flex version 2.5.35 or later is required, but this is
> C:\GnuWin32\bin\flex.exe version 2.5.4.
>
> Look at the last two lines: the error says that the installed version
> of flex is too old, but it is 2.5.4, which looks like it should be
> valid; yet I can't proceed beyond this point. And I double-checked the
> version of flex
>
> $ flex --version
> C:\GnuWin32\bin\flex.exe version 2.5.4
>
> and made sure that it is properly included in PATH
>
> $ which flex
> /c/GnuWin32/bin/flex

Flex 2.5.4 is ancient. Version 2.5.39 was released in 2020 and I didn't
look further to figure out the exact release year of 2.5.4.

You need something like flex 2.6.4 and bison >= 2.3. That's what I
use.



I assume this configure script is running under Msys2? Just install its 
flex and bison (and remove these ancient versions):



   pacman -S bison flex
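
Then a quick sanity check from the MSYS2 shell (assuming /c/GnuWin32/bin 
no longer appears ahead of /usr/bin in PATH):

   which flex bison
   flex --version
   bison --version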


cheers


andrew


--
Andrew Dunstan
EDB:https://www.enterprisedb.com

