Hello Marina,

While trying to test a patch that adds a synchronization barrier in pgbench [1] on Windows,

Thanks for trying that. I do not have a Windows setup for testing, and the sync code I wrote for Windows is basically blind coding :-(

I found that since the commit "Use ppoll(2), if available, to wait for input in pgbench." [2] I cannot use a large number of client connections in pgbench on my Windows virtual machines (Windows Server 2008 R2 and Windows 2019), for example:

bin\pgbench.exe -c 90 -S -T 3 postgres
starting vacuum...end.

ISTM that 1 thread with 90 clients is a bad idea, see below.

Almost the same thing happens with reindexdb and vacuumdb (built on commit [3]):

The Windows fd implementation is somewhat buggy because it does not return the smallest number available. The check then assumes that select uses a dense array indexed by fd (true on Linux, less so on Windows, which probably uses a sparse array), so the fd number can go over the limit even when fewer descriptors are actually in use, hence the catch, as you noted.

Another point is that Windows has a hardcoded number of objects one thread can really wait for, typically 64, so that waiting for more requires actually forking threads to do the waiting. But if you are ready to fork threads just to wait, then you could probably have started pgbench with more threads in the first place. That would probably not make the problem go away, because fd numbers would be per process, not per thread, but it really suggests that one should not load a thread with more than 64 clients.

IIUC the checks below are not correct on Windows, since on this system sockets can have values equal to or greater than FD_SETSIZE (see Windows documentation [4] and pgbench debug output in attached pgbench_debug.txt).

Okay.

But then, how may one detect that there are too many fds in the set?

I think that an earlier version of the code needed to make assumptions about the internal implementation of Windows (there is a counter somewhere in the Windows fd_set struct), which was rejected because it was breaking the interface. Now your patch is basically resurrecting that. Why not, if there is no other solution, but this is quite depressing, and because it breaks the interface it would be broken if Windows changed its internals for some reason :-(
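To make the difference between the two layouts concrete, here is a rough, portable model of them. It is only a sketch: the names dense_set, sparse_set, dense_add, sparse_add and MODEL_SETSIZE are made up for illustration, and it does not touch the real fd_set or the pgbench code. It only shows why a check on the descriptor value suits a dense bitmap, while a counter-plus-array layout can only be checked through its counter.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_SETSIZE 64        /* hypothetical stand-in for FD_SETSIZE */

/* Dense layout (the usual one on Linux): one bit per descriptor value. */
typedef struct
{
    unsigned char bits[MODEL_SETSIZE / 8];
} dense_set;

/* Sparse layout (Winsock style): a counter plus an array of handles. */
typedef struct
{
    unsigned int count;
    uintptr_t    slots[MODEL_SETSIZE];
} sparse_set;

/* Dense: the descriptor value itself must be below the set size. */
bool
dense_add(dense_set *set, uintptr_t fd)
{
    if (fd >= MODEL_SETSIZE)
        return false;           /* value too large, whatever the set holds */
    set->bits[fd / 8] |= (unsigned char) (1u << (fd % 8));
    return true;
}

/* Sparse: only the number of stored handles matters, not their values. */
bool
sparse_add(sparse_set *set, uintptr_t sock)
{
    if (set->count >= MODEL_SETSIZE)
        return false;           /* set is full */
    set->slots[set->count++] = sock;
    return true;
}
```

In this model a handle valued 1000 is refused by dense_add but accepted by sparse_add, and sparse_add only starts refusing once MODEL_SETSIZE handles have been stored. That is the information a value-based check cannot give on Windows.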

Doesn't Windows have "ppoll"? Should we implement the stuff on top of Windows polling capabilities and coldly skip its failed POSIX portability attempts? This raises again the issue that you should not have more than 64 clients per thread anyway, because it is an intrinsic limit on Windows.

I think that at one point it was suggested to error or warn if nclients/nthreads is too great, but that was not kept in the end.
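For what it is worth, such a check could be as small as the sketch below. It is not taken from pgbench or from any earlier patch: the function names and WAIT_LIMIT_PER_THREAD are hypothetical, the value 64 is the per-thread wait limit mentioned above, and the helper assumes nthreads >= 1 and that clients are spread as evenly as possible over the threads.

```c
#include <stdbool.h>
#include <stdio.h>

/* Assumed per-thread wait limit on Windows (hypothetical constant name). */
#define WAIT_LIMIT_PER_THREAD 64

/* Largest number of clients a single thread gets under an even split. */
int
max_clients_per_thread(int nclients, int nthreads)
{
    return (nclients + nthreads - 1) / nthreads;    /* ceiling division */
}

/* Warn on stderr and return true if some thread would exceed the limit. */
bool
warn_if_overloaded(int nclients, int nthreads)
{
    int         per_thread = max_clients_per_thread(nclients, nthreads);

    if (per_thread > WAIT_LIMIT_PER_THREAD)
    {
        fprintf(stderr,
                "warning: up to %d clients per thread (-c %d -j %d), limit is %d\n",
                per_thread, nclients, nthreads, WAIT_LIMIT_PER_THREAD);
        return true;
    }
    return false;
}
```

With these definitions, -c 90 -j 1 would trigger the warning (90 clients on one thread), while -c 90 -j 2 would not (45 per thread).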

I tried to fix this, see attached fix_max_client_conn_on_Windows.patch (based on commit [3]). I checked it for reindexdb and vacuumdb, and it works for simple databases (1025 jobs are not allowed and 1024 jobs is ok). Unfortunately, pgbench was getting connection errors when it tried to use 1000 jobs on my virtual machines, although there were no errors for fewer jobs (500) and the same number of clients (1000)...

It seems that the maximum number of threads you can start depends on available memory, because each thread is given its own stack, so it would depend on your VM settings?

Any suggestions are welcome!

Use ppoll, and start more threads but not too many?

--
Fabien.

