On 22.11.21 at 00:04, Tom Lane wrote:
> Do we know that that actually happens in an arm's-length connection
> (ie two separate machines)? I wonder if the data loss is strictly
> an artifact of a localhost connection. There'd be a lot more pressure
> on them to make cross-machine TCP work per spec, one would think.
> But in any case, if we can avoid sending RST in this situation,
> it seems mostly moot for our usage.
Sorry it took some days to get a setup to check this!
The result is as expected:
1. Windows client to Linux server works without dropping the error message
2. Linux client to Windows server works without dropping the error message
3. Windows client to remote Windows server drops the error message,
depending on the timing of the event loop
In 1. the Linux server doesn't end the connection with a RST packet, so
that the Windows client enqueues the error message properly and doesn't
drop it.
In 2. the Linux client doesn't care about the RST packet of the Windows
server and properly enqueues and raises the error message.
In 3. the combination of the bad RST behavior of client and server leads
to data loss. It depends on the network timing. A delay of 0.5 ms in the
event loop was enough in a localhost setup as well as in a LAN setup.
In contrast, over a slower WLAN connection a delay of less than 15 ms
did not lose data, but higher delays still did.
The idea of running a second process, passing the socket handle to it,
observing the parent process and closing the socket when it exits, could
work, but I guess it's overly complicated and creates more issues than
it solves. The same probably applies if the master process handles the
socket closing.
So I still think it's best to close the socket as proposed in the patch.
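To illustrate the general idea of closing explicitly (this is not the
patch itself, only a minimal Python sketch over localhost; the names
send_error_and_close and run_demo and the message text are made up for
the example, and the shutdown() call before close() is an assumption of
the sketch): the sender writes its final message, signals end of data
and closes the socket itself instead of leaving that to process exit.

```python
import socket
import threading

# Hypothetical payload, standing in for a server error message.
ERROR_MESSAGE = b"FATAL: example error message\n"


def send_error_and_close(conn: socket.socket, message: bytes) -> None:
    """Send a final message, then close the socket explicitly."""
    conn.sendall(message)
    try:
        # Signal end of data to the peer (FIN) before closing.
        conn.shutdown(socket.SHUT_WR)
    except OSError:
        pass  # the peer may already have gone away
    conn.close()


def run_demo() -> bytes:
    """Run a one-shot server and client on localhost, return what the client read."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    def serve() -> None:
        conn, _ = listener.accept()
        send_error_and_close(conn, ERROR_MESSAGE)

    server = threading.Thread(target=serve)
    server.start()

    client = socket.create_connection(("127.0.0.1", port))
    chunks = []
    while True:
        data = client.recv(4096)
        if not data:  # empty read: orderly end of stream
            break
        chunks.append(data)
    client.close()

    server.join()
    listener.close()
    return b"".join(chunks)


if __name__ == "__main__":
    print(run_demo())
```

The sketch only shows the orderly-close sequence; it does not reproduce
the Windows-to-Windows timing issue described above.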
--
Regards,
Lars Kanis