Hi all,
I believe I found a server bug. If a client connects from unreliable network,
server may lock up. In that state, it does not accept new connections, it
cannot be stopped, and if connection improves, it does not become responsive
again. The problem is confirmed on Raspbian and macOS so far.
I debugged it and found the root cause. The code below makes a socket blocking
temporarily while writing into it. Then it reverts it to non-blocking only if
operation succeeds, leaving it blocking if it fails:
liveMedia/RTPInterface.cpp:398:
```
makeSocketBlocking(socketNum, RTPINTERFACE_BLOCKING_WRITE_TIMEOUT_MS);
....
? tlsState->write((char const*)(&data[numBytesSentSoFar]),
numBytesRemainingToSend)
: send(socketNum, (char const*)(&data[numBytesSentSoFar]),
numBytesRemainingToSend, 0/*flags*/);
if ((unsigned)sendResult != numBytesRemainingToSend) {
....
removeStreamSocket(socketNum, 0xFF);
return False;
}
makeSocketNonBlocking(socketNum);
return True;
}
```
What happens next is when it attempts to read from a blocking socket in the
loop of 2000 iterations below, it locks up forever (2000 x timeout is a long
time):
liveMedia/RTPInterface.cpp:510:
```
void SocketDescriptor::tcpReadHandler(SocketDescriptor* socketDescriptor, int
mask) {
// Call the read handler until it returns false, with a limit to avoid
starving other sockets
unsigned count = 2000;
socketDescriptor->fAreInReadHandlerLoop = True;
while (!socketDescriptor->fDeleteMyselfNext &&
socketDescriptor->tcpReadHandler1(mask) && --count > 0) {}
socketDescriptor->fAreInReadHandlerLoop = False;
if (socketDescriptor->fDeleteMyselfNext) delete socketDescriptor;
}
```
I was able to mitigate it using the patch:
```
--- live/liveMedia/RTPInterface.cpp 2021-12-07 21:33:13.000000000 +0000
+++ live555/liveMedia/RTPInterface.cpp 2021-12-14 07:07:19.257748176 +0000
@@ -399,6 +399,7 @@
sendResult = (tlsState != NULL && tlsState->isNeeded)
? tlsState->write((char const*)(&data[numBytesSentSoFar]),
numBytesRemainingToSend)
: send(socketNum, (char const*)(&data[numBytesSentSoFar]),
numBytesRemainingToSend, 0/*flags*/);
+ makeSocketNonBlocking(socketNum);
if ((unsigned)sendResult != numBytesRemainingToSend) {
// The blocking "send()" failed, or timed out. In either case, we assume that
the
// TCP connection has failed (or is 'hanging' indefinitely), and we stop using
it
@@ -411,7 +412,6 @@
removeStreamSocket(socketNum, 0xFF);
return False;
}
- makeSocketNonBlocking(socketNum);
return True;
} else if (sendResult < 0 && envir().getErrno() != EAGAIN) {
```
Any chance we can get this into the official build to avoid patching hell?
Thanks
Andrei
_______________________________________________
live-devel mailing list
[email protected]
http://lists.live555.com/mailman/listinfo/live-devel