[ https://issues.apache.org/jira/browse/QPID-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102297#comment-14102297 ]
Cliff Jansen commented on QPID-5033:
------------------------------------

I managed to reproduce this with a simple

  ./qpid-perftest --count 100000 -b somebroker.com -P ssl -p 5671 --subscribe

The stack trace showed the buffers in use:

  1 async write
  1 current read buffer
  1 leftoverPlaintext buffer
  plus a "wanted" extraBuff

There was a spare buffer in the bufferQueue, but getQueuedBuffer() would not give it up, holding it in reserve in case it was an "unread" buffer with existing data that should not be clobbered. The check did not verify that the buffer really was unread (i.e. contained any data). Increasing the buffer count to 5 allows for the fallow "unread" buffer, and has been seen to work on some systems.

It turns out that the Linux driver (SSL and non-SSL) only needs two buffers; the other two are never used. The Windows AsynchIO driver needs at least three, but not necessarily the four it has reserved (nor the fifth it hogs for no purpose). The existing implementation uses one spare buffer for partial plaintext frames waiting for the next SSL block/segment to be decoded, and another for extra SSL segments when a read buffer contains more than one. It never needs both at once, so I have made the fix work with a single extra buffer.

I cannot explain reports that increasing the buffer count even beyond 5 only delays the occurrence of this bug. I have manipulated timing in the AIO layer to force 10 levels of recursion in sslDataIn without problem, and have tried all sorts of tests with varying numbers of IO threads, debug and release mode, 32 bit and 64 bit, recent Windows and older Windows. In case the bug persists, the patch provides some debugging information that will hopefully zero in on it.
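The core of the fix described above can be sketched as follows. This is a hedged, self-contained illustration, not the actual Qpid code: the Buffer struct, queue type, and function names here are invented for clarity, standing in for the real getQueuedBuffer() and bufferQueue.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical stand-in for an AIO buffer; only the bookkeeping we need.
struct Buffer {
    std::size_t dataCount = 0;                // bytes of unread data
    bool unread() const { return dataCount > 0; }
};

// Old behavior (sketch): always hold the last queued buffer in reserve,
// on the assumption that it *might* be an unread buffer whose data must
// not be clobbered -- without checking whether it actually holds data.
Buffer* getQueuedBufferOld(std::deque<Buffer*>& q) {
    if (q.size() <= 1) return nullptr;        // hoards the spare even if empty
    Buffer* b = q.front();
    q.pop_front();
    return b;
}

// Fixed behavior (sketch): only refuse to hand out a buffer that really
// contains unread data; an empty spare buffer is fair game.
Buffer* getQueuedBufferFixed(std::deque<Buffer*>& q) {
    if (q.empty()) return nullptr;
    if (q.size() == 1 && q.front()->unread()) return nullptr;  // genuinely reserved
    Buffer* b = q.front();
    q.pop_front();
    return b;
}
```

With the old check, a single empty buffer sitting in the queue is never returned, so sslDataIn can find itself with no buffer available and the operation fails; the fixed check releases it, which is why a single extra buffer suffices.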
See https://reviews.apache.org/r/24851

> [Windows C++ client] An operation on a socket could not be performed because
> the system lacked sufficient buffer space or because a queue was full
> --------------------------------------------------------------------
>
>                 Key: QPID-5033
>                 URL: https://issues.apache.org/jira/browse/QPID-5033
>             Project: Qpid
>          Issue Type: Bug
>          Components: C++ Client
>         Environment: Windows, different versions
>            Reporter: Jakub Scholz
>            Assignee: Cliff Jansen
>         Attachments: client-trace.log, client.cpp
>
> As discussed on the user mailing list
> (http://qpid.2158936.n2.nabble.com/Qpid-throwed-WSAENOBUFS-while-receiving-data-from-a-broker-td7592938.html),
> when receiving large amounts of messages over SSL using a receiver
> prefetch, the client fails with the exception "An operation on a socket could
> not be performed because the system lacked sufficient buffer space or because
> a queue was full". This exception seems to originate from the SslAsynchIO
> class, method sslDataIn.
> Decreasing the capacity seems to reduce the frequency with which the problem
> appears; however, with 1 MB messages even a capacity of 1 doesn't seem to work.
> The problem seems to be quite easy to reproduce using the following scenario:
> 1) Create a large queue on a broker (C++ / Linux)
> 2) Start feeding messages into the queue using a C++/Linux program (in my case
> I used approximately 1 kB messages)
> 3) Connect with a receiver (C++/Windows) using SSL and prefetch 1000 (no
> client authentication; I used username & password)
> 4) Wait a few seconds to see the error in the receiver
> The source code of the receiver as well as the full trace+ log are attached.
> Please let me know should you need any additional information.
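The reproduction scenario above (receiver over SSL with prefetch 1000) can be sketched with the qpid::messaging C++ API. This is an illustrative sketch, not the attached client.cpp; the broker address, queue name, and credentials below are placeholders, and running it requires a live SSL-enabled broker.

```cpp
// Receiver sketch matching the reproduction scenario: SSL transport,
// username/password authentication, capacity (prefetch) of 1000.
// "broker.example.com", "large-queue", and the guest credentials are
// placeholders, not values from the report.
#include <qpid/messaging/Connection.h>
#include <qpid/messaging/Session.h>
#include <qpid/messaging/Receiver.h>
#include <qpid/messaging/Message.h>
#include <qpid/messaging/Duration.h>
#include <iostream>

using namespace qpid::messaging;

int main() {
    Connection connection("broker.example.com:5671",
                          "{transport: ssl, username: guest, password: guest}");
    try {
        connection.open();
        Session session = connection.createSession();
        Receiver receiver = session.createReceiver("large-queue");
        receiver.setCapacity(1000);   // prefetch of 1000, as in the report
        while (true) {
            Message message = receiver.fetch(Duration::SECOND * 5);
            session.acknowledge();
        }
    } catch (const std::exception& error) {
        // On an affected Windows client, the WSAENOBUFS-derived exception
        // ("...lacked sufficient buffer space or because a queue was full")
        // would surface here after a few seconds of prefetched traffic.
        std::cerr << error.what() << std::endl;
        connection.close();
        return 1;
    }
    return 0;
}
```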
--
This message was sent by Atlassian JIRA
(v6.2#6252)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org