[GitHub] thrift issue #1497: Do not call workSocket() in TNonblockigServer without en...
Github user bgedik commented on the issue: https://github.com/apache/thrift/pull/1497 @jeking3 There is already a Jira item: https://issues.apache.org/jira/browse/THRIFT-4465 Btw, it makes sense that the fix resolves the framed transport failures. Because the bug was about the transition from the frame size reading to the frame content reading. ---
[GitHub] thrift issue #1497: Do not call workSocket() in TNonblockigServer without en...
Github user bgedik commented on the issue: https://github.com/apache/thrift/pull/1497 @jeking3 I think all tests have passed. ---
[GitHub] thrift pull request #1497: Do not call workSocket() in TNonblockigServer wit...
Github user bgedik commented on a diff in the pull request: https://github.com/apache/thrift/pull/1497#discussion_r172754060 --- Diff: lib/cpp/src/thrift/server/TNonblockingServer.cpp --- @@ -472,6 +472,18 @@ void TNonblockingServer::TConnection::workSocket() { } // size known; now get the rest of the frame transition(); + +// If the socket has more data than the frame header, continue to work on it. This is not strictly necessary for +// regular sockets, because if there is more data, libevent will fire the event handler registered for read +// readiness, which will in turn call workSocket(). However, some socket types (such as TSSLSocket) may have the +// data sitting in their internal buffers and from libevent's perspective, there is no further data available. In +// that case, not having this workSocket() call here would result in a hang as we will never get to work the socket, +// despite having more data. +if (tSocket_->hasPendingDataToRead()) --- End diff -- It can't. Note that the line above is ``transition()``. So the next time we call ``workSocket()`` it is a different state. Btw, I assume you notice that the original ``workSocket`` call I removed, which was causing the superfluous exception, was inside ``workSocket`` as well. ---
[GitHub] thrift pull request #1497: Do not call workSocket() in TNonblockigServer wit...
Github user bgedik commented on a diff in the pull request: https://github.com/apache/thrift/pull/1497#discussion_r172752961 --- Diff: lib/cpp/src/thrift/server/TNonblockingServer.cpp --- @@ -472,6 +472,18 @@ void TNonblockingServer::TConnection::workSocket() { } // size known; now get the rest of the frame transition(); + +// If the socket has more data than the frame header, continue to work on it. This is not strictly necessary for +// regular sockets, because if there is more data, libevent will fire the event handler registered for read +// readiness, which will in turn call workSocket(). However, some socket types (such as TSSLSocket) may have the +// data sitting in their internal buffers and from libevent's perspective, there is no further data available. In +// that case, not having this workSocket() call here would result in a hang as we will never get to work the socket, +// despite having more data. +if (tSocket_->hasPendingDataToRead()) --- End diff -- It should be an ``if``. At this point, we know the frame size. The next step is to read the frame data itself. And for that, there is already logic to work the socket until all the data is read. For SSL specifically, there is no concern that the SSL buffers can keep pending data forever, because when the entire data frame arrives, there won't be any incomplete SSL bytes. In any case, the code should work generically for all socket types. ---
[GitHub] thrift issue #1497: Do not call workSocket() in TNonblockigServer without en...
Github user bgedik commented on the issue: https://github.com/apache/thrift/pull/1497 @jeking3 The ``client_pool_tests`` failed in 2 of the bots within Travis-CI. I don't think it is related to my change. Are these known to fail every now and then? ``` thrift.transport.base.TTransportException@src/thrift/transport/socket.d(255): Failed to connect to 127.0.0.1:9090. ??:? void thrift.transport.socket.TSocket.open() [0x29e331ca] ??:? int thrift.codegen.client_pool.TClientPool!(client_pool_test.ExTestService).TClientPool.executeOnPool!(int).executeOnPool(scope int delegate(thrift.codegen.client.TClientBase!(client_pool_test.ExTestService).TClientBase)) [0x29e204ca] ??:? int thrift.codegen.client_pool.TClientPool!(client_pool_test.ExTestService).TClientPool.getPort() [0x29e08916] ??:? void client_pool_test.syncClientPoolTest(const(ushort)[], client_pool_test.ExTestHandler[]) [0x29df090a] ??:? _Dmain [0x29df077b] thrift.base.TCompoundOperationException@src/thrift/transport/socket.d(256): All addresses tried failed (127.0.0.1:9090: "Unable to connect socket: Connection refused", 127.0.0.1:9090: "Unable to connect socket: Connection refused", 127.0.0.1:9090: "Unable to connect socket: Connection refused"). FAIL: client_pool_test ``` ---
[GitHub] thrift issue #1476: Remove premature call to workSocket() in TNonblockingSer...
Github user bgedik commented on the issue: https://github.com/apache/thrift/pull/1476 @jeking3 Here: https://github.com/apache/thrift/pull/1497 ---
[GitHub] thrift pull request #1497: Remove workSocket call that is too early
GitHub user bgedik opened a pull request: https://github.com/apache/thrift/pull/1497 Remove workSocket call that is too early Make the work socket conditional Revert back the changes Only call workSocket() when there is pending data to read Minor update to a comment Only call workSocket() when there is pending data to read Only call workSocket() when there is pending data to read Fix the CMake build Fix the CMake build Fix the CMake build Use the correct type for ioctlsocket call Make non-blocking peek optional Revert "Make non-blocking peek optional" This reverts commit 501c4401ddb93cf9bf451dd289d4c3768c0a7b15. Review fixes Review fixes Review fixes You can merge this pull request into a Git repository by running: $ git pull https://github.com/bgedik/thrift bugfix/non-blocking-server-calls-workSocket-too-early Alternatively you can review and apply these changes as the patch at: https://github.com/apache/thrift/pull/1497.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1497 commit 6665029561d40d5b89b7ee9cb2487f6c905b904a Author: Bugra Gedik Date: 2018-01-21T17:43:49Z remove workSocket call that is too early Make the work socket conditional Revert back the changes Only call workSocket() when there is pending data to read Minor update to a comment Only call workSocket() when there is pending data to read Only call workSocket() when there is pending data to read Fix the CMake build Fix the CMake build Fix the CMake build Use the correct type for ioctlsocket call Make non-blocking peek optional Revert "Make non-blocking peek optional" This reverts commit 501c4401ddb93cf9bf451dd289d4c3768c0a7b15. Review fixes Review fixes Review fixes ---
[GitHub] thrift pull request #1476: Remove premature call to workSocket() in TNonbloc...
GitHub user bgedik reopened a pull request: https://github.com/apache/thrift/pull/1476 Remove premature call to workSocket() in TNonblockingServer You can merge this pull request into a Git repository by running: $ git pull https://github.com/bgedik/thrift bugfix/non-blocking-server-calls-workSocket-too-early Alternatively you can review and apply these changes as the patch at: https://github.com/apache/thrift/pull/1476.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1476 commit f3c682e9c41d67d8cd1ab2d75f6259b6e0af40da Author: Bugra Gedik Date: 2018-01-21T17:43:49Z remove workSocket call that is too early commit e115634a2a9844d08a6d017859e3d838414d2194 Author: Bugra Gedik Date: 2018-01-22T00:35:31Z Make the work socket conditional commit 8e5ba48c9d62ee4ae79aac5f2cc4f98d2f535866 Author: Bugra Gedik Date: 2018-01-22T01:03:04Z Revert back the changes commit 017d242479c985a9cb476bcd2da22abaa62ed052 Author: Bugra Gedik Date: 2018-01-22T18:52:33Z Only call workSocket() when there is pending data to read commit c929047d4a43c3969cc40b204e3717877667edb2 Author: Bugra Gedik Date: 2018-01-22T19:29:12Z Minor update to a comment commit fb72b94b513f665e49488d8aaee7eda3f4b57a94 Author: Bugra Gedik Date: 2018-01-22T20:51:55Z Only call workSocket() when there is pending data to read commit 4777cab48641d23bd2246248a07a7a4249a3709f Author: Bugra Gedik Date: 2018-01-22T21:43:04Z Only call workSocket() when there is pending data to read commit f69e91be02275c8a06f489030fcda24449329ce2 Author: Bugra Gedik Date: 2018-01-22T23:44:10Z Fix the CMake build commit 84051190db3234468ffed11a2a61ab8ba6d7e1fd Author: Bugra Gedik Date: 2018-01-22T23:49:27Z Fix the CMake build commit 7b60db3a5535195ec6096fea8ead757266d716b5 Author: Bugra Gedik Date: 2018-01-22T23:58:34Z Fix the CMake build commit 081096091cc828dd7dbcf0962cdaa6c791317e0c Author: Bugra Gedik Date: 2018-01-23T00:14:09Z Use the correct type for ioctlsocket call commit 501c4401ddb93cf9bf451dd289d4c3768c0a7b15 Author: Bugra Gedik Date: 2018-01-24T19:19:57Z Make non-blocking peek optional commit a9dc3f118acb820189696bd13735444ad2d129a3 Author: Bugra Gedik Date: 2018-01-24T20:46:40Z Revert "Make non-blocking peek optional" This reverts commit 501c4401ddb93cf9bf451dd289d4c3768c0a7b15. commit c8d422a633a6b848982c48cb769285906f2ea027 Author: Bugra Gedik Date: 2018-01-24T21:12:35Z Review fixes commit d64a8e0d4709ece4086f054fae7c1c98e3a0ca77 Author: Bugra Gedik Date: 2018-01-24T21:15:14Z Review fixes commit 83d3c13f09238831e2d06e8b67a018acbcc656c4 Author: Bugra Gedik Date: 2018-01-24T21:16:28Z Review fixes ---
[GitHub] thrift pull request #1476: Remove premature call to workSocket() in TNonbloc...
Github user bgedik closed the pull request at: https://github.com/apache/thrift/pull/1476 ---
[GitHub] thrift pull request #1476: Remove premature call to workSocket() in TNonbloc...
Github user bgedik closed the pull request at: https://github.com/apache/thrift/pull/1476 ---
[GitHub] thrift issue #1476: Remove premature call to workSocket() in TNonblockingSer...
Github user bgedik commented on the issue: https://github.com/apache/thrift/pull/1476 @jeking3 anything I can do to get this into a mergable state? ---
[GitHub] thrift pull request #1476: Remove premature call to workSocket() in TNonbloc...
Github user bgedik commented on a diff in the pull request: https://github.com/apache/thrift/pull/1476#discussion_r163681677 --- Diff: lib/cpp/src/thrift/transport/TSSLSocket.h --- @@ -78,6 +78,7 @@ class TSSLSocket : public TSocket { bool peek(); void open(); void close(); + bool hasPendingDataToRead(); --- End diff -- Sam applies to ``read``. This goes back to my original point. TSocket should behave differently for blocking vs non-blocking socket. Look at how ``TSocket`` and ``TSSLSocket`` handle ``read``s on a non-blocking socket. They throw exceptions. Luckily for me, when not using SSL, ``TNonBlockingServer`` does not ever call ``read`` on a socket that is not ready to consume data. However, that is not the case for ``SSL``. Read calls in ``TNonBlockingServer`` are retried by catching the exception and searching for ``"retry"`` in the exception message. The correct way to fix all this mess is to make ``TSocket`` aware of whether the socket is blocking and non-blocking and behave accordingly. Converting EAGAIN to an exception in the case of non-blocking sockets is a major performance hit. My suggesting is to handle the issue of making ``TSocket`` and ``TSSLSocket`` aware of the blocking/non-blocking nature of the socket in a separate Jira issue. ---
[GitHub] thrift pull request #1476: Remove premature call to workSocket() in TNonbloc...
Github user bgedik commented on a diff in the pull request: https://github.com/apache/thrift/pull/1476#discussion_r163679874 --- Diff: lib/cpp/src/thrift/transport/TSSLSocket.cpp --- @@ -249,6 +249,17 @@ TSSLSocket::~TSSLSocket() { close(); } +bool TSSLSocket::hasPendingDataToRead() { + if (!isOpen()) { +return false; + } + initializeHandshake(); + if (!checkHandshake()) +throw TSSLException("SSL_peek: Handshake is not completed"); + // data may be available in SSL buffers (note: SSL_pending does not have a failure mode) + return TSocket::hasPendingDataToRead() || SSL_pending(ssl_) > 0; --- End diff -- Right. Nice catch. ---
[GitHub] thrift pull request #1476: Remove premature call to workSocket() in TNonbloc...
Github user bgedik commented on a diff in the pull request: https://github.com/apache/thrift/pull/1476#discussion_r163673632 --- Diff: lib/cpp/src/thrift/transport/TSSLSocket.cpp --- @@ -249,6 +249,17 @@ TSSLSocket::~TSSLSocket() { close(); } +bool TSSLSocket::hasPendingDataToRead() { + if (!isOpen()) { +return false; + } + initializeHandshake(); + if (!checkHandshake()) +throw TSSLException("SSL_peek: Handshake is not completed"); --- End diff -- Fixed. ---
[GitHub] thrift issue #1476: Remove premature call to workSocket() in TNonblockingSer...
Github user bgedik commented on the issue: https://github.com/apache/thrift/pull/1476 Ok, reverted back. ---
[GitHub] thrift pull request #1476: Remove premature call to workSocket() in TNonbloc...
Github user bgedik commented on a diff in the pull request: https://github.com/apache/thrift/pull/1476#discussion_r163651048 --- Diff: lib/cpp/src/thrift/transport/TSSLSocket.h --- @@ -78,6 +78,7 @@ class TSSLSocket : public TSocket { bool peek(); void open(); void close(); + bool hasPendingDataToRead(); --- End diff -- I implemented option #2. In terms of your matrix choice suggestion: How can we make that happen? Is that something I can help with? I am bothered by the TNonblockingSeverTest not catching the problem reported in this bug. The moment we switched to 0.11, it started throwing unexpected exceptions for us, when using Java client against C++ server. This should have been caught by the existing tests, but it did not happen. I would like to help with this. So if you have a suggestion for me, please let me know. ---
[GitHub] thrift pull request #1476: Remove premature call to workSocket() in TNonbloc...
Github user bgedik commented on a diff in the pull request: https://github.com/apache/thrift/pull/1476#discussion_r163640014 --- Diff: lib/cpp/src/thrift/transport/TSSLSocket.h --- @@ -78,6 +78,7 @@ class TSSLSocket : public TSocket { bool peek(); void open(); void close(); + bool hasPendingDataToRead(); --- End diff -- @jeking3 You said it would be preferable to have one way to ask, "is there any data I can read" that does not block. I completely agree. But the existing ``peek`` method is not designed to be non-blocking. So I have a few choices here: * Update the doxygen comment of ``peek`` to reflect what it does and keep ``hasPendingDataToRead`` * Add an optional argument to peek that says ``nonBlocking=false`` and if it is provided as ``true``, do what ``hasPendingDataToRead`` used to do and remove ``hasPendingDataToRead``. * Change ``peek`` to be always non-blocking. How about #2 above? I think #3 is not easy, as it will require code changes in other places that I am not comfortable with. ---
[GitHub] thrift pull request #1476: Remove premature call to workSocket() in TNonbloc...
Github user bgedik commented on a diff in the pull request: https://github.com/apache/thrift/pull/1476#discussion_r163420333 --- Diff: lib/cpp/src/thrift/transport/TSSLSocket.h --- @@ -78,6 +78,7 @@ class TSSLSocket : public TSocket { bool peek(); void open(); void close(); + bool hasPendingDataToRead(); --- End diff -- Peek is designed to be blocking for blocking sockets, non-blocking for non-blocking sockets. This one is non-blocking for both. I can replace the existing peek() with this implementation if you prefer that. ---
[GitHub] thrift issue #1476: Remove premature call to workSocket() in TNonblockingSer...
Github user bgedik commented on the issue: https://github.com/apache/thrift/pull/1476 @jeking3 This is ready for review now. ---
[GitHub] thrift issue #1476: Remove premature call to workSocket() in TNonblockingSer...
Github user bgedik commented on the issue: https://github.com/apache/thrift/pull/1476 @jeking3 My latest fix worked fine, but failed to compile on windows. I'll rework to something that works for all platforms. ---
[GitHub] thrift issue #1476: Remove premature call to workSocket() in TNonblockingSer...
Github user bgedik commented on the issue: https://github.com/apache/thrift/pull/1476 The following tests FAILED: 16 - concurrency_test (Timeout) 20 - TNonblockingSSLServerTest (Timeout) I think the failure in {{TNonblockingSSLServerTest}} could be related. concurrency_test passes locally for me, so I am guessing that there may be a different problem with it. ---
[GitHub] thrift pull request #1476: remove workSocket call that is too early
GitHub user bgedik opened a pull request: https://github.com/apache/thrift/pull/1476 remove workSocket call that is too early You can merge this pull request into a Git repository by running: $ git pull https://github.com/bgedik/thrift bugfix/non-blocking-server-calls-workSocket-too-early Alternatively you can review and apply these changes as the patch at: https://github.com/apache/thrift/pull/1476.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1476 commit f3c682e9c41d67d8cd1ab2d75f6259b6e0af40da Author: Bugra Gedik Date: 2018-01-21T17:43:49Z remove workSocket call that is too early ---
[GitHub] thrift pull request #1196: THRIFT-3891 TNonblockingServer configured with mo...
Github user bgedik commented on a diff in the pull request: https://github.com/apache/thrift/pull/1196#discussion_r102025490 --- Diff: lib/cpp/src/thrift/server/TNonblockingServer.cpp --- @@ -1566,24 +1562,26 @@ void TNonblockingIOThread::setCurrentThreadHighPriority(bool value) { } void TNonblockingIOThread::run() { - if (eventBase_ == NULL) + if (eventBase_ == NULL) { registerEvents(); - - GlobalOutput.printf("TNonblockingServer: IO thread #%d entering loop...", number_); - + } if (useHighPriority_) { setCurrentThreadHighPriority(true); } - // Run libevent engine, never returns, invokes calls to eventHandler - event_base_loop(eventBase_, 0); + if (eventBase_ != NULL) + { +GlobalOutput.printf("TNonblockingServer: IO thread #%d entering loop...", number_); +// Run libevent engine, never returns, invokes calls to eventHandler +event_base_loop(eventBase_, 0); - if (useHighPriority_) { -setCurrentThreadHighPriority(false); - } +if (useHighPriority_) { + setCurrentThreadHighPriority(false); +} - // cleans up our registered events - cleanupEvents(); +// cleans up our registered events +cleanupEvents(); + } GlobalOutput.printf("TNonblockingServer: IO thread #%d run() done!", number_); --- End diff -- Thanks, but for a message like {{"TNonblockingServer: IO thread #0 run() done!"}} is there any kind of metadata that tells me if this is a info message or a warning/error message. It seems to be being output without any information attached to it. That makes it hard to filter the messages that we consider unnecessary. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] thrift pull request #1196: THRIFT-3891 TNonblockingServer configured with mo...
Github user bgedik commented on a diff in the pull request: https://github.com/apache/thrift/pull/1196#discussion_r102023002 --- Diff: lib/cpp/src/thrift/server/TNonblockingServer.cpp --- @@ -1566,24 +1562,26 @@ void TNonblockingIOThread::setCurrentThreadHighPriority(bool value) { } void TNonblockingIOThread::run() { - if (eventBase_ == NULL) + if (eventBase_ == NULL) { registerEvents(); - - GlobalOutput.printf("TNonblockingServer: IO thread #%d entering loop...", number_); - + } if (useHighPriority_) { setCurrentThreadHighPriority(true); } - // Run libevent engine, never returns, invokes calls to eventHandler - event_base_loop(eventBase_, 0); + if (eventBase_ != NULL) + { +GlobalOutput.printf("TNonblockingServer: IO thread #%d entering loop...", number_); +// Run libevent engine, never returns, invokes calls to eventHandler +event_base_loop(eventBase_, 0); - if (useHighPriority_) { -setCurrentThreadHighPriority(false); - } +if (useHighPriority_) { + setCurrentThreadHighPriority(false); +} - // cleans up our registered events - cleanupEvents(); +// cleans up our registered events +cleanupEvents(); + } GlobalOutput.printf("TNonblockingServer: IO thread #%d run() done!", number_); --- End diff -- Is there any way to control the outputting of these messages? When we run with TNonblockingServer, we get these output. I couldn't find a way to mask them. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] thrift pull request #1080: THRIFT-3891 TNonblockingServer configured with mo...
Github user bgedik commented on a diff in the pull request: https://github.com/apache/thrift/pull/1080#discussion_r97798270 --- Diff: lib/cpp/src/thrift/server/TNonblockingServer.cpp --- @@ -1524,7 +1532,7 @@ void TNonblockingIOThread::breakLoop(bool error) { } // sets a flag so that the loop exits on the next event - event_base_loopbreak(eventBase_); + event_base_loopexit(eventBase_, 0); --- End diff -- I created the JIRA entry, with a patch that works for us (in production and testing). I know it is not the ideal fix, but I was hoping that someone who is more versed in the ``TNonblockingServer`` implementation can work it. I don't have the time to dig into the design of ``TNonblockingServer``. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] thrift pull request #1080: THRIFT-3891 TNonblockingServer configured with mo...
Github user bgedik commented on a diff in the pull request: https://github.com/apache/thrift/pull/1080#discussion_r97712737 --- Diff: lib/cpp/src/thrift/server/TNonblockingServer.cpp --- @@ -1524,7 +1532,7 @@ void TNonblockingIOThread::breakLoop(bool error) { } // sets a flag so that the loop exits on the next event --- End diff -- I guess the comment is stale now. The loop now exits once the current iteration is complete. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] thrift pull request #1080: THRIFT-3891 TNonblockingServer configured with mo...
Github user bgedik commented on a diff in the pull request: https://github.com/apache/thrift/pull/1080#discussion_r97712383 --- Diff: lib/cpp/src/thrift/server/TNonblockingServer.cpp --- @@ -1524,7 +1532,7 @@ void TNonblockingIOThread::breakLoop(bool error) { } // sets a flag so that the loop exits on the next event - event_base_loopbreak(eventBase_); + event_base_loopexit(eventBase_, 0); --- End diff -- There are some subtle differences between the two. The first difference is what you already mentioned. The second difference is that, event_base_loopbreak will have no effect when there is no loop running: http://www.wangafu.net/~nickm/libevent-book/Ref3_eventloop.html. In contrast, event_base_loopexit is not `lost` if it is called without having a loop running. In the TNonblockingServer code, event_base_loopbreak is called without checking if the loop started running. An alternative fix would be to keep the event_base_loopbreak, but somehow introduce additional synchronization to make sure it is not lost. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---