[ https://issues.apache.org/jira/browse/TS-3085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14139222#comment-14139222 ]

Sudheer Vinukonda commented on TS-3085:
---------------------------------------

Per Leif's suggestion on a different Jira issue, I've marked this for 5.2 and added a backport to 5.1.1. However, this is a blocker for our ats5 rollout, and anyone with use cases involving large POSTs may need to cherry-pick the fix.

> Large POSTs over (relatively) slower connections failing in ats5
> ----------------------------------------------------------------
>
>                 Key: TS-3085
>                 URL: https://issues.apache.org/jira/browse/TS-3085
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: SSL
>    Affects Versions: 5.0.1
>            Reporter: Sudheer Vinukonda
>            Assignee: Sudheer Vinukonda
>              Labels: yahoo
>             Fix For: 5.2.0
>
>
> We ran into a production issue where large POSTs (30MB or larger) fail
> over slower connections after the ats5 rollout (the problem can be
> easily reproduced using a Charles proxy with throttling enabled).
> Further debugging isolated the issue to uploads over SSL connections, and
> after a lot of debugging the cause appears to be the following:
> ATS calls SSL_read() followed by SSL_get_error() to check whether there was
> any error in the read. This is repeated until either the complete data is
> read or an error occurs. However, the OpenSSL documentation recommends
> calling ERR_clear_error() prior to calling SSL_read() + SSL_get_error() to
> ensure the error queue is clean of any leftover/garbage errors. It's not
> clear what might be corrupting the error queue of the SSL context in a tight
> loop - possibly some new feature in ats5. In any case, calling
> ERR_clear_error() is a good idea, and adding it seems to resolve the POST
> failures.
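
To make the failure mode above concrete, here is a minimal toy model in plain C. It is only a sketch and does not use OpenSSL or the ATS code: every name in it (toy_read, toy_get_error, toy_push_error, toy_clear_errors, read_step_unsafe, read_step_safe) is hypothetical, and the "queue" is reduced to a counter. The one assumption it encodes is the behaviour described in the issue: the error check looks at the thread's error queue as well as the return value, so a leftover entry can make a harmless "no data yet" result look fatal.

```c
#include <assert.h>

/* Toy stand-ins for the result codes; names and values are hypothetical. */
enum toy_result { TOY_OK, TOY_WANT_READ, TOY_FATAL };

/* Toy stand-in for the per-thread error queue: a count of pending entries. */
static int toy_queue_len = 0;

/* Some earlier, unrelated call failed and left an entry behind. */
void toy_push_error(void) { toy_queue_len++; }

/* Plays the role of ERR_clear_error(): empties the queue. */
void toy_clear_errors(void) { toy_queue_len = 0; }

/* Toy read: returns the number of bytes read, or -1 when no data is
 * available yet (would block). It never touches the error queue. */
int toy_read(int bytes_available) {
    return bytes_available > 0 ? bytes_available : -1;
}

/* Plays the role of SSL_get_error(): inspects the return value AND the
 * error queue. A leftover entry takes priority over "would block". */
enum toy_result toy_get_error(int ret) {
    if (ret > 0)
        return TOY_OK;
    if (toy_queue_len > 0)
        return TOY_FATAL;
    return TOY_WANT_READ;
}

/* Read step without clearing: a stale entry can be misreported as fatal. */
enum toy_result read_step_unsafe(int bytes_available) {
    int ret = toy_read(bytes_available);
    return toy_get_error(ret);
}

/* Read step with the fix: empty the queue first, then read, then check. */
enum toy_result read_step_safe(int bytes_available) {
    toy_clear_errors();
    assert(toy_queue_len == 0); /* queue must be empty before the I/O call */
    int ret = toy_read(bytes_available);
    return toy_get_error(ret);
}
```

In this model, a slow upload corresponds to many read steps with no data available. With a stale entry in the queue, read_step_unsafe(0) reports TOY_FATAL and the transfer would be aborted, while read_step_safe(0) reports TOY_WANT_READ and the caller can retry. In real code the equivalent change would be a call to ERR_clear_error() immediately before SSL_read().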
> Documentation from OpenSSL and some related notes on Stack Overflow:
> https://www.openssl.org/docs/ssl/SSL_get_error.html
> http://stackoverflow.com/questions/18179128/how-to-manage-the-error-queue-in-openssl-ssl-get-error-and-err-get-error
> {code}
> "SSL_get_error() returns a result code (suitable for the C ``switch''
> statement) for a preceding call to SSL_connect(), SSL_accept(),
> SSL_do_handshake(), SSL_read(), SSL_peek(), or SSL_write() on ssl. The value
> returned by that TLS/SSL I/O function must be passed to SSL_get_error() in
> parameter ret.
> In addition to ssl and ret, SSL_get_error() inspects the current thread's
> OpenSSL error queue. Thus, SSL_get_error() must be used in the same thread
> that performed the TLS/SSL I/O operation, and no other OpenSSL function
> calls should appear in between. The current thread's error queue must be
> empty before the TLS/SSL I/O operation is attempted, or SSL_get_error() will
> not work reliably."
> "SSL_get_error does not call ERR_get_error. So if you just call
> SSL_get_error, the error stays in the queue.
> You should be calling ERR_clear_error prior to ANY SSL-call(SSL_read,
> SSL_write etc) that is followed by SSL_get_error, otherwise you may be
> reading an old error that occurred previously in the current thread."
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)