[ 
https://issues.apache.org/jira/browse/TS-3085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sudheer Vinukonda updated TS-3085:
----------------------------------
    Attachment: TS-3085.diff

> Large POSTs over (relatively) slower connections failing in ats5
> ----------------------------------------------------------------
>
>                 Key: TS-3085
>                 URL: https://issues.apache.org/jira/browse/TS-3085
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: SSL
>    Affects Versions: 5.0.1
>            Reporter: Sudheer Vinukonda
>            Assignee: Sudheer Vinukonda
>              Labels: yahoo
>             Fix For: 5.2.0
>
>         Attachments: TS-3085.diff
>
>
> We ran into a production issue where large POSTs (30MB or high) are failing 
> over slower connection speeds after ats5 roll out (the problem could be 
> easily reproduced using a charles proxy with throttling enabled). 
> Further debugging isolated the issue to uploads over SSL connections and 
> after a lot of debugging the issue appears to be the below:
> ATS calls SSL_read() followed by SSL_get_error() to check if there was any 
> error in the read. This is repeated until either the complete data is read or 
> an error occurs. However, from the openssl documentation, it is recommended 
> to call ERR_clear_error() prior to calling SSL_read() + SSL_get_error() to 
> ensure the error queue is clean of any leftover/garbage errors.  It's not 
> clear what might be corrupting the error queue of the SSL context in a tight 
> loop - possibly, some new feature in ats5. In any case, calling 
> ERR_clear_error() is a good idea and adding this seems to resolve the post 
> failures.
> Documentation from openSSL and some related notes on stackoverflow:
> https://www.openssl.org/docs/ssl/SSL_get_error.html
> http://stackoverflow.com/questions/18179128/how-to-manage-the-error-queue-in-openssl-ssl-get-error-and-err-get-error
> {code}
> "SSL_get_error() returns a result code (suitable for the C ``switch''
> statement) for a preceding call to SSL_connect(), SSL_accept(),
> SSL_do_handshake(), SSL_read(), SSL_peek(), or SSL_write() on ssl. The value
> returned by that TLS/SSL I/O function must be passed to SSL_get_error() in
> parameter ret.
> In addition to ssl and ret, SSL_get_error() inspects the current thread's
> OpenSSL error queue. Thus, SSL_get_error() must be used in the same thread 
> that
> performed the TLS/SSL I/O operation, and no other OpenSSL function calls 
> should
> appear in between. The current thread's error queue must be empty before the
> TLS/SSL I/O operation is attempted, or SSL_get_error() will not work 
> reliably."
> "SSL_get_error does not call ERR_get_error. So if you just call SSL_get_error,
> the error stays in the queue.
> You should be calling ERR_clear_error prior to ANY SSL-call(SSL_read, 
> SSL_write
> etc) that is followed by SSL_get_error, otherwise you may be reading an old
> error that occurred previously in the current thread."
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to