[ 
https://issues.apache.org/jira/browse/TS-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sudheer Vinukonda reassigned TS-3226:
-------------------------------------

    Assignee: Sudheer Vinukonda

> SSL data not read from the socket sometimes causing transactions to timeout
> ---------------------------------------------------------------------------
>
>                 Key: TS-3226
>                 URL: https://issues.apache.org/jira/browse/TS-3226
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: SSL
>    Affects Versions: 5.1.1
>            Reporter: Sudheer Vinukonda
>            Assignee: Sudheer Vinukonda
>             Fix For: 5.2.0
>
>
> We have had a really long-standing problem where some of our origins were
> complaining of receiving POST requests with a non-zero Content-Length header
> but no body (or sometimes a partial body). Because of the way our network is
> set up, the problem was hard to isolate: there are multiple hops along the
> way, and the POST body could be lost anywhere on the path (e.g. client, DNS,
> routers/VIPs, edge, data center). After a lot of debugging, and with the help
> of some custom-built wire traces for SSL, we isolated the problem to the ATS
> hosts running on our edge layer. The wire traces showed that the POST body
> arrives intact but just sits in the socket, never read by the POST UA tunnel
> producer.
> After further investigation, it seems that the producer issues the correct
> do_io_read for the required number of bytes, but there seems to be a bug in
> {{SSLNetVConnection::net_read_io}}, where ntodo is calculated before the
> mutex on the read VIO is acquired.
> https://github.com/apache/trafficserver/blob/master/iocore/net/SSLNetVConnection.cc#L391
> Instrumenting the code with further debug traces showed that, in the failed
> transactions, ntodo is "0" when determined before the mutex is acquired,
> whereas (s->vio.nbytes - s->vio.ndone) is non-zero after it is acquired. I am
> not sure I understand how nbytes on the read VIO can differ before the mutex
> is acquired, but moving the ntodo calculation after the mutex acquisition
> seems to have resolved the problem. Note that this is how it is done in the
> corresponding function {{read_from_net}} in {{UnixNetVConnection}}.
> Talking to [~amc] on IRC, it seems that the mutex is needed because
> {{SSLNetVConnection::net_read_io}} can also be triggered by incoming socket
> data before {{UnixNetVConnection::do_io_read}} triggers it, and that can mess
> up nbytes/ndone in the read VIO.

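To make the ordering issue concrete, here is a minimal C++ sketch. It is a deterministic, single-threaded model, not ATS code: `ReadVIO`, `simulated_do_io_read`, `ntodo_before_lock` and `ntodo_after_lock` are hypothetical names, `std::mutex` stands in for the VIO's mutex, and a callback stands in for a `do_io_read` that completes in the window between sampling ntodo and taking the lock. Under those assumptions, the first ordering returns a stale ntodo of 0 (so nothing would be read from the socket), while the second returns the requested byte count.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <mutex>

// Hypothetical, simplified model of a read VIO (not the real ATS VIO class).
struct ReadVIO {
  std::mutex mutex;    // stands in for the VIO's mutex
  int64_t nbytes = 0;  // bytes requested by the consumer (set by do_io_read)
  int64_t ndone = 0;   // bytes already delivered to the consumer
};

// Hypothetical stand-in for do_io_read: updates the VIO under its lock.
void simulated_do_io_read(ReadVIO &vio, int64_t nbytes) {
  std::lock_guard<std::mutex> lock(vio.mutex);
  vio.nbytes = nbytes;
  vio.ndone = 0;
}

using Interleave = std::function<void(ReadVIO &)>;

// Ordering reported in TS-3226: ntodo is sampled first, the lock taken later.
// `between` models an update that completes inside that window.
int64_t ntodo_before_lock(ReadVIO &vio, const Interleave &between) {
  int64_t ntodo = vio.nbytes - vio.ndone;  // snapshot taken without the lock
  between(vio);                            // VIO changes after the snapshot
  std::lock_guard<std::mutex> lock(vio.mutex);
  return ntodo;                            // stale: misses the update
}

// Fixed ordering: take the lock, then compute ntodo from the VIO state.
// `between` models an update that completed before the lock was taken.
int64_t ntodo_after_lock(ReadVIO &vio, const Interleave &between) {
  between(vio);
  std::lock_guard<std::mutex> lock(vio.mutex);
  return vio.nbytes - vio.ndone;           // reflects the completed update
}
```

In this model, computing ntodo only once the lock is held means any update that finished before the lock was taken is observed, which matches the ordering the report describes for {{read_from_net}} in {{UnixNetVConnection}}.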


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
