[ https://issues.apache.org/jira/browse/TS-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14615122#comment-14615122 ]
Sudheer Vinukonda commented on TS-3714:
---------------------------------------

AFAICT, the commit is on master and it's on neither 5.3.x nor 6.0.0. I will work on repurposing the handshakeReader when I get time. In the meantime, please feel free to make the changes if you have time. For SPDY/H2, I agree it might be a problem for the very first request (there's a TLS handshake only for the very first request). I haven't seen the issue specifically, though, when I tested with SPDY in production. In any case, I'll work on fixing that part under a new jira.

> TS seems to stall between ua_begin and ua_first_read on some transactions
> resulting in high response times.
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: TS-3714
>                 URL: https://issues.apache.org/jira/browse/TS-3714
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 5.3.0
>            Reporter: Sudheer Vinukonda
>            Assignee: Sudheer Vinukonda
>             Fix For: 6.0.0
>
>
> An example slow log showing very high *ua_first_read*:
> {code}
> ERROR: [8624075] Slow Request: client_ip: xx.xx.xx.xxx url: http://xxxxxxxxxxxxxxxxxx.... status: 200 unique id: bytes: 86 fd: 0
> client state: 0 server state: 9
> ua_begin: 0.000 ua_first_read: 42.224 ua_read_header_done: 42.224
> cache_open_read_begin: 42.224 cache_open_read_end: 42.224
> dns_lookup_begin: 42.224 dns_lookup_end: 42.224
> server_connect: 42.224 server_first_read: 42.229 server_read_header_done: 42.229 server_close: 42.229
> ua_begin_write: 42.229 ua_close: 42.229 sm_finish: 42.229
> {code}
> Initially, I suspected that it might be caused by browsers connecting early, before sending any bytes to TS. However, this seems to be happening too frequently, and with an unrealistically high delay between *ua_begin* and *ua_first_read*.
> I suspect it's caused by the code that disables the read temporarily before calling the *TXN_START* hook and re-enables it after the API call-out.
> The read disable is done via a 0-byte *do_io_read* on the client vc, but the problem is that a valid *mbuf* is still passed. Based on what I am seeing, it's possible this results in actually enabling the *read_vio* all the way up to *ssl_read_from_net*, for instance (if there's a race condition and there were already bytes from the client, resulting in an epoll read event), which would finally disable the read, since *ntodo* (nbytes) is 0. However, this may result in the epoll event being lost until new i/o arrives from the client. I'm running further experiments to confirm the theory. In most cases, the read buffer already has some bytes by the time the http session and http sm are created, which makes it just work. But if there's a slight delay in receiving bytes after the connection is made, the epoll mechanism should pick them up; yet, due to the way the temporary read disable is done, the event may be lost (this is because ATS uses epoll's edge-triggered mode).
> Some history on this issue -
> This issue has been a problem for a long time and affects both http and https requests. When this issue was first reported, our router operations team eventually closed it, indicating that disabling *accept* threads resolved it ([~yzlai] also reported similar observations and conclusions). It's possible that the race condition window is slightly reduced by disabling accept threads, but accept threads are very important to overall performance and throughput. I now notice that the issue still exists (regardless of whether accept threads are enabled or disabled) and am testing further to confirm the issue.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)