[
https://issues.apache.org/jira/browse/TS-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14608274#comment-14608274
]
ASF GitHub Bot commented on TS-3714:
------------------------------------
Github user shinrich commented on the pull request:
https://github.com/apache/trafficserver/pull/237#issuecomment-117186342
The timeout logic looks good. And I agree with your comment to Brian that
it could probably moved into a separate commit. Not a lot of code, but this is
all tricky logic, so separation might be worth it.
The main thing here seems to be the logic to ensure that any data sent
along with the last handshake record is processed immediately. What you have
seems to do the trick.
However, I wonder if we could get by with fewer additional mechanisms
(buffers and bits) by using the trigger bit? When the client session starts
(all them them, Spdy, H2, and Http), we set read.vio.triggered to true. The
first non-zero do_io_read will add the vc to the read_ready_list and set
read_vio.enabled to true. The next event poll will have a timeout of 0 and
return immediately. The new VC will be processed when the read_ready_list is
walked. Since enabled and triggered will be true a READ_READY event will be
sent. At least that is my reading of things.
The downside is that in many cases you would be issuing a useless
READ_READY at the start of each session. Perhaps one could do a pending check
before setting the trigger bit.
Also due to the nesting of SPDY/H2 and Http clients, you might set the
trigger bit unnecessarily on the nested Http clients. Any lingering SSL data
would have been dealt with in the SPDY/H2 client session.
> TS seems to stall between ua_begin and ua_first_read on some transactions
> resulting in high response times.
> -----------------------------------------------------------------------------------------------------------
>
> Key: TS-3714
> URL: https://issues.apache.org/jira/browse/TS-3714
> Project: Traffic Server
> Issue Type: Bug
> Components: Core
> Affects Versions: 5.3.0
> Reporter: Sudheer Vinukonda
> Assignee: Sudheer Vinukonda
>
> An example slow log showing very high *ua_first_read*.
> {code}
> ERROR: [8624075] Slow Request: client_ip: xx.xx.xx.xxx url:
> http://xxxxxxxxxxxxxxxxxx.... status: 200 unique id: bytes: 86 fd: 0 client
> state: 0 serv
> er state: 9 ua_begin: 0.000 ua_first_read: 42.224 ua_read_header_done: 42.224
> cache_open_rea
> d_begin: 42.224 cache_open_read_end: 42.224 dns_lookup_begin: 42.224
> dns_lookup_end: 42.224
> server_connect: 42.224 server_first_read: 42.229 server_read_header_done:
> 42.229 server_clos
> e: 42.229 ua_begin_write: 42.229 ua_close: 42.229 sm_finish: 42.229
> {code}
> Initially, I suspected that it might be caused by browser's connecting early
> before sending any bytes to TS. However, this seems to be happening too
> frequently and with unrealistically high delay between *ua_begin* and
> *ua_first_read*.
> I suspect it's caused due to the code that disables the read temporarily
> before calling *TXN_START* hook and re-enables it after the API call out. The
> read disable is done via a 0-byte *do_io_read* on the client vc, but, the
> problem is that a valid *mbuf* is still passed. Based on what I am seeing,
> it's possible this results in actually enabling the *read_vio* all the way
> uptil *ssl_read_from_net* for instance (if there's a race condition and there
> were bytes already from the client resulting in an epoll read event), which
> would finally disable the read since, the *ntodo* (nbytes) is 0. However,
> this may result in the epoll event being lost until a new i/o happens from
> the client. I'm trying out further experiments to confirm the theory. In most
> cases, the read buffer already has some bytes by the time the http session
> and http sm is created, which makes it just work. But, if there's a slight
> delay in the receiving bytes after making a connection, the epoll mechanism
> should read it, but, due to the way the temporary read disable is being done,
> the event may be lost (this is coz, ATS uses the epoll edge triggered mode).
> Some history on this issue -
> This issue has been a problem for a long time and affects both http and https
> requests. When this issue was first reported, our router operations team
> eventually closed it indicating that disabling *accept* threads resolved it
> ([~yzlai] also reported similar observations and conclusions). It's possible
> that the race condition window may be slightly reduced by disabling accept
> threads, but, to the overall performance and through put, accept threads are
> very important. I now notice that the issue still exists (regardless of
> whether or not accept threads are enabled/disabled) and am testing further to
> confirm the issue.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)