[jira] [Work logged] (TS-4796) ATS not closing origin connections on first RST from client

ASF GitHub Bot (JIRA) Thu, 01 Sep 2016 19:10:38 -0700

     [ 
https://issues.apache.org/jira/browse/TS-4796?focusedWorklogId=27886&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-27886
 ]


ASF GitHub Bot logged work on TS-4796:
--------------------------------------

                Author: ASF GitHub Bot
            Created on: 02/Sep/16 02:09
            Start Date: 02/Sep/16 02:09
    Worklog Time Spent: 10m 
      Work Description: Github user oknet commented on the issue:

    https://github.com/apache/trafficserver/pull/947
  
    EVENTIO_ERROR  means EPOLLHUP | EPOLLERR | EPOLLPRI.
    
    EPOLLPRI means OOB data is pending or TCP URG is set. You will always receive 
EPOLLPRI together with EPOLLIN.
    For EPOLLPRI, we need to handle the READ first and ignore EPOLLPRI itself; this 
is what NetHandler does.
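
A minimal sketch of the mask handling described above, not the NetHandler code itself. The bit values are the Linux `<sys/epoll.h>` constants written out by hand, and `plan_actions` is a hypothetical helper. Attempting both read and write on EPOLLHUP or EPOLLERR is an assumption drawn from the rest of this comment.

```python
# Linux <sys/epoll.h> bit values, written out so the sketch runs anywhere.
EPOLLIN = 0x001
EPOLLPRI = 0x002
EPOLLOUT = 0x004
EPOLLERR = 0x008
EPOLLHUP = 0x010

# The mask as described in the comment above.
EVENTIO_ERROR = EPOLLHUP | EPOLLERR | EPOLLPRI


def plan_actions(events):
    """Return the I/O calls to attempt for one event mask (hypothetical helper)."""
    actions = []
    # EPOLLPRI is deliberately ignored: it is expected to arrive with EPOLLIN,
    # so the ordinary read path covers it.
    if events & (EPOLLIN | EPOLLHUP | EPOLLERR):
        actions.append("read")
    # Assumption: on HUP/ERR, write is attempted too, so that the return code
    # of the call reveals the actual error.
    if events & (EPOLLOUT | EPOLLHUP | EPOLLERR):
        actions.append("write")
    return actions
```

With this sketch, `plan_actions(EPOLLIN | EPOLLPRI)` yields only a read, while `plan_actions(EPOLLERR)` yields a read followed by a write.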
    
    EPOLLHUP & EPOLLRDHUP
    reference : 
http://stackoverflow.com/questions/8707458/epoll-and-remote-1-way-shutdown
    ```
    A socket listening for epoll events will typically receive an EPOLLRDHUP 
(in addition to EPOLLIN) event flag upon the remote peer calling close or 
shutdown(SHUT_WR). This does not necessarily mean the socket is dead. 
Subsequent calls to recv() will return any unread data on the socket and 
eventually "0" will be returned to indicate EOF. It may even be possible to 
send data back if the remote peer only did a half-close of its socket.
    
    The one notable exception is if the remote peer is using the SO_LINGER 
option enabled on its socket with a linger value of "0". The result of closing 
such a socket may result in a TCP RST getting sent instead of a FIN. From what 
I've read, a connection reset event will generate either a EPOLLHUP or 
EPOLLERR. (I haven't had time to confirm, but it makes sense).
    
    There is some documentation to suggest there are older Linux 
implementations that don't support EPOLLRDHUP, as such EPOLLHUP gets generated 
instead.
    
    And for what it is worth, in my particular case, I found that it is not too 
interesting to have code that special cases EPOLLHUP or EPOLLRDHUP events. 
Instead, just treat these events the same as EPOLLIN/EPOLLOUT and call recv() 
(or send() as appropriate). But pay close attention to return codes returned 
back from recv() and send().
    ```
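
The half-close behaviour the quoted answer describes can be reproduced without epoll. The sketch below uses a local socket pair as a stand-in for the client and server; `half_close_demo` is a hypothetical name. It illustrates the shutdown(SHUT_WR) case only, not the SO_LINGER/RST case.

```python
import socket


def half_close_demo():
    """Show that a peer's shutdown(SHUT_WR) yields buffered data, then EOF."""
    client, server = socket.socketpair()
    try:
        client.sendall(b"last bytes")
        client.shutdown(socket.SHUT_WR)  # half-close: client will send no more

        # Drain the server side: unread data first, then b"" to signal EOF.
        received = b""
        while True:
            chunk = server.recv(1024)
            if chunk == b"":
                break
            received += chunk

        # The other direction is still open, so the server can reply.
        server.sendall(b"reply")
        reply = client.recv(1024)
        return received, reply
    finally:
        client.close()
        server.close()
```

The function returns the bytes the server read before EOF and the reply the client still received after its own half-close.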
    
    EPOLLERR means there may be an error on the socket fd: either a non-fatal one 
such as EAGAIN, EINTR, or EWOULDBLOCK, or a fatal one.
    
    When you receive EPOLLERR, it means there is an error on the socket fd, and 
there may also be data queued before this error. Therefore we should call read() 
and write() to figure out the actual meaning of this error.
    
    Currently, NetHandler tries to perform read() and write() on the socket fd 
first.
    We will get non-fatal errors or fatal errors from read() or write().
    If it is a non-fatal error, just put the socket fd into the wait list.
    If it is a fatal error, signal the SM to close the socket fd.
    e.g.
    It is a fatal error if EPIPE is returned from write().
    If "0" is returned from read(), that is EOF.
    
    So the current implementation of NetHandler is enough to handle all of this 
and doesn't need to change.


Issue Time Tracking
-------------------

    Worklog Id:     (was: 27886)
    Time Spent: 2h 50m  (was: 2h 40m)

> ATS not closing origin connections on first RST from client
> -----------------------------------------------------------
>
>                 Key: TS-4796
>                 URL: https://issues.apache.org/jira/browse/TS-4796
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: HTTP
>            Reporter: Thomas Jackson
>            Assignee: Thomas Jackson
>          Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> *TLDR; similar to TS-4720 -- slower to close than it should, instead of never 
> closing*
> As a continuation of TS-4720, while testing that the session is closed when 
> we expect-- I found that it isn't.
> Although we are now closing the sessions, we aren't doing it as quickly as we 
> should. In this client abort case we expect the client to abort, and ATS 
> should initially continue to send bytes to the client-- as we are in the 
> half-open state. After the first set of bytes are sent to the client-- the 
> client will send an RST-- which should signal ATS to stop sending the request 
> (and tear down the origin connection etc.).
> I'm able to reproduce this locally, and the debug output (with some 
> additional comments) looks like below:
> {code}
> < FIN FROM CLIENT >
> [Aug 29 18:25:07.491] Server {0x7effa538a800} DEBUG: <HttpSM.cc:2649 
> (main_handler)> (http) [0] [HttpSM::main_handler, VC_EVENT_EOS]
> [Aug 29 18:25:07.491] Server {0x7effa538a800} DEBUG: <HttpSM.cc:892 
> (state_watch_for_client_abort)> (http) [0] 
> [&HttpSM::state_watch_for_client_abort, VC_EVENT_EOS]
> < RST FROM CLIENT >
> Got an HttpTunnel event 100 
> [Aug 29 18:25:13.062] Server {0x7effa538a800} DEBUG: <HttpTunnel.cc:1173 
> (producer_handler)> (http_tunnel) [0] producer_handler [http server 
> VC_EVENT_READ_READY]
> [Aug 29 18:25:13.062] Server {0x7effa538a800} DEBUG: <HttpTunnel.cc:1108 
> (producer_handler_chunked)> (http_tunnel) [0] producer_handler_chunked [http 
> server VC_EVENT_READ_READY]
> [Aug 29 18:25:13.062] Server {0x7effa538a800} DEBUG: <HttpTunnel.cc:203 
> (read_size)> (http_chunk) read chunk size of 15 bytes
> [Aug 29 18:25:13.062] Server {0x7effa538a800} DEBUG: <HttpTunnel.cc:279 
> (read_chunk)> (http_chunk) completed read of chunk of 15 bytes
> [Aug 29 18:25:13.062] Server {0x7effa538a800} DEBUG: <HttpTunnel.cc:1213 
> (producer_handler)> (http_redirect) [HttpTunnel::producer_handler] 
> enable_redirection: [1 0 0] event: 100
> Got an HttpTunnel event 101 
> [Aug 29 18:25:13.062] Server {0x7effa538a800} DEBUG: <HttpTunnel.cc:1373 
> (consumer_handler)> (http_tunnel) [0] consumer_handler [user agent 
> VC_EVENT_WRITE_READY]
> write ready consumer_handler
> {code}
> In this situation the connection doesn't close here at the RST-- but rather 
> on the next set of bytes from the origin to send-- which end up tripping a 
> VC_EVENT_ERROR-- and tearing down the connection.
> When the client sends the first RST epoll returns a WRITE_READY event -- 
> which the HTTPTunnel consumer ignores completely. It seems then that when we 
> receive the WRITE_READY event we need to determine if we are already in the 
> writing state-- and if so, then we should stop the transaction (since we are 
> already edge-triggered).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
