[jira] Commented: (HADOOP-255) Client Calls are not cancelled after a call timeout

Doug Cutting (JIRA) Fri, 26 May 2006 08:56:17 -0700

    [ http://issues.apache.org/jira/browse/HADOOP-255?page=comments#action_12413490 ]


Doug Cutting commented on HADOOP-255:
-------------------------------------

I think this is, in general, something that we won't fix.  It might be possible 
to improve things, but we cannot, without elaborate handshake protocols, 
guarantee that RPC responses are received by clients.  As Owen has indicated, 
we must instead make our applications tolerant of lost responses.

Note that this is generally a problem for HTTP-based services too.  When 
someone hits "stop" in their browser for a slow request, the server generally 
continues to compute the request and only discovers that the connection has 
been closed when it attempts to write the response, if at all.  If the client 
times out after the server has flushed the response, then there's no way for 
the server to know this.

You suggest that we might queue requests on the client so that only a single 
request to a particular server is outstanding at a time.  That would not work 
well for distributed search (Nutch's original IPC application).  In distributed 
search a front end typically has many queries outstanding.  Each query is 
broadcast to a number of back end servers.  Different queries take different 
amounts of time.  We do not want to make fast queries wait for slow queries, as 
that would make all queries slow and increase the burden on the front end 
servers.
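
To make the search case concrete, here is a rough sketch (plain 
java.util.concurrent, not the actual ipc code) of one query broadcast to 
several back ends at once, with a single deadline.  The names BroadcastSearch, 
backend and broadcast are made up for illustration, and the back ends are 
simulated with sleeps.  Note that giving up on a slow back end only cancels 
the local wait; a real remote server would still finish computing, which is 
the limitation discussed above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical front end that broadcasts one query to many back ends.
class BroadcastSearch {

    // Simulated back end (an assumption): answers with its name after a delay.
    static Callable<String> backend(String name, long delayMillis) {
        return () -> {
            Thread.sleep(delayMillis);
            return name;
        };
    }

    // Send the query to every back end at the same time and keep whatever
    // comes back before the deadline. Slow back ends are dropped rather
    // than awaited, so they do not hold up the fast ones.
    static List<String> broadcast(ExecutorService pool,
                                  List<Callable<String>> backends,
                                  long timeoutMillis) throws InterruptedException {
        List<Future<String>> futures =
                pool.invokeAll(backends, timeoutMillis, TimeUnit.MILLISECONDS);
        List<String> results = new ArrayList<>();
        for (Future<String> f : futures) {
            if (f.isCancelled()) {
                continue; // this back end missed the deadline
            }
            try {
                results.add(f.get());
            } catch (ExecutionException e) {
                // A failed back end is skipped; the query still returns.
            }
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newCachedThreadPool();
        try {
            List<String> hits = broadcast(pool,
                    Arrays.asList(backend("fast", 10), backend("slow", 5000)),
                    500);
            System.out.println(hits);
        } finally {
            pool.shutdownNow();
        }
    }
}
```

With one request per server at a time, the second query to the "slow" back 
end would have to sit behind the first; here every call is in flight at once 
and each query pays at most its own timeout.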

Would folks object if I resolve this as a WONTFIX bug?

> Client Calls are not cancelled after a call timeout
> ---------------------------------------------------
>
>          Key: HADOOP-255
>          URL: http://issues.apache.org/jira/browse/HADOOP-255
>      Project: Hadoop
>         Type: Bug

>   Components: ipc
>     Versions: 0.2.1
>  Environment: Tested on Linux 2.6
>     Reporter: Naveen Nalam

>
> In ipc/Client.java, if a call times out, a SocketTimeoutException is thrown 
> but the Call object still exists on the queue.
> What I found was that when transferring very large amounts of data, it's 
> common for queued-up calls to time out. Yet even though the caller is no 
> longer waiting, the request is still serviced on the server and the data is 
> sent to the client. After receiving the full response, the client calls 
> callComplete(), which is a no-op since nobody is waiting.
> The problem is that the calls that time out will retry, and the system gets 
> into a situation where data is being transferred around, but it's all data 
> for timed-out requests and no progress is ever made.
> My quick solution to this was to add a "boolean timedout" to the Call object, 
> which I set to true whenever the queued caller times out. Then, when the 
> client starts to pull over the response data (in Connection::run), it first 
> checks whether the Call is timed out and, if so, immediately closes the 
> connection.
> I think a good fix for this is to queue requests on the client, and do a 
> single sendParam only when there is no outstanding request. This will allow 
> closing the connection when receiving a response for a request we no longer 
> have pending, reopening the connection, and resending the next queued 
> request. I can provide a patch for this, but I've seen a lot of recent 
> activity in this area so I'd like to get some feedback first.
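
For readers following along, the "boolean timedout" idea from the description 
could look roughly like the sketch below.  This is not the code in 
ipc/Client.java: PendingCall, CallTable and shouldReadResponse are 
hypothetical names, and the socket handling is left out entirely.  It only 
shows the bookkeeping: the waiting caller flags the call when it gives up, and 
the receiving side checks that flag before pulling over the response body.

```java
import java.net.SocketTimeoutException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the Call object in ipc/Client.java.
class PendingCall {
    final int id;
    private boolean timedOut; // the "boolean timedout" flag from the report
    private boolean done;
    private byte[] value;

    PendingCall(int id) {
        this.id = id;
    }

    // Caller side: wait for the response, or flag the call and give up.
    synchronized byte[] await(long timeoutMillis)
            throws InterruptedException, SocketTimeoutException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (!done) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                timedOut = true;
                throw new SocketTimeoutException("call " + id + " timed out");
            }
            wait(remaining);
        }
        return value;
    }

    synchronized boolean isTimedOut() {
        return timedOut;
    }

    // Receiver side: hand the response to the waiting caller.
    synchronized void complete(byte[] v) {
        value = v;
        done = true;
        notifyAll();
    }
}

// Hypothetical table of outstanding calls, keyed by call id.
class CallTable {
    private final Map<Integer, PendingCall> calls = new ConcurrentHashMap<>();

    PendingCall register(int id) {
        PendingCall call = new PendingCall(id);
        calls.put(id, call);
        return call;
    }

    // Receiver side: before reading a response body, check whether anyone
    // still wants it. A false result is where the report proposes closing
    // the connection instead of transferring the data.
    boolean shouldReadResponse(int id) {
        PendingCall call = calls.get(id);
        if (call == null || call.isTimedOut()) {
            calls.remove(id);
            return false;
        }
        return true;
    }

    void deliver(int id, byte[] v) {
        PendingCall call = calls.remove(id);
        if (call != null) {
            call.complete(v);
        }
    }

    public static void main(String[] args) throws Exception {
        CallTable table = new CallTable();
        PendingCall call = table.register(1);
        try {
            call.await(50); // nobody delivers, so this gives up
        } catch (SocketTimeoutException e) {
            System.out.println("still wanted: " + table.shouldReadResponse(1));
        }
    }
}
```

This only avoids reading data nobody wants on the client; the server has 
already done the work by the time the check runs.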

