[ 
https://issues.apache.org/jira/browse/HADOOP-14519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16136359#comment-16136359
 ] 

John Zhuge commented on HADOOP-14519:
-------------------------------------

Thanks [~vagarychen] and [~djp] for looking at this subtle issue carefully.

bq. I think the patch slightly changed the syntax of this function.

As pointed out by Junping, it should loop until any 1 out 3 conditions is 
broken, in order to avoid spurious wakeup.

bq. However, when we do stop() - where we set running to false, it sounds like 
we are missing to send notification. May be safe to add notification?

stop() does get it out of the loop by sending an interrupt:
{code:java|title=Client#close}
    if (!running.compareAndSet(true, false)) {
      return;
    }
    
    // wake up all connections
    for (Connection conn : connections.values()) {
      conn.interrupt();               <<<<<<<<<<<=========
    }
{code}
{code:java|title=Client$Connection#waitForWork}
      while (calls.isEmpty() && !shouldCloseConnection.get() && running.get())
      {
        long timeout = maxIdleTime-
              (Time.now()-lastActivity.get());
        if (timeout>0) {
          try {
            wait(timeout);
          } catch (InterruptedException e) {
            // Restore the interrupted status
            Thread.currentThread().interrupt();
            break;               <<<<<<<<<<<<<=========
          }
        }
      }
{code}

Since this is a day one bug, quite subtle, and the consequence is not that 
severe (or we have lived with it for so long). The worst case this can cause is 
extra TIME_WAIT sockets after it may prematurely quit the connection thread, 
thus not taking full advantage of the connection pool. I am ok if you'd like to 
push it out 2.8.2.

> Client$Connection#waitForWork may suffer from spurious wakeups
> --------------------------------------------------------------
>
>                 Key: HADOOP-14519
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14519
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ipc
>    Affects Versions: 2.8.0
>            Reporter: John Zhuge
>            Assignee: John Zhuge
>            Priority: Critical
>         Attachments: HADOOP-14519.001.patch
>
>
> {{Client$Connection#waitForWork}} may suffer spurious wakeup because the 
> {{wait}} is not surrounded by a loop. See 
> [https://docs.oracle.com/javase/7/docs/api/java/lang/Object.html#wait()].
> {code:title=Client$Connection#waitForWork}
>       if (calls.isEmpty() && !shouldCloseConnection.get() && running.get())  {
>         long timeout = maxIdleTime-
>               (Time.now()-lastActivity.get());
>         if (timeout>0) {
>           try {
>             wait(timeout);                          <<<<<<==== spurious wakeup
>           } catch (InterruptedException e) {}
>         }
>       }
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to