[ https://issues.apache.org/jira/browse/IGNITE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Eduard Shangareev updated IGNITE-11288:
---------------------------------------
    Description: 
Root cause: we do not set SO_TIMEOUT on the discovery socket on retry, in RingMessageWorker (line 3152):
{code}
sock = spi.openSocket(addr, timeoutHelper);
{code}
So RingMessageWorker blocks forever, and SSLSocketImpl.close(), invoked on timeout, hangs on writeLock.

According to SSLSocketImpl in Java 8:
{code}
if (var1.isAlert((byte)0) && this.getSoLinger() >= 0) {
    // SO_LINGER is set: wait at most getSoLinger() seconds for writeLock
    boolean var3 = Thread.interrupted();
    try {
        if (this.writeLock.tryLock((long)this.getSoLinger(), TimeUnit.SECONDS)) {
            try {
                this.writeRecordInternal(var1, var2);
            } finally {
                this.writeLock.unlock();
            }
        } else {
            SSLException var4 = new SSLException("SO_LINGER timeout, close_notify message cannot be sent.");
            if (this.isLayered() && !this.autoClose) {
                this.fatal((byte)-1, (Throwable)var4);
            } else if (debug != null && Debug.isOn("ssl")) {
                System.out.println(Thread.currentThread().getName() + ", received Exception: " + var4);
            }
            this.sess.invalidate();
        }
    } catch (InterruptedException var14) {
        var3 = true;
    }
    if (var3) {
        Thread.currentThread().interrupt();
    }
} else {
    // SO_LINGER is not set (negative): wait for writeLock with no time limit
    this.writeLock.lock();
    try {
        this.writeRecordInternal(var1, var2);
    } finally {
        this.writeLock.unlock();
    }
}
{code}
If SO_LINGER is not set, close() falls back to this.writeLock.lock(), which waits forever because RingMessageWorker is writing a message with SO_TIMEOUT set to zero. Therefore, when SSL is enabled, U.closeQuiet(socket) hangs whenever getSoLinger() is negative.

Solution:
1) Set a proper SO_TIMEOUT.
2) Possibly add the ability to override SO_LINGER with some reasonable value.

Similar bug: [https://bugs.openjdk.java.net/browse/JDK-6668261].
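The lock contention described above can be reproduced without real SSL sockets. The sketch below is a simplified model, not the JDK or Ignite code: it assumes a plain ReentrantLock stands in for SSLSocketImpl.writeLock and a thread parked while holding that lock stands in for RingMessageWorker stuck in a write. The class LingerCloseModel and its methods closeNotify, closeWhileWriterBlocked and configure are hypothetical names. Socket.setSoTimeout and Socket.setSoLinger are standard java.net.Socket methods; the values Ignite should actually use are not specified in this issue.

```java
import java.net.Socket;
import java.net.SocketException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model of the close_notify branch of SSLSocketImpl quoted above.
public class LingerCloseModel {

    // Stands in for SSLSocketImpl.writeLock.
    private final ReentrantLock writeLock = new ReentrantLock();

    // Same decision as the quoted code: linger >= 0 bounds the wait with tryLock,
    // linger < 0 (SO_LINGER disabled) falls back to an unbounded lock().
    // Returns true if close_notify would have been written.
    boolean closeNotify(int soLingerSec) throws InterruptedException {
        if (soLingerSec >= 0) {
            if (!writeLock.tryLock(soLingerSec, TimeUnit.SECONDS))
                return false; // "SO_LINGER timeout, close_notify message cannot be sent."
        } else {
            writeLock.lock(); // waits for as long as the writer holds the lock
        }
        try {
            return true; // writeRecordInternal(...) would run here
        } finally {
            writeLock.unlock();
        }
    }

    // Calls close() while another thread holds the write lock, as a writer blocked
    // with SO_TIMEOUT = 0 would. Returns true if close() returned within observeMs.
    static boolean closeWhileWriterBlocked(int soLingerSec, long observeMs) throws InterruptedException {
        LingerCloseModel model = new LingerCloseModel();
        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        Thread writer = new Thread(() -> {
            model.writeLock.lock();
            try {
                held.countDown();
                release.await(); // the "stuck" write
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                model.writeLock.unlock();
            }
        });
        writer.start();
        held.await(); // make sure the writer owns the lock before closing

        Thread closer = new Thread(() -> {
            try {
                model.closeNotify(soLingerSec);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        closer.start();
        closer.join(observeMs);
        boolean returned = !closer.isAlive();

        release.countDown(); // unblock the writer so both threads can finish
        writer.join();
        closer.join();
        return returned;
    }

    // Hypothetical helper applying both proposed fixes to a discovery socket.
    static void configure(Socket sock, int soTimeoutMs, int soLingerSec) throws SocketException {
        sock.setSoTimeout(soTimeoutMs);      // 1) blocking reads give up after soTimeoutMs
        sock.setSoLinger(true, soLingerSec); // 2) SSL close() waits at most soLingerSec for writeLock
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("SO_LINGER=1:  close returned = " + closeWhileWriterBlocked(1, 3000));
        System.out.println("SO_LINGER=-1: close returned = " + closeWhileWriterBlocked(-1, 1500));
    }
}
```

With SO_LINGER=1 the model's close() gives up after about one second without sending close_notify; with SO_LINGER=-1 it stays blocked until the writer releases the lock, which in the real deadlock never happens.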
> TcpDiscovery deadlock on SSLSocket.close().
> -------------------------------------------
>
>                 Key: IGNITE-11288
>                 URL: https://issues.apache.org/jira/browse/IGNITE-11288
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Pavel Voronkin
>            Assignee: Pavel Voronkin
>            Priority: Critical
>          Time Spent: 10m
>  Remaining Estimate: 0h
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)