[ https://issues.apache.org/jira/browse/GEODE-8652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223208#comment-17223208 ]

ASF GitHub Bot commented on GEODE-8652:
---------------------------------------

Bill commented on a change in pull request #5666:
URL: https://github.com/apache/geode/pull/5666#discussion_r514565824



##########
File path: 
geode-core/src/main/java/org/apache/geode/internal/net/ByteBufferSharingImpl.java
##########
@@ -0,0 +1,142 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more contributor license
+ * agreements. See the NOTICE file distributed with this work for additional information regarding
+ * copyright ownership. The ASF licenses this file to You under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License. You may obtain a
+ * copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software distributed under the License
+ * is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
+ * or implied. See the License for the specific language governing permissions and limitations under
+ * the License.
+ * the License.
+ */
+
+package org.apache.geode.internal.net;
+
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicBoolean;
+import java.util.concurrent.atomic.AtomicInteger;
+import java.util.concurrent.locks.Lock;
+import java.util.concurrent.locks.ReentrantLock;
+
+import org.apache.geode.annotations.VisibleForTesting;
+import org.apache.geode.internal.net.BufferPool.BufferType;
+
+/**
+ * An {@link AutoCloseable} meant to be acquired in a try-with-resources statement. The resource (a
+ * {@link ByteBuffer}) is available (for reading and modification) in the scope of the
+ * try-with-resources.
+ */
+class ByteBufferSharingImpl implements ByteBufferSharing {
+
+  static class LockAttemptTimedOut extends Exception {
+  }
+
+  private final Lock lock;
+  private final AtomicBoolean isClosed;
+  // mutable because in general our ByteBuffer may need to be resized (grown or compacted)
+  private ByteBuffer buffer;
+  private final BufferType bufferType;
+  private final AtomicInteger counter;
+  private final BufferPool bufferPool;
+
+  /**
+   * This constructor is for use only by the owner of the shared resource (a {@link ByteBuffer}).
+   *
+   * A resource owner must invoke {@link #alias()} once for each reference that escapes (is passed
+   * to an external object or is returned to an external caller.)
+   *
+   * This constructor acquires no lock. The reference count will be 1 after this constructor
+   * completes.
+   */
+  ByteBufferSharingImpl(final ByteBuffer buffer, final BufferType bufferType,
+      final BufferPool bufferPool) {
+    this.buffer = buffer;
+    this.bufferType = bufferType;
+    this.bufferPool = bufferPool;
+    lock = new ReentrantLock();
+    counter = new AtomicInteger(1);
+    isClosed = new AtomicBoolean(false);
+  }
+
+  /**
+   * The destructor. Called by the resource owner to undo the work of the constructor.
+   */
+  void destruct() {
+    if (isClosed.compareAndSet(false, true)) {
+      dropReference();
+    }
+  }
+
+  /**
+   * This method is for use only by the owner of the shared resource. It's used for handing out
+   * references to the shared resource. So it does reference counting and also acquires a lock.
+   *
+   * Resource owners call this method as the last thing before returning a reference to the caller.
+   * That caller binds that reference to a variable in a try-with-resources statement and relies on
+   * the AutoCloseable protocol to invoke close() on the object at the end of the block.
+   */
+  ByteBufferSharing alias() {
+    lock.lock();
+    addReference();
+    return this;
+  }
+
+  /**
+   * This variant throws {@link LockAttemptTimedOut} if it can't acquire the lock in time.
+   */
+  ByteBufferSharing alias(final long time, final TimeUnit unit) throws LockAttemptTimedOut {
+    try {
+      if (!lock.tryLock(time, unit)) {
+        throw new LockAttemptTimedOut();
+      }
+    } catch (InterruptedException e) {
+      Thread.currentThread().interrupt();
+      throw new LockAttemptTimedOut();
+    }
+    addReference();
+    return this;
+  }
+
+  @Override
+  public ByteBuffer getBuffer() throws IOException {
+    if (isClosed.get()) {
+      throw new IOException("NioSslEngine has been closed");
+    } else {
+      return buffer;
+    }
+  }
+
+  @Override
+  public ByteBuffer expandWriteBufferIfNeeded(final int newCapacity) throws IOException {
+    return buffer = bufferPool.expandWriteBufferIfNeeded(bufferType, getBuffer(), newCapacity);
+  }
+
+  @Override
+  public void close() {
+    dropReference();
+    lock.unlock();
+  }
+
+  private int addReference() {
+    return counter.incrementAndGet();
+  }
+
+  private int dropReference() {

Review comment:
   What @bschuchardt's comment made me concerned about is the prospect of a "client" of a `ByteBufferSharing` (`ByteBufferSharingImpl` in particular) calling `close()` too many times, i.e., more than once.
   
   Turns out there was a bug there. In the version of the code he commented on, if a client called `close()` twice, the reference count was decremented on the second call even though it was already `0`, leaving it at `-1`. You can imagine the bad things that happen in that state.
   
   The latest commit adds a test for this bug (`ByteBufferSharingImplTest`) and also fixes it.
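A minimal, self-contained sketch of the alias()/close() reference-counting protocol described in the Javadoc above, plus one way to make a surplus `close()` harmless. `SharedBufferSketch` is hypothetical and omits the `BufferPool` and resizing. It assumes `close()` runs on the same thread that called `alias()`: because `ReentrantLock.unlock()` throws `IllegalMonitorStateException` when the calling thread does not hold the lock, unlocking *before* dropping the reference means an extra `close()` fails before it can touch the count. This illustrates the idea only; it is not necessarily how the commit mentioned above fixes the bug.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical, simplified stand-in for ByteBufferSharingImpl (no BufferPool, no resizing).
class SharedBufferSketch implements AutoCloseable {
  private final ReentrantLock lock = new ReentrantLock();
  private final AtomicInteger counter = new AtomicInteger(1); // the owner's own reference
  private final AtomicBoolean isDestructed = new AtomicBoolean(false);
  private final ByteBuffer buffer;

  SharedBufferSketch(ByteBuffer buffer) {
    this.buffer = buffer;
  }

  // Owner hands out a reference: take the lock, then count the new reference.
  SharedBufferSketch alias() {
    lock.lock();
    counter.incrementAndGet();
    return this;
  }

  ByteBuffer getBuffer() {
    return buffer;
  }

  // Owner drops its own reference exactly once, however many times this is called.
  void destruct() {
    if (isDestructed.compareAndSet(false, true)) {
      counter.decrementAndGet();
    }
  }

  // Unlock first: unlock() throws IllegalMonitorStateException if this thread does
  // not hold the lock, so a surplus close() never reaches the decrement below.
  @Override
  public void close() {
    lock.unlock();
    counter.decrementAndGet();
  }

  int referenceCount() {
    return counter.get();
  }
}
```

A caller would write `try (SharedBufferSketch ref = owner.alias()) { ref.getBuffer().put(...); }`, so `close()` runs exactly once at the end of the block; only a hand-written extra `close()` hits the guard.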




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> member hung in Connection.notifyHandshakeWaiter() during disconnect waiting for a lock held by another thread in Connection.readAck()
> --------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: GEODE-8652
>                 URL: https://issues.apache.org/jira/browse/GEODE-8652
>             Project: Geode
>          Issue Type: Bug
>          Components: membership, messaging
>    Affects Versions: 1.14.0
>            Reporter: Bill Burcham
>            Assignee: Bill Burcham
>            Priority: Major
>              Labels: pull-request-available
>
> An application encountered the following hang in a TLS-enabled cluster.
> Let's call the two cluster members involved ds3 and ds1.
> ds3 sends a {{PutAllPRMessage}} to ds1 and is stuck in {{SocketChannel.read()}} waiting for the acknowledgement.
> {{ClusterDistributionManager.uncleanShutdown()}} is invoked on ds3 to shut the member down. That thread blocks trying to acquire a lock on the {{NioSslEngine}} held by the first thread (the one waiting for the ack to the put-all).
> Somehow the shutdown thread must be allowed to proceed.
> Here's the hung thread in ds3 (a.k.a. vm_3_thr_34_dataStore3_host1_8592). It is trying to shut down the member but is stuck waiting for the monitor on the {{NioSslEngine}}:
> {noformat}
> "vm_3_thr_34_dataStore3_host1_8592" #795 daemon prio=5 os_prio=0 tid=0x00007fdb9c011000 nid=0x2fcc waiting for monitor entry [0x00007fdb6f4b7000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>       at org.apache.geode.internal.tcp.Connection.notifyHandshakeWaiter(Connection.java:804)
>       - waiting to lock <0x00000000f2635b28> (a org.apache.geode.internal.net.NioSslEngine)
>       at org.apache.geode.internal.tcp.Connection.close(Connection.java:1350)
>       at org.apache.geode.internal.tcp.Connection.closePartialConnect(Connection.java:1278)
>       at org.apache.geode.internal.tcp.ConnectionTable.closeCon(ConnectionTable.java:612)
>       at org.apache.geode.internal.tcp.ConnectionTable.closeCon(ConnectionTable.java:604)
>       at org.apache.geode.internal.tcp.ConnectionTable.close(ConnectionTable.java:661)
>       - locked <0x00000000f2678cf8> (a java.util.ArrayList)
>       - locked <0x00000000f1187348> (a java.util.concurrent.ConcurrentHashMap)
>       at org.apache.geode.internal.tcp.TCPConduit.stop(TCPConduit.java:487)
>       at org.apache.geode.distributed.internal.direct.DirectChannel.disconnect(DirectChannel.java:644)
>       - locked <0x00000000f11867a8> (a org.apache.geode.distributed.internal.direct.DirectChannel)
>       at org.apache.geode.distributed.internal.DistributionImpl.disconnectDirectChannel(DistributionImpl.java:631)
>       at org.apache.geode.distributed.internal.DistributionImpl.access$200(DistributionImpl.java:82)
>       at org.apache.geode.distributed.internal.DistributionImpl$LifecycleListenerImpl.disconnect(DistributionImpl.java:904)
>       at org.apache.geode.distributed.internal.membership.gms.GMSMembership$ManagerImpl.stop(GMSMembership.java:1908)
>       at org.apache.geode.distributed.internal.membership.gms.Services.stop(Services.java:302)
>       at org.apache.geode.distributed.internal.membership.gms.GMSMembership.shutdown(GMSMembership.java:1262)
>       at org.apache.geode.distributed.internal.membership.gms.GMSMembership.disconnect(GMSMembership.java:1847)
>       at org.apache.geode.distributed.internal.DistributionImpl.disconnect(DistributionImpl.java:501)
>       at org.apache.geode.distributed.internal.ClusterDistributionManager.uncleanShutdown(ClusterDistributionManager.java:1291)
> {noformat}
> That thread is waiting on a lock held by this thread (also in ds3), which is waiting for the acknowledgement of a {{PutAllPRMessage}} sent to ds1.
> {noformat}
> "vm_3_thr_37_dataStore3_host1_8592" #857 daemon prio=5 os_prio=0 tid=0x00007fdb9c030800 nid=0x30d1 runnable [0x00007fdb732f0000]
>    java.lang.Thread.State: RUNNABLE
>         at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>         at sun.nio.ch.IOUtil.read(IOUtil.java:192)
>         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:378)
>         - locked <0x00000000f2643380> (a java.lang.Object)
>         at org.apache.geode.internal.net.NioSslEngine.readAtLeast(NioSslEngine.java:330)
>         at org.apache.geode.internal.tcp.MsgReader.readAtLeast(MsgReader.java:129)
>         at org.apache.geode.internal.tcp.MsgReader.readHeader(MsgReader.java:58)
> ==>     - locked <0x00000000f2635b28> (a org.apache.geode.internal.net.NioSslEngine)
>         at org.apache.geode.internal.tcp.Connection.readAck(Connection.java:2652)
>         at org.apache.geode.distributed.internal.direct.DirectChannel.readAcks(DirectChannel.java:392)
>         at org.apache.geode.distributed.internal.direct.DirectChannel.sendToMany(DirectChannel.java:342)
>         at org.apache.geode.distributed.internal.direct.DirectChannel.sendToOne(DirectChannel.java:182)
>         at org.apache.geode.distributed.internal.direct.DirectChannel.send(DirectChannel.java:511)
>         at org.apache.geode.distributed.internal.DistributionImpl.directChannelSend(DistributionImpl.java:346)
>         at org.apache.geode.distributed.internal.DistributionImpl.send(DistributionImpl.java:291)
>         at org.apache.geode.distributed.internal.ClusterDistributionManager.sendViaMembershipManager(ClusterDistributionManager.java:2053)
>         at org.apache.geode.distributed.internal.ClusterDistributionManager.sendOutgoing(ClusterDistributionManager.java:1981)
>         at org.apache.geode.distributed.internal.ClusterDistributionManager.sendMessage(ClusterDistributionManager.java:2018)
>         at org.apache.geode.distributed.internal.ClusterDistributionManager.putOutgoing(ClusterDistributionManager.java:1083)
>         at org.apache.geode.internal.cache.partitioned.PutAllPRMessage.send(PutAllPRMessage.java:201)
>         at org.apache.geode.internal.cache.PartitionedRegion.tryToSendOnePutAllMessage(PartitionedRegion.java:2839)
>         at org.apache.geode.internal.cache.PartitionedRegion.sendMsgByBucket(PartitionedRegion.java:2621)
>         at org.apache.geode.internal.cache.PartitionedRegion.postPutAllSend(PartitionedRegion.java:2392)
>         at org.apache.geode.internal.cache.LocalRegionDataView.postPutAll(LocalRegionDataView.java:361)
>         at org.apache.geode.internal.cache.LocalRegion.basicPutAll(LocalRegion.java:9154)
>         at org.apache.geode.internal.cache.LocalRegion.putAll(LocalRegion.java:8903)
> {noformat}
> What we see is that the {{MsgReader}} in the second thread holds the {{NioSslEngine}} monitor while it reads, which prevents the first thread from closing the socket. Until the socket is closed, the second thread will stay stuck in {{SocketChannel.read()}}.
> *But why is the second thread stuck in {{SocketChannelImpl.read}}? That may be due to GEODE-8651!*
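Why closing the socket matters: per the `java.nio.channels.InterruptibleChannel` contract, if a thread is blocked in an I/O operation on such a channel and another thread closes the channel, the blocked thread receives an `AsynchronousCloseException`. The sketch below uses a `java.nio` `Pipe` as a stand-in for the member-to-member socket (an assumption made to keep it self-contained); `CloseUnblocksReadSketch` is a hypothetical name. Here nothing guards `close()` with a lock, so the reader is released. In the hang above, the closing thread first needs the {{NioSslEngine}} monitor that the reader holds, so the close never happens.

```java
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical demo: closing a channel from another thread unblocks a blocked read.
class CloseUnblocksReadSketch {

  // Returns what the reader thread saw: an AsynchronousCloseException, or a plain
  // ClosedChannelException (its superclass) if close() happened before read() began.
  static Throwable run() throws Exception {
    Pipe pipe = Pipe.open();
    Pipe.SourceChannel source = pipe.source(); // blocking mode by default
    AtomicReference<Throwable> outcome = new AtomicReference<>();

    Thread reader = new Thread(() -> {
      try {
        source.read(ByteBuffer.allocate(16)); // nothing is ever written, so this blocks
      } catch (Throwable t) {
        outcome.set(t);
      }
    }, "reader");
    reader.start();

    Thread.sleep(200); // crude: give the reader time to block inside read()
    source.close();    // the "shutdown" thread closes the channel without taking any lock
    reader.join(5_000);

    pipe.sink().close();
    return outcome.get();
  }
}
```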


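The hang has the shape of one thread holding a lock while blocked on I/O and another thread waiting indefinitely for that same lock. The sketch below reproduces that shape with hypothetical names: a `ReentrantLock` stands in for the {{NioSslEngine}} monitor and a `CountDownLatch` for the acknowledgement that never arrives. It shows how a timed acquisition, the same `tryLock(time, unit)` call behind the `alias(long, TimeUnit)` overload in the diff, lets the shutdown path give up instead of hanging. What shutdown should do after the timeout is deliberately left open here.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical reproduction of the lock/I-O shape from the thread dumps above.
class TimedShutdownSketch {

  // Returns true if the "shutdown" thread got the lock within timeoutMillis, false if it gave up.
  static boolean shutdownWhileReaderHoldsLock(long timeoutMillis) throws InterruptedException {
    ReentrantLock engineLock = new ReentrantLock(); // stands in for the NioSslEngine monitor
    CountDownLatch readerHoldsLock = new CountDownLatch(1);
    CountDownLatch ackArrived = new CountDownLatch(1); // stands in for the ack that never comes

    Thread reader = new Thread(() -> {
      engineLock.lock();
      try {
        readerHoldsLock.countDown();
        ackArrived.await(); // "blocked in read()" while still holding the lock
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      } finally {
        engineLock.unlock();
      }
    }, "reader");
    reader.start();
    readerHoldsLock.await(); // make sure the reader owns the lock before we try

    // Shutdown path: a bounded wait instead of lock(), which would block here forever.
    boolean acquired = engineLock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS);
    if (acquired) {
      engineLock.unlock();
    }

    ackArrived.countDown(); // let the demo's reader finish so the method can return
    reader.join();
    return acquired;
  }
}
```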

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
