[ https://issues.apache.org/jira/browse/GEODE-9075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17325895#comment-17325895 ]
Mario Ivanac commented on GEODE-9075:
-------------------------------------
These are the reproduction steps.
In the properties file gemfire1.properties, set:
membership-port-range=2025-2030
1. In gfsh, execute:
start locator --name=locator1
start server --name=server1 --server-port=0 --properties-file=gemfire1.properties
start server --name=server2 --server-port=0 --properties-file=gemfire2.properties
create region --name=regionA --type=REPLICATE
put --region=regionA --key="1" --value="one"
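2. Now, in a second terminal, add an iptables rule that rejects incoming TCP traffic on ports 2025-2030 (server1's membership-port-range) with a TCP reset:
sudo iptables -I INPUT -p tcp --match multiport --destination-port=2025,2026,2027,2028,2029,2030 -j REJECT --reject-with tcp-reset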
3. In gfsh, execute:
put --region=regionA --key="1" --value="onev2"
4. Then, in the second terminal, remove the iptables rule:
sudo iptables -D INPUT -p tcp --match multiport --destination-port=2025,2026,2027,2028,2029,2030 -j REJECT --reject-with tcp-reset
After all these steps, gfsh is stuck.
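Step 2 rejects traffic with a TCP reset, and the issue description quoted below reports that the sender sees a connection reset or EOF on its socket after the message has been sent. The following is a minimal, self-contained Java sketch of that observation on loopback, assuming that an abortive close (`setSoLinger(true, 0)`, which makes `close()` send an RST instead of a FIN) is a fair stand-in for the iptables rule or a restarting sidecar. It is not Geode code: the class name, the `Outcome` enum, and the placeholder payload are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Geode.
class ResetAfterSendDemo {

    /** How the sender's wait for a reply ended. */
    enum Outcome { REPLY, EOF, RESET }

    /** Sends one placeholder message to the peer and reports what the sending socket sees next. */
    static Outcome sendAndAwaitReply(int port) throws IOException {
        try (Socket socket = new Socket(InetAddress.getLoopbackAddress(), port)) {
            // Bound the demo so it cannot hang; a read with no timeout would block until the peer acts.
            socket.setSoTimeout(5_000);
            OutputStream out = socket.getOutputStream();
            out.write("placeholder message\n".getBytes(StandardCharsets.UTF_8));
            out.flush();
            InputStream in = socket.getInputStream();
            try {
                // -1 means an orderly close (FIN): the sender sees EOF instead of a reply.
                return in.read() == -1 ? Outcome.EOF : Outcome.REPLY;
            } catch (SocketException e) {
                // An RST typically surfaces as SocketException("Connection reset").
                return Outcome.RESET;
            }
        }
    }

    /** Runs a loopback peer that reads part of the message and then aborts the connection. */
    static Outcome runDemo() {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread peer = new Thread(() -> {
                try (Socket s = server.accept()) {
                    s.getInputStream().read(); // consume the first byte of the message
                    s.setSoLinger(true, 0);    // abortive close: close() sends RST instead of FIN
                } catch (IOException ignored) {
                    // Only the sender's view matters in this demo.
                }
            });
            peer.start();
            Outcome outcome = sendAndAwaitReply(server.getLocalPort());
            peer.join();
            return outcome;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        } catch (IOException e) {
            throw new IllegalStateException("demo failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Sender observed: " + runDemo());
    }
}
```

The point of the sketch is that the sending socket itself does learn about the failure; a thread that is blocked only on a reply latch, rather than on this socket, learns nothing unless the connection-failure path explicitly releases it.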
> Thread stuck indefinitely when using Istio/Sidecar
> --------------------------------------------------
>
> Key: GEODE-9075
> URL: https://issues.apache.org/jira/browse/GEODE-9075
> Project: Geode
> Issue Type: Bug
> Reporter: Mario Ivanac
> Assignee: Mario Ivanac
> Priority: Major
> Labels: pull-request-available
>
> A Geode cluster is deployed in a Kubernetes environment, and Istio sidecars are
> injected between cluster members. While traffic is running, if any Istio sidecar
> is restarted, a thread gets stuck indefinitely while waiting for a reply to a
> sent message.
> After detailed analysis, it seems that restarting the proxy sometimes causes a
> message to be lost, so the sending side waits indefinitely for a reply.
> On the sending side, a "connection reset" or "EOF" is received on the sending
> socket after the message is sent.
>
> [warn 2021/03/25 21:04:47.282 CET server2 <ThreadsMonitor> tid=0x12] Thread <64> (0x40) that was executed at <25 Mar 2021 21:03:53 CET> has been stuck for <53.897 seconds> and number of thread monitor iteration <1>
> Thread Name <Function Execution Processor2> state <TIMED_WAITING>
> Waiting on <java.util.concurrent.CountDownLatch$Sync@7c7f9898>
> Executor Group <FunctionExecutionPooledExecutor>
> Monitored metric <ResourceManagerStats.numThreadsStuck>
> Thread stack:
> sun.misc.Unsafe.park(Native Method)
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
> java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
> org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:72)
> org.apache.geode.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:736)
> org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:811)
> org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:784)
> org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:874)
> org.apache.geode.internal.cache.DistributedCacheOperation.waitForAckIfNeeded(DistributedCacheOperation.java:811)
> org.apache.geode.internal.cache.DistributedCacheOperation._distribute(DistributedCacheOperation.java:699)
> org.apache.geode.internal.cache.DistributedCacheOperation.startOperation(DistributedCacheOperation.java:277)
> org.apache.geode.internal.cache.DistributedCacheOperation.distribute(DistributedCacheOperation.java:318)
> org.apache.geode.internal.cache.DistributedRegion.distributeUpdate(DistributedRegion.java:520)
> ...
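In the stack trace above, the stuck thread is in TIMED_WAITING inside the timed `CountDownLatch.await` (reached via `tryAcquireSharedNanos`), called from `ReplyProcessor21.waitForRepliesUninterruptibly`. One reading consistent with that, offered here as an assumption rather than a description of Geode's source, is a timed wait retried in a loop: each timeout produces a warning and another wait, so the thread never sits in an untimed wait yet never returns while a reply is missing. The Java sketch below illustrates that pattern; `ReplyWaitSketch`, `awaitAllReplies`, and the threshold values are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; not Geode's ReplyProcessor21.
class ReplyWaitSketch {

    /**
     * Waits until every expected reply has counted down the latch. Each time the
     * threshold elapses, it logs a warning and waits again; it never gives up.
     * Returns the number of warnings logged before the replies arrived.
     */
    static int awaitAllReplies(CountDownLatch replies, long thresholdMillis) {
        int warnings = 0;
        boolean interrupted = false;
        try {
            while (true) {
                try {
                    // Timed wait: the thread shows up as TIMED_WAITING in a thread dump.
                    if (replies.await(thresholdMillis, TimeUnit.MILLISECONDS)) {
                        return warnings;
                    }
                    warnings++;
                    System.err.printf("warning: still waiting for %d reply(ies)%n", replies.getCount());
                } catch (InterruptedException e) {
                    // "Uninterruptibly": remember the interrupt but keep waiting.
                    interrupted = true;
                }
            }
        } finally {
            if (interrupted) {
                Thread.currentThread().interrupt(); // restore interrupt status on exit
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch replies = new CountDownLatch(1);
        Thread replier = new Thread(() -> {
            try {
                Thread.sleep(250); // the reply is late, but it does arrive
            } catch (InterruptedException ignored) {
                Thread.currentThread().interrupt();
            }
            replies.countDown();
        });
        replier.start();
        int warnings = awaitAllReplies(replies, 100);
        replier.join();
        System.out.println("all replies received after " + warnings + " warning(s)");
    }
}
```

If the latch is never counted down (the lost-message case in the description), `awaitAllReplies` loops forever, logging a warning every `thresholdMillis`. Interrupts are recorded and re-asserted on exit rather than ending the wait, which is what the "Uninterruptibly" in the Geode method name suggests.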
--
This message was sent by Atlassian Jira
(v8.3.4#803005)