[
https://issues.apache.org/jira/browse/GEODE-9350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kamilla Aslami updated GEODE-9350:
----------------------------------
Summary: MemberJoinedEvent should be triggered after new view is installed
(was: New member is being shunned after joining the cluster)
> MemberJoinedEvent should be triggered after new view is installed
> -----------------------------------------------------------------
>
> Key: GEODE-9350
> URL: https://issues.apache.org/jira/browse/GEODE-9350
> Project: Geode
> Issue Type: Bug
> Components: membership
> Affects Versions: 1.14.0, 1.15.0
> Reporter: Kamilla Aslami
> Assignee: Kamilla Aslami
> Priority: Major
> Labels: pull-request-available, release-blocker
>
> While investigating GEODE-9070, we noticed a problem where a server tries to
> join a cluster and, soon after, membership fails with ShunnedMemberException:
> {noformat}
> org.apache.geode.distributed.internal.direct.ShunnedMemberException: Member is being shunned: ccf730fb2b62(161)<v2>:41002
>   at org.apache.geode.distributed.internal.direct.DirectChannel.getConnections(DirectChannel.java:469)
>   at org.apache.geode.distributed.internal.direct.DirectChannel.sendToMany(DirectChannel.java:283)
>   at org.apache.geode.distributed.internal.direct.DirectChannel.sendToOne(DirectChannel.java:190)
>   at org.apache.geode.distributed.internal.direct.DirectChannel.send(DirectChannel.java:550)
>   at org.apache.geode.distributed.internal.DistributionImpl.directChannelSend(DistributionImpl.java:354)
>   at org.apache.geode.distributed.internal.DistributionImpl.send(DistributionImpl.java:296)
>   at org.apache.geode.distributed.internal.ClusterDistributionManager.sendViaMembershipManager(ClusterDistributionManager.java:2068)
>   at org.apache.geode.distributed.internal.ClusterDistributionManager.sendOutgoing(ClusterDistributionManager.java:1983)
>   at org.apache.geode.distributed.internal.ClusterDistributionManager.sendMessage(ClusterDistributionManager.java:2028)
>   at org.apache.geode.distributed.internal.ClusterDistributionManager.putOutgoing(ClusterDistributionManager.java:1085)
>   at org.apache.geode.internal.cache.execute.StreamingFunctionOperation.getFunctionResultFrom(StreamingFunctionOperation.java:113)
>   at org.apache.geode.internal.cache.execute.MemberFunctionExecutor.executeFunction(MemberFunctionExecutor.java:149)
>   at org.apache.geode.internal.cache.execute.MemberFunctionExecutor.executeFunction(MemberFunctionExecutor.java:191)
>   at org.apache.geode.internal.cache.execute.AbstractExecution.execute(AbstractExecution.java:397)
>   at org.apache.geode.internal.cache.execute.AbstractExecution.execute(AbstractExecution.java:402)
>   at org.apache.geode.modules.util.BootstrappingFunction.bootstrapMember(BootstrappingFunction.java:170)
>   at org.apache.geode.modules.util.BootstrappingFunction.memberJoined(BootstrappingFunction.java:240)
>   at org.apache.geode.distributed.internal.ClusterDistributionManager$MemberJoinedEvent.handleEvent(ClusterDistributionManager.java:2498)
>   at org.apache.geode.distributed.internal.ClusterDistributionManager$MemberEvent.handleEvent(ClusterDistributionManager.java:2451)
>   at org.apache.geode.distributed.internal.ClusterDistributionManager$MemberEvent.handleEvent(ClusterDistributionManager.java:2440)
>   at org.apache.geode.distributed.internal.ClusterDistributionManager.handleMemberEvent(ClusterDistributionManager.java:1406)
>   at org.apache.geode.distributed.internal.ClusterDistributionManager.access$200(ClusterDistributionManager.java:109)
>   at org.apache.geode.distributed.internal.ClusterDistributionManager$MemberEventInvoker.run(ClusterDistributionManager.java:1438)
>   at java.base/java.lang.Thread.run(Thread.java:834)
> {noformat}
> Further analysis showed that ShunnedMemberException is thrown because
> GMSMembership.memberExists() returns false, which means that the member
> ccf730fb2b62(161)<v2>:41002 was not in the view.
>
> Looking at the stack trace, we noticed that
> BootstrappingFunction.bootstrapMember() is executed on MemberJoinedEvent,
> which is triggered by MembershipListener.newMemberConnected().
> newMemberConnected() is called in GMSMembership.processView() before the new
> view is installed, so it is likely that the failure happens because
> BootstrappingFunction receives the event before the view has actually been
> updated.
>
> A possible solution is to change GMSMembership.processView() to call
> MembershipListener.newMemberConnected() only after the new view is installed.
>
> This issue was introduced by the fix for GEODE-7245, which removed the
> latestView lock from GMSMembership.memberExists(). Before GEODE-7245, this
> method waited until GMSMembership.processView() released the lock, so the
> problem described above could not happen. GEODE-7245 was back-ported to 1.14.
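
The ordering problem described above can be illustrated with a small, self-contained sketch. It is a toy model only and not Geode's code: the class ViewOrderingSketch, its methods, and the member name are hypothetical stand-ins for GMSMembership.processView(), GMSMembership.memberExists(), and MembershipListener.newMemberConnected(). It assumes a join listener that immediately checks whether the new member is in the installed view, loosely as BootstrappingFunction ends up doing when it sends a message.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical toy model of view processing; not Geode's GMSMembership.
class ViewOrderingSketch {
    private volatile Set<String> installedView = new HashSet<>();
    private final List<Consumer<String>> joinListeners = new ArrayList<>();

    void addJoinListener(Consumer<String> listener) {
        joinListeners.add(listener);
    }

    // Stand-in for a memberExists()-style check against the installed view.
    boolean memberExists(String member) {
        return installedView.contains(member);
    }

    // Members present in the new view but not in the currently installed one.
    private List<String> newMembers(Set<String> newView) {
        List<String> joined = new ArrayList<>();
        for (String member : newView) {
            if (!installedView.contains(member)) {
                joined.add(member);
            }
        }
        return joined;
    }

    private void fireJoined(String member) {
        for (Consumer<String> listener : joinListeners) {
            listener.accept(member);
        }
    }

    // Ordering described in the report: listeners fire before the view is installed.
    void processViewNotifyFirst(Set<String> newView) {
        List<String> joined = newMembers(newView);
        joined.forEach(this::fireJoined);
        installedView = new HashSet<>(newView);
    }

    // Proposed ordering: install the view first, then fire listeners.
    void processViewInstallFirst(Set<String> newView) {
        List<String> joined = newMembers(newView);
        installedView = new HashSet<>(newView);
        joined.forEach(this::fireJoined);
    }

    // Returns what a join listener observes from memberExists() for the new member.
    static boolean listenerSeesMember(boolean installFirst) {
        ViewOrderingSketch membership = new ViewOrderingSketch();
        boolean[] seen = new boolean[1];
        membership.addJoinListener(m -> seen[0] = membership.memberExists(m));
        Set<String> view = Collections.singleton("new-member");
        if (installFirst) {
            membership.processViewInstallFirst(view);
        } else {
            membership.processViewNotifyFirst(view);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println("notify first:  " + listenerSeesMember(false));
        System.out.println("install first: " + listenerSeesMember(true));
    }
}
```

In this model the listener sees the new member only when the view is installed before listeners are notified. The sketch is single-threaded, so it shows only the ordering dependency, not the locking behavior that existed before GEODE-7245.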