[
https://issues.apache.org/jira/browse/GEODE-5393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16537734#comment-16537734
]
ASF subversion and git services commented on GEODE-5393:
--------------------------------------------------------
Commit 9d87117b131172e05ba28fc6154aca3370115aa1 in geode's branch
refs/heads/develop from [~bschuchardt]
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=9d87117 ]
GEODE-5393: StateFlushOperation hangs waiting for non-existent operation to
complete
I've added additional debugging to DistributionAdvisor so that it knows
which threads are performing operations and can log them at debug level.
This let me determine that a putAll operation was the source of the hang
due to an exception being thrown during message distribution in
DistributedCacheOperation.startOperation(). The exception resulted in
DistributionAdvisor.endOperation() not being invoked correctly.
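
The failure mode described above can be illustrated with a minimal sketch. It is not Geode's actual code: OperationTracker and PutAllSketch are hypothetical stand-ins, and the startOperation()/endOperation(version) signatures are assumed from the description. The sketch records which threads have operations in flight for each view version, and uses try/finally so that endOperation is called with the version returned by startOperation even when distribution throws.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for the advisor's bookkeeping; not Geode's DistributionAdvisor. */
class OperationTracker {
  private final AtomicLong viewVersion = new AtomicLong(1);
  // view version -> (thread name -> number of operations that thread has in flight)
  private final Map<Long, Map<String, Integer>> inFlight = new ConcurrentHashMap<>();

  /** Records an operation for the calling thread and returns the current view version. */
  long startOperation() {
    long version = viewVersion.get();
    inFlight.computeIfAbsent(version, v -> new ConcurrentHashMap<>())
        .merge(Thread.currentThread().getName(), 1, Integer::sum);
    return version;
  }

  /** Ends one operation of the calling thread; the version must come from startOperation(). */
  void endOperation(long version) {
    Map<String, Integer> threads = inFlight.get(version);
    if (threads == null) {
      throw new IllegalStateException("no operations recorded for view version " + version);
    }
    // Returning null from the remapping function removes the thread's entry.
    threads.computeIfPresent(Thread.currentThread().getName(),
        (name, count) -> count == 1 ? null : count - 1);
  }

  /** Simulates a membership view change. */
  void advanceView() {
    viewVersion.incrementAndGet();
  }

  /** Number of operations a state flush on this version would still have to wait for. */
  int pendingOperations(long version) {
    Map<String, Integer> threads = inFlight.get(version);
    return threads == null ? 0 : threads.values().stream().mapToInt(Integer::intValue).sum();
  }

  /** Debug aid: the threads that still have operations in flight for this version. */
  String describePending(long version) {
    Map<String, Integer> threads = inFlight.get(version);
    return threads == null ? "{}" : threads.toString();
  }
}

/** Hypothetical caller showing the start/end pairing around message distribution. */
class PutAllSketch {
  static void distribute(OperationTracker tracker, Runnable sendMessages) {
    long version = tracker.startOperation();
    try {
      sendMessages.run(); // may throw, as initMessage did in the reported stack trace
    } finally {
      // Always end with the version obtained at start, even on failure.
      tracker.endOperation(version);
    }
  }
}
```

Keeping the version in a local variable next to the try/finally means the error path never has to guess the version, which is the mistake the issue description attributes to the relocated error handling.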
> StateFlushOperation hangs waiting for non-existent operation to complete
> ------------------------------------------------------------------------
>
> Key: GEODE-5393
> URL: https://issues.apache.org/jira/browse/GEODE-5393
> Project: Geode
> Issue Type: Bug
> Reporter: Bruce Schuchardt
> Assignee: Bruce Schuchardt
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h 50m
> Remaining Estimate: 0h
>
> We had a state-flush operation hang while no threads were performing flushable
> messaging. That indicates a bookkeeping error in the pairing of
> startOperation/endOperation calls. It appears to be caused by an exception
> thrown during distribution:
> {noformat}
> Exception occurred while processing DistributedPutAllOperation(EntryEventImpl[op=PUTALL_CREATE;region=/replicate_8;key=null;oldValue=null;newValue=null;callbackArg=null;originRemote=true;originMember=10.32.108.122(accessorgemfire1_rs-StorageBTTest30102851a1i3xlarge-hydra-client-5_15068:15068)<v189>:1033;id=EventID[id=92 bytes;threadID=97;sequenceID=180]])
> org.apache.geode.cache.persistence.PersistentReplicatesOfflineException
> at org.apache.geode.internal.cache.DistributedPutAllOperation.initMessage(DistributedPutAllOperation.java:923)
> at org.apache.geode.internal.cache.DistributedCacheOperation._distribute(DistributedCacheOperation.java:506)
> at org.apache.geode.internal.cache.DistributedCacheOperation.startOperation(DistributedCacheOperation.java:264)
> at org.apache.geode.internal.cache.DistributedRegion.postPutAllSend(DistributedRegion.java:3214)
> at org.apache.geode.internal.cache.LocalRegionDataView.postPutAll(LocalRegionDataView.java:326)
> at org.apache.geode.internal.cache.LocalRegion.basicPutAll(LocalRegion.java:9745)
> at org.apache.geode.internal.cache.DistributedRegion.basicPutAll(DistributedRegion.java:3240)
> at org.apache.geode.internal.cache.LocalRegion.putAll(LocalRegion.java:9493)
> at org.apache.geode.internal.cache.LocalRegion.putAll(LocalRegion.java:9505)
> at diskRecovery.StartupShutdownTest.HydraTask_doContinuousUpdates(StartupShutdownTest.java:482)
> at sun.reflect.GeneratedMethodAccessor59.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at hydra.MethExecutor.execute(MethExecutor.java:181)
> at hydra.MethExecutor.execute(MethExecutor.java:149)
> at hydra.TestTask.execute(TestTask.java:192)
> at hydra.RemoteTestModule$1.run(RemoteTestModule.java:212)
> {noformat}
> As a result, endOperation is not invoked with the correct view version.
> Error handling was moved from DistributedCacheOperation to other classes, but
> it is implemented incorrectly: those classes do not know the view version
> and so end up invoking endOperation with an invalid version number.