[ 
https://issues.apache.org/jira/browse/GEODE-6975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Shu reassigned GEODE-6975:
-------------------------------

    Assignee: Eric Shu

> When a redundant copy or replica of a distributed region fails to persist a 
> remote member's new persistent ID, it should send a reply exception back to 
> indicate what happened
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: GEODE-6975
>                 URL: https://issues.apache.org/jira/browse/GEODE-6975
>             Project: Geode
>          Issue Type: Bug
>          Components: persistence, regions
>            Reporter: Eric Shu
>            Assignee: Eric Shu
>            Priority: Major
>
> Currently, when a persistent bucket or distributed region is created on 
> member A, member A sends its new PersistentMemberID to the other hosts (e.g. 
> member B) so that member B learns and persists A's new ID for the region. 
> However, if member B is being shut down while it processes the 
> PrepareNewPersistentMemberMessage, it does not persist A's new ID, yet it 
> still sends a reply message indicating that it did. This causes member A to 
> remove its old member ID and persist only its new member ID. This is wrong 
> because member A could be shut down at the same time, and there is a race in 
> which member B is recognized as hosting the last copy of the region. Member B 
> then recovers first, but it can only recover member A's old persistent ID. As 
> a result, member A cannot restart, because B does not recognize A's new 
> persistent ID.
>
> [error 2018/09/19 01:18:00.972 PDT dataStoregemfire6_host1_6131 <Recovery 
> thread for bucket _B__partitionedRegion_0> tid=0x77] A DiskAccessException 
> has occurred while writing to the disk for region 
> /__PR/_B__partitionedRegion_0. The cache will be closed.
> org.apache.geode.cache.persistence.ConflictingPersistentDataException: Region 
> /__PR/_B__partitionedRegion_0 remote member 
> rs-FullRegression19041704a3i3large-hydra-client-62(dataStoregemfire1_host1_5862:5862)<ec><v8>:1025
>  with persistent data 
> /10.32.109.230:/var/vcap/data/rundir/concParRegHAPersistPdxVA57H/concParRegHAPersistPdx-0919-011540/vm_1_dataStore1_disk_1
>  created at timestamp 1537345060760 version 0 diskStoreId 
> a35a937a082b4066-af019365b6a5114b name null was not part of the same 
> distributed system as the local data from 
> /10.32.109.230:/var/vcap/data/rundir/concParRegHAPersistPdxVA57H/concParRegHAPersistPdx-0919-011540/vm_6_dataStore6_disk_1
>  created at timestamp 1537344996470 version 0 diskStoreId 
> 108be5a03966418f-980c1d88e9b26d1d name null
>         at 
> org.apache.geode.internal.cache.persistence.PersistenceAdvisorImpl.checkMyStateOnMembers(PersistenceAdvisorImpl.java:521)
>         at 
> org.apache.geode.internal.cache.persistence.PersistenceInitialImageAdvisor.removeReplicatesIfWeAreEqualToAnyOrElseClearEqualMembers(PersistenceInitialImageAdvisor.java:181)
>         at 
> org.apache.geode.internal.cache.persistence.PersistenceInitialImageAdvisor.getAdvice(PersistenceInitialImageAdvisor.java:69)
>         at 
> org.apache.geode.internal.cache.persistence.PersistenceAdvisorImpl.getInitialImageAdvice(PersistenceAdvisorImpl.java:831)
>         at 
> org.apache.geode.internal.cache.persistence.CreatePersistentRegionProcessor.getInitialImageAdvice(CreatePersistentRegionProcessor.java:52)
>         at 
> org.apache.geode.internal.cache.DistributedRegion.getInitialImageAndRecovery(DistributedRegion.java:1200)
>         at 
> org.apache.geode.internal.cache.DistributedRegion.initialize(DistributedRegion.java:1081)
>         at 
> org.apache.geode.internal.cache.BucketRegion.initialize(BucketRegion.java:258)
>         at 
> org.apache.geode.internal.cache.LocalRegion.createSubregion(LocalRegion.java:1014)
>         at 
> org.apache.geode.internal.cache.PartitionedRegionDataStore.createBucketRegion(PartitionedRegionDataStore.java:779)
>         at 
> org.apache.geode.internal.cache.PartitionedRegionDataStore.grabFreeBucket(PartitionedRegionDataStore.java:454)
>         at 
> org.apache.geode.internal.cache.PartitionedRegionDataStore.grabBucket(PartitionedRegionDataStore.java:2895)
>         at 
> org.apache.geode.internal.cache.ProxyBucketRegion.recoverFromDisk(ProxyBucketRegion.java:447)
>         at 
> org.apache.geode.internal.cache.ProxyBucketRegion.recoverFromDiskRecursively(ProxyBucketRegion.java:390)
>         at 
> org.apache.geode.internal.cache.PRHARedundancyProvider$4.run2(PRHARedundancyProvider.java:1756)
>         at 
> org.apache.geode.internal.cache.partitioned.RecoveryRunnable.run(RecoveryRunnable.java:58)
>         at 
> org.apache.geode.internal.cache.PRHARedundancyProvider$4.run(PRHARedundancyProvider.java:1748)
>         at java.lang.Thread.run(Thread.java:748)
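
The behavior requested in the issue title can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not Geode's implementation: the classes Reply, Replica and Creator and their methods are hypothetical stand-ins for member B, member A and the reply to PrepareNewPersistentMemberMessage. The point it shows is that a replica that could not persist the new ID replies with a failure, and the creator drops its old ID only after a successful reply.

```java
import java.util.HashSet;
import java.util.Set;

public class PersistentIdHandshakeSketch {

  /** Hypothetical reply type (not a Geode class). */
  static final class Reply {
    final boolean persisted;
    final String error; // null when persisted is true

    private Reply(boolean persisted, String error) {
      this.persisted = persisted;
      this.error = error;
    }

    static Reply ok() {
      return new Reply(true, null);
    }

    static Reply failure(String reason) {
      return new Reply(false, reason);
    }
  }

  /** Hypothetical stand-in for member B, which stores its peers' persistent IDs. */
  static final class Replica {
    final Set<String> knownPeerIds = new HashSet<>();
    volatile boolean shuttingDown;

    Reply onPrepareNewId(String newId) {
      if (shuttingDown) {
        // Report the failure instead of claiming the ID was persisted.
        return Reply.failure("shutting down; " + newId + " was not persisted");
      }
      knownPeerIds.add(newId);
      return Reply.ok();
    }
  }

  /** Hypothetical stand-in for member A, which switches from an old ID to a new one. */
  static final class Creator {
    final Set<String> ownIds = new HashSet<>();

    Creator(String oldId) {
      ownIds.add(oldId);
    }

    /** Returns true only if the peer confirmed that it persisted newId. */
    boolean switchTo(String oldId, String newId, Replica peer) {
      Reply reply = peer.onPrepareNewId(newId);
      if (!reply.persisted) {
        // Assumption: on failure the old ID is kept, since it is the only ID the peer knows.
        return false;
      }
      ownIds.add(newId);
      ownIds.remove(oldId);
      return true;
    }
  }

  public static void main(String[] args) {
    Replica b = new Replica();
    Creator a = new Creator("A-old");
    b.knownPeerIds.add("A-old");

    b.shuttingDown = true;
    System.out.println("switched while B shuts down: " + a.switchTo("A-old", "A-new", b));
    System.out.println("IDs kept by A: " + a.ownIds);

    b.shuttingDown = false;
    System.out.println("switched after B is back: " + a.switchTo("A-old", "A-new", b));
    System.out.println("IDs kept by A: " + a.ownIds);
  }
}
```

In this model, A and B never disagree about which of A's IDs is current: after a failed attempt, both still know only the old ID.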



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
