[
https://issues.apache.org/jira/browse/GEODE-8465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17193125#comment-17193125
]
ASF subversion and git services commented on GEODE-8465:
--------------------------------------------------------
Commit 8a7b1fc7bc91d772d51282ea527c7e58711569c1 in geode's branch
refs/heads/develop from Eric Shu
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=8a7b1fc ]
GEODE-8465: secondary HARegionQueue to sync with primary queue (#5496)
A seconary HARegionQueue will try to sync with primary after GII
to remove any events have been already dispatched from primary.
> A stray event can resurrect on a client due to a race condition
> ---------------------------------------------------------------
>
> Key: GEODE-8465
> URL: https://issues.apache.org/jira/browse/GEODE-8465
> Project: Geode
> Issue Type: Bug
> Components: client queues
> Affects Versions: 1.1.0
> Reporter: Eric Shu
> Assignee: Eric Shu
> Priority: Major
> Labels: caching-applications, pull-request-available
>
> A member with primary queue could send the QRM for the event before it
> removes the event from its HARegionQueue – this causes a race that a newly
> started member can gii and get the events from the member with primary queue
> after it has already sent the QRM for the event to existing members holding
> the secondary queue.
> The following analysis showing that the race exists.
> {noformat}
> The stray create of Object_5234on edge 10634 shows this issue with this cause.
> From edge 10634, we know the stray create is for v3 of the object and from
> bridge 15141.
> [fine 2020/06/17 16:38:49.621 PDT edgegemfire_2_1_host1_10634 <Cache Client
> Updater Thread on
> rs-GEM-2944-1535-hydra-client-1(bridgegemfire_2_1_host1_15141:15141)<ec><v35>:41007(version:UNKNOWN[ordinal=125])
> port 26905> tid=0x541] VersionedThinRegionEntryHeapStringKey2@1b56c386
> (key=Object_5234; rawValue=VMCachedDeserializable@526996532; version=
> {v3; rv28;
> mbr=rs-GEM-2944-1535-hydra-client-1(bridgegemfire_2_1_host1_18750)<ec><v27>:41007;
> ds=2; time=1592434556093}
> ;member=rs-GEM-2944-1535-hydra-client-1(bridgegemfire_2_1_host1_18750)<ec><v27>:41007)
> dispatching event
> EntryEventImpl[op=CREATE;region=/DefaultRegion;key=Object_5234;oldValue=null;newValue=VMCachedDeserializable@526996532;callbackArg=Update
> event originated in pid
> 10655;originRemote=true;originMember=rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_2_host1_10655:loner):60216:b0d873c4:edgegemfire_2_2_host1_10655;version=
> {v3; rv28;
> mbr=rs-GEM-2944-1535-hydra-client-1(bridgegemfire_2_1_host1_18750)<ec><v27>:41007;
> ds=2; time=1592434556093; remote}
> ;id=EventIDid=52bytes;threadID=52;sequenceID=108];isFromServer]
> [fine 2020/06/17 16:38:49.621 PDT edgegemfire_2_1_host1_10634 <Cache Client
> Updater Thread on
> rs-GEM-2944-1535-hydra-client-1(bridgegemfire_2_1_host1_15141:15141)<ec><v35>:41007(version:UNKNOWN[ordinal=125])
> port 26905> tid=0x541] Put entry for region: /DefaultRegion key: Object_5234
> callbackArgument: Update event originated in pid 10655
> ***This version (v3) is an update. The primary queue at the time is
> bridgegemfire_2_2_host1_10538. The event is enqueued and dispatched ***
> [fine 2020/06/17 15:55:56.098 PDT bridgegemfire_2_2_host1_10538 <P2P message
> reader for
> rs-GEM-2944-1535-hydra-client-1(bridgegemfire_2_1_host1_18750:18750)<ec><v27>:41007(version:UNKNOWN[ordinal=125])
> unshared ordered uid=358 dom #1 port=43914> tid=0x16d] DefaultRegion:
> notifying 1 bridge servers of event:
> EntryEventImpl[op=UPDATE;region=/DefaultRegion;key=Object_5234;oldValue=VMCachedDeserializable@1538889434;newValue=VMCachedDeserializable@2141181294;callbackArg=Update
> event originated in pid
> 10655;originRemote=true;originMember=rs-GEM-2944-1535-hydra-client-1(bridgegemfire_2_1_host1_18750:18750)<ec><v27>:41007(version:UNKNOWN[ordinal=125]);version=
> {v3; rv28;
> mbr=rs-GEM-2944-1535-hydra-client-1(bridgegemfire_2_1_host1_18750:18750)<ec><v27>:41007(version:UNKNOWN[ordinal=125]);
> ds=2; time=1592434556093; remote}
> ;context=identity(rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_2_host1_10655:10655:loner):60216:b0d873c4:edgegemfire_2_2_host1_10655,connection=1;id=EventIDid=52bytes;threadID=52;sequenceID=108];routing=]
> [fine 2020/06/17 15:55:56.098 PDT bridgegemfire_2_2_host1_10538 <P2P message
> reader for
> rs-GEM-2944-1535-hydra-client-1(bridgegemfire_2_1_host1_18750:18750)<ec><v27>:41007(version:UNKNOWN[ordinal=125])
> unshared ordered uid=358 dom #1 port=43914> tid=0x16d]
> VMStatsRegionEntryHeapLongKey@38ccc0e5 (key=3867;
> rawValue=HAEventWrapper[refCount=2;
> msg=ClientUpdateMessageImpl[op=AFTER_UPDATE;region=/DefaultRegion;key=Object_5234;isObject=1;cbArg=Update
> event originated in pid
> 10655;memberId=identity(rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_2_host1_10655:10655:loner):60216:b0d873c4:edgegemfire_2_2_host1_10655,connection=1;eventId=EventIDid=52bytes;threadID=52;sequenceID=108];shouldConflate=false;versionTag=
> {v3; rv28;
> mbr=rs-GEM-2944-1535-hydra-client-1(bridgegemfire_2_1_host1_18750:18750)<ec><v27>:41007(version:UNKNOWN[ordinal=125]);
> ds=2; time=1592434556093; remote}
> ;hasCqs=false]]) dispatching event
> EntryEventImpl[op=CREATE;region=/_gfe_non_durable_client_with_id_rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_1_host1_10634:10634:loner):51062:4bd673c4:edgegemfire_2_1_host1_10634_1_queue;key=3867;oldValue=null;newValue=HAEventWrapper[refCount=2;
>
> msg=ClientUpdateMessageImpl[op=AFTER_UPDATE;region=/DefaultRegion;key=Object_5234;isObject=1;cbArg=Update
> event originated in pid
> 10655;memberId=identity(rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_2_host1_10655:10655:loner):60216:b0d873c4:edgegemfire_2_2_host1_10655,connection=1;eventId=EventIDid=52bytes;threadID=52;sequenceID=108];shouldConflate=false;versionTag=
> {v3; rv28;
> mbr=rs-GEM-2944-1535-hydra-client-1(bridgegemfire_2_1_host1_18750:18750)<ec><v27>:41007(version:UNKNOWN[ordinal=125]);
> ds=2; time=1592434556093; remote}
> ;hasCqs=false]];callbackArg=null;originRemote=false;originMember=rs-GEM-2944-1535-hydra-client-1(bridgegemfire_2_2_host1_10538:10538)<ec><v2>:41006]
> ***QRM is sent to the 2 peers at the time with the region ***
> [fine 2020/06/17 15:55:56.213 PDT bridgegemfire_2_2_host1_10538 <Queue
> Removal Thread> tid=0x83] Sending
> (QueueRemovalMessage[_gfe_non_durable_client_with_id_rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_1_host1_10634:10634:loner):51062:4bd673c4:edgegemfire_2_1_host1_10634_1_queue,
> 1, EventIDid=52bytes;threadID=52;sequenceID=108],
> _gfe_non_durable_client_with_id_rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_3_host1_10666:10666:loner):60462:26db73c4:edgegemfire_2_3_host1_10666_1_queue,
> 1, EventIDid=52bytes;threadID=52;sequenceID=108]]) to 2 peers
> ([rs-GEM-2944-1535-hydra-client-1(bridgegemfire_2_1_host1_18750:18750)<ec><v27>:41007(version:UNKNOWN[ordinal=125])@277,
>
> rs-GEM-2944-1535-hydra-client-1(bridgegemfire_2_4_host1_23574:23574)<ec><v31>:41011(version:UNKNOWN[ordinal=125])@566])
> via tcp/ip
> ***the newly started bridgegemfire_2_3_24320 creates the HARegionQueue and
> gii from the member with primary queue (10538) ***
> [info 2020/06/17 15:56:08.800 PDT <Client Queue Initialization Thread 1>
> tid=0x76] Initializing region
> _gfe_non_durable_client_with_id_rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_1_host1_10634:10634:loner):51062:4bd673c4:edgegemfire_2_1_host1_10634_1_queue
> [info 2020/06/17 15:56:09.285 PDT <Client Queue Initialization Thread 1>
> tid=0x76]
> _gfe_non_durable_client_with_id_rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_1_host1_10634:10634:loner):51062:4bd673c4:edgegemfire_2_1_host1_10634_1_queue
> is done getting image from
> rs-GEM-2944-1535-hydra-client-1(bridgegemfire_2_2_host1_10538:10538)<ec><v2>:41006(version:GEODE
> 1.3.0). isDeltaGII is false
> [debug 2020/06/17 15:56:09.377 PDT <Client Queue Initialization Thread 1>
> tid=0x76] adding haContainerKey to HARegion at
> 1:HAEventWrapper[region=/DefaultRegion; key=Object_5234; refCount=1;
> putInProgress=0;
> event=EventID[rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_2_host1_10655:loner):60216:b0d873c4:edgegemfire_2_2_host1_10655;threadID=52;sequenceID=108];
> no message] for
> _gfe_non_durable_client_with_id_rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_1_host1_10634:10634:loner):51062:4bd673c4:edgegemfire_2_1_host1_10634_1_queue
> [debug 2020/06/17 15:56:09.377 PDT <Client Queue Initialization Thread 1>
> tid=0x76] invoking listeners:
> [org.apache.geode.internal.cache.ha.HARegionQueue$1@2e69472b]
> [debug 2020/06/17 15:56:09.377 PDT <Client Queue Initialization Thread 1>
> tid=0x76] Adding position 1 to available IDs. Region:
> _gfe_non_durable_client_with_id_rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_1_host1_10634:10634:loner):51062:4bd673c4:edgegemfire_2_1_host1_10634_1_queue
> [debug 2020/06/17 15:56:09.377 PDT <Client Queue Initialization Thread 1>
> tid=0x76] RegionQueue on
> _gfe_non_durable_client_with_id_rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_1_host1_10634:10634:loner):51062:4bd673c4:edgegemfire_2_1_host1_10634_1_queue(backup)
> done putting GII data into queue
> ***Please note that the event was being removed from its primary queue after
> QRM has been sent and bridge 24320 gii the events from it ***
> [fine 2020/06/17 15:56:09.666 PDT bridgegemfire_2_2_host1_10538 <Client
> Message Dispatcher for
> rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_1_host1_10634:10634:loner):51062:4bd673c4:edgegemfire_2_1_host1_10634>
> tid=0x84] VMStatsRegionEntryHeapLongKey@38ccc0e5 (key=3867;
> rawValue=REMOVED_PHASE2) dispatching event
> EntryEventImpl[op=LOCAL_DESTROY;region=/_gfe_non_durable_client_with_id_rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_1_host1_10634:10634:loner):51062:4bd673c4:edgegemfire_2_1_host1_10634_1_queue;key=3867;oldValue=HAEventWrapper[region=/DefaultRegion;
> key=Object_5234; refCount=1; inContainer=true; putInProgress=false;
> event=EventIDid=52bytes;threadID=52;sequenceID=108]; no
> message];newValue=null;callbackArg=null;originRemote=false;originMember=rs-GEM-2944-1535-hydra-client-1(bridgegemfire_2_2_host1_10538:10538)<ec><v2>:41006]
> ***Bridge 24320 now has the event for Object_5234(v3) that has been sent to
> the client and won't get QRM again ***
> ***Later, bridge 15141 gii from 24320 and got the event ***
> [info 2020/06/17 16:17:36.454 PDT <Client Queue Initialization Thread 2>
> tid=0x78] Initializing region
> _gfe_non_durable_client_with_id_rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_1_host1_10634:10634:loner):51062:4bd673c4:edgegemfire_2_1_host1_10634_1_queue
> [info 2020/06/17 16:17:40.187 PDT <Client Queue Initialization Thread 2>
> tid=0x78]
> _gfe_non_durable_client_with_id_rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_1_host1_10634:10634:loner):51062:4bd673c4:edgegemfire_2_1_host1_10634_1_queue
> is done getting image from
> rs-GEM-2944-1535-hydra-client-1(bridgegemfire_2_3_host1_24320:24320)<ec><v33>:41010.
> isDeltaGII is false
> [debug 2020/06/17 16:17:40.415 PDT <Client Queue Initialization Thread 2>
> tid=0x78]
> _gfe_non_durable_client_with_id_rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_1_host1_10634:10634:loner):51062:4bd673c4:edgegemfire_2_1_host1_10634_1_queue
> processing queue key 1 and value HAEventWrapper[region=/DefaultRegion;
> key=Object_5234; refCount=1; putInProgress=0;
> event=EventID[rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_2_host1_10655:loner):60216:b0d873c4:edgegemfire_2_2_host1_10655;threadID=52;sequenceID=108];
> no message]
> ***It will send the event to client after it becomes the primary queue holder
> for client 10634 ***
> [debug 2020/06/17 16:38:49.125 PDT <ServerConnection on port 26905 Thread 5>
> tid=0xb4]
> CacheClientProxy[identity(rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_1_host1_10634:10634:loner):51062:4bd673c4:edgegemfire_2_1_host1_10634,connection=1;
> port=53006; primary=true; version=GEODE 1.3.0]: Queueing marker message.
> <org.apache.geode.internal.cache.tier.sockets.ClientMarkerMessageImpl@75bf95c8>.
> The queue contains 57 entries.
> {noformat}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)