Re: MTCGA: IGNITE-7791 and GridDhtPartitionsSingleMessage
Igniters, especially those who participated in the implementation of IEP-4,

I need your advice. I've prepared a fix for this flaky test. Could you review my changes [1] and share your thoughts?

In my opinion, we should not remove cache info from the captured LocalJoinCachesContext state when a reconnect event happens. Doing so leads to incorrect processing of CacheAffinityChangeMessage after the client node reconnects.

I've prepared everything needed for the review:
1) PR [1]
2) JIRA issue [2]
3) A 100% reproducer for this case [3] (passes after the changes)
4) Report [4]: testReconnectCacheDestroyedAndCreated run 200 times (passes OK)
5) IgniteClientNodesTestSuite run [5] (passes OK)
6) Run ALL build [6] (some failures, but none related to this issue)
7) My view of these fixes in a JIRA comment [7]
8) Upsource review [8]

[1] https://github.com/apache/ignite/pull/3779
[2] https://issues.apache.org/jira/browse/IGNITE-7791
[3] https://ci.ignite.apache.org/viewLog.html?currentGroup=test=org.apache.ignite.testsuites.IgniteClientNodesTestSuite%23teamcity%23org.apache.ignite.internal%23teamcity%23IgniteClientReconnectDelayedSpiTest=1=DURATION_DESC=20===IgniteTests24Java8_ClientNodes=1192536=testsInfo
[4] https://ci.ignite.apache.org/viewLog.html?buildId=1196532=IgniteTests24Java8_CacheFailover1=testsInfo
[5] https://ci.ignite.apache.org/viewLog.html?buildId=1192536=IgniteTests24Java8_ClientNodes=testsInfo
[6] https://ci.ignite.apache.org/viewLog.html?buildId=1211532
[7] https://issues.apache.org/jira/browse/IGNITE-7791?focusedCommentId=16437292=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16437292
[8] https://reviews.ignite.apache.org/ignite/review/IGNT-CR-558

Mon, Mar 19, 2018 at 11:50, Alexey Goncharuk:
> Hello Maxim,
>
> A SingleMessage with exchId=null is sent when a node updates its local partitions' state and schedules a background cluster notification. In contrast, when a partition map exchange happens, it is completed with exchId != null.
>
> I need more context on how this message interferes with the exchange, and on what differs between the two messages such that the assertion does not fire in the regular scenario.
>
> 2018-03-13 20:58 GMT+03:00 Maxim Muzafarov:
> > Hi all,
> >
> > I'm working on [1]: the IgniteClientReconnectCacheTest class has a flaky test case, testReconnectCacheDestroyedAndCreated, with a success rate of 32.4%.
> >
> > I've left a comment in JIRA [2] along with a new test case that reproduces the issue. Basically, when we receive a GridDhtPartitionsSingleMessage with exchId=null at the wrong moment, we get this assertion error. The Ignite client instance erases all its caches after reconnect, so it has no information about the cache named 'static-cache' that persists on the server nodes. When it receives this SingleMessage after reconnecting, it fails with 'Failed to reinitialize local partitions (preloading will be stopped)'.
> >
> > Should we clean up [3] client caches when a client Ignite instance reconnects? Why should we clean client caches after the node reconnects? I can't see the idea behind it.
> >
> > [1] https://issues.apache.org/jira/browse/IGNITE-7791
> > [2] https://issues.apache.org/jira/browse/IGNITE-7791?focusedCommentId=16391409=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16391409
> > [3] https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/CacheAffinitySharedManager.java#L190
Re: MTCGA: IGNITE-7791 and GridDhtPartitionsSingleMessage
Hello Maxim,

A SingleMessage with exchId=null is sent when a node updates its local partitions' state and schedules a background cluster notification. In contrast, when a partition map exchange happens, it is completed with exchId != null.

I need more context on how this message interferes with the exchange, and on what differs between the two messages such that the assertion does not fire in the regular scenario.

2018-03-13 20:58 GMT+03:00 Maxim Muzafarov:
> Hi all,
>
> I'm working on [1]: the IgniteClientReconnectCacheTest class has a flaky test case, testReconnectCacheDestroyedAndCreated, with a success rate of 32.4%.
>
> I've left a comment in JIRA [2] along with a new test case that reproduces the issue. Basically, when we receive a GridDhtPartitionsSingleMessage with exchId=null at the wrong moment, we get this assertion error. The Ignite client instance erases all its caches after reconnect, so it has no information about the cache named 'static-cache' that persists on the server nodes. When it receives this SingleMessage after reconnecting, it fails with 'Failed to reinitialize local partitions (preloading will be stopped)'.
>
> Should we clean up [3] client caches when a client Ignite instance reconnects? Why should we clean client caches after the node reconnects? I can't see the idea behind it.
>
> [1] https://issues.apache.org/jira/browse/IGNITE-7791
> [2] https://issues.apache.org/jira/browse/IGNITE-7791?focusedCommentId=16391409=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16391409
> [3] https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/CacheAffinitySharedManager.java#L190
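
The distinction drawn in the reply above (exchId=null for a background notification, exchId != null for a partition map exchange) can be pictured with a small toy model. This is only a sketch: ExchangeId, SingleMessage and SingleMessageRouter are hypothetical stand-ins written for illustration, not the Ignite classes, and the only thing modeled is the presence or absence of an exchange id.

```java
/** Hypothetical stand-in for an exchange identifier; NOT the Ignite class. */
final class ExchangeId {
    final long topologyVersion;

    ExchangeId(long topologyVersion) {
        this.topologyVersion = topologyVersion;
    }
}

/** Hypothetical stand-in for a single message; only the exchId field is modeled. */
final class SingleMessage {
    /** null means a background notification, non-null means part of an exchange. */
    final ExchangeId exchId;

    SingleMessage(ExchangeId exchId) {
        this.exchId = exchId;
    }
}

final class SingleMessageRouter {
    /** Classifies a message purely by whether it carries an exchange id. */
    static String route(SingleMessage msg) {
        return msg.exchId == null
            ? "background-update"
            : "exchange:" + msg.exchId.topologyVersion;
    }

    public static void main(String[] args) {
        // A message sent after a local partition state update (assumed exchId=null).
        System.out.println(route(new SingleMessage(null)));
        // A message that belongs to a partition map exchange (assumed exchId != null).
        System.out.println(route(new SingleMessage(new ExchangeId(5))));
    }
}
```

The point of the sketch is only that a receiver cannot tie an exchId=null message to any particular exchange, so such a message may arrive at any moment relative to a reconnect.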
Re: MTCGA: IGNITE-7791 and GridDhtPartitionsSingleMessage
Hi Maxim,

I don't know the answer, so I decided to at least provide some general introductory information. I hope it will be useful for you and for newcomers.

https://cwiki.apache.org/confluence/display/IGNITE/%28Partition+Map%29+Exchange+-+under+the+hood

Sincerely,
Dmitriy Pavlov

Tue, Mar 13, 2018 at 21:28, Dmitry Pavlov:
> Hi Alexey,
>
> Could you help with this question?
>
> I've observed such messages; they were probably sent on a timeout, but I'm not sure of their purpose.
>
> Sincerely,
> Dmitriy Pavlov
>
> Tue, Mar 13, 2018 at 20:58, Maxim Muzafarov:
>> Hi all,
>>
>> I'm working on [1]: the IgniteClientReconnectCacheTest class has a flaky test case, testReconnectCacheDestroyedAndCreated, with a success rate of 32.4%.
>>
>> I've left a comment in JIRA [2] along with a new test case that reproduces the issue. Basically, when we receive a GridDhtPartitionsSingleMessage with exchId=null at the wrong moment, we get this assertion error. The Ignite client instance erases all its caches after reconnect, so it has no information about the cache named 'static-cache' that persists on the server nodes. When it receives this SingleMessage after reconnecting, it fails with 'Failed to reinitialize local partitions (preloading will be stopped)'.
>>
>> Should we clean up [3] client caches when a client Ignite instance reconnects? Why should we clean client caches after the node reconnects? I can't see the idea behind it.
>>
>> [1] https://issues.apache.org/jira/browse/IGNITE-7791
>> [2] https://issues.apache.org/jira/browse/IGNITE-7791?focusedCommentId=16391409=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16391409
>> [3] https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/CacheAffinitySharedManager.java#L190
Re: MTCGA: IGNITE-7791 and GridDhtPartitionsSingleMessage
Hi Alexey,

Could you help with this question?

I've observed such messages; they were probably sent on a timeout, but I'm not sure of their purpose.

Sincerely,
Dmitriy Pavlov

Tue, Mar 13, 2018 at 20:58, Maxim Muzafarov:
> Hi all,
>
> I'm working on [1]: the IgniteClientReconnectCacheTest class has a flaky test case, testReconnectCacheDestroyedAndCreated, with a success rate of 32.4%.
>
> I've left a comment in JIRA [2] along with a new test case that reproduces the issue. Basically, when we receive a GridDhtPartitionsSingleMessage with exchId=null at the wrong moment, we get this assertion error. The Ignite client instance erases all its caches after reconnect, so it has no information about the cache named 'static-cache' that persists on the server nodes. When it receives this SingleMessage after reconnecting, it fails with 'Failed to reinitialize local partitions (preloading will be stopped)'.
>
> Should we clean up [3] client caches when a client Ignite instance reconnects? Why should we clean client caches after the node reconnects? I can't see the idea behind it.
>
> [1] https://issues.apache.org/jira/browse/IGNITE-7791
> [2] https://issues.apache.org/jira/browse/IGNITE-7791?focusedCommentId=16391409=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16391409
> [3] https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/CacheAffinitySharedManager.java#L190
MTCGA: IGNITE-7791 and GridDhtPartitionsSingleMessage
Hi all,

I'm working on [1]: the IgniteClientReconnectCacheTest class has a flaky test case, testReconnectCacheDestroyedAndCreated, with a success rate of 32.4%.

I've left a comment in JIRA [2] along with a new test case that reproduces the issue. Basically, when we receive a GridDhtPartitionsSingleMessage with exchId=null at the wrong moment, we get this assertion error. The Ignite client instance erases all its caches after reconnect, so it has no information about the cache named 'static-cache' that persists on the server nodes. When it receives this SingleMessage after reconnecting, it fails with 'Failed to reinitialize local partitions (preloading will be stopped)'.

Should we clean up [3] client caches when a client Ignite instance reconnects? Why should we clean client caches after the node reconnects? I can't see the idea behind it.

[1] https://issues.apache.org/jira/browse/IGNITE-7791
[2] https://issues.apache.org/jira/browse/IGNITE-7791?focusedCommentId=16391409=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16391409
[3] https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/CacheAffinitySharedManager.java#L190
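
The failure described above can be pictured with a deliberately simplified toy model. ToyClientNode and its methods are hypothetical names invented for this sketch and do not correspond to Ignite's internals; the model assumes only what the message states, namely that the client forgets its caches on reconnect and then receives a partitions message for a cache ('static-cache') that the servers still have.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical toy client; it only tracks which cache names it knows about. */
final class ToyClientNode {
    private final Set<String> knownCaches = new HashSet<>();

    /** Records that a cache has been started on this client. */
    void cacheStarted(String cacheName) {
        knownCaches.add(cacheName);
    }

    /** Models a reconnect; eraseCaches=true mimics the clean-up questioned in the message. */
    void reconnect(boolean eraseCaches) {
        if (eraseCaches)
            knownCaches.clear();
    }

    /** A partitions update can be applied only if the client still knows the cache. */
    boolean canApplyPartitionsUpdate(String cacheName) {
        return knownCaches.contains(cacheName);
    }

    public static void main(String[] args) {
        ToyClientNode erasing = new ToyClientNode();
        erasing.cacheStarted("static-cache");
        erasing.reconnect(true);
        // The late message now refers to a cache the client has forgotten.
        System.out.println("erase on reconnect: " + erasing.canApplyPartitionsUpdate("static-cache"));

        ToyClientNode keeping = new ToyClientNode();
        keeping.cacheStarted("static-cache");
        keeping.reconnect(false);
        // Without the clean-up, the same late message can still be matched to a cache.
        System.out.println("keep on reconnect: " + keeping.canApplyPartitionsUpdate("static-cache"));
    }
}
```

In the real code the mismatch surfaces as an assertion error rather than a boolean, and whether skipping the clean-up is safe is exactly the open question of this thread; the sketch only makes the ordering problem concrete.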