Igniters, especially those who participated in the implementation of IEP-4, I need your advice.
I've prepared fixes for this flaky test. Can you review my changes [1] and share your thoughts? In my opinion, we should not remove cache information from the state captured in LocalJoinCachesContext when a reconnect event happens. Removing it leads to incorrect processing of CacheAffinityChangeMessage after the client node reconnects.

I've prepared everything needed for the review:
1) PR [1]
2) JIRA issue [2]
3) A 100% reproducer for this case [3] (passes after the changes)
4) Report [4]: testReconnectCacheDestroyedAndCreated run 200 times (passes OK)
5) IgniteClientNodesTestSuite run [5] (passes OK)
6) Run All build [6] (has some failures, but they are not related to this issue)
7) My view of these fixes in a JIRA comment [7]
8) Upsource review [8]

[1] https://github.com/apache/ignite/pull/3779
[2] https://issues.apache.org/jira/browse/IGNITE-7791
[3] https://ci.ignite.apache.org/viewLog.html?currentGroup=test&scope=org.apache.ignite.testsuites.IgniteClientNodesTestSuite%23teamcity%23org.apache.ignite.internal%23teamcity%23IgniteClientReconnectDelayedSpiTest&pager.currentPage=1&order=DURATION_DESC&recordsPerPage=20&filterText=&status=&buildTypeId=IgniteTests24Java8_ClientNodes&buildId=1192536&tab=testsInfo
[4] https://ci.ignite.apache.org/viewLog.html?buildId=1196532&buildTypeId=IgniteTests24Java8_CacheFailover1&tab=testsInfo
[5] https://ci.ignite.apache.org/viewLog.html?buildId=1192536&buildTypeId=IgniteTests24Java8_ClientNodes&tab=testsInfo
[6] https://ci.ignite.apache.org/viewLog.html?buildId=1211532
[7] https://issues.apache.org/jira/browse/IGNITE-7791?focusedCommentId=16437292&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16437292
[8] https://reviews.ignite.apache.org/ignite/review/IGNT-CR-558

On Mon, 19 Mar 2018 at 11:50, Alexey Goncharuk <[email protected]> wrote:

> Hello Maxim,
>
> SingleMessage with exchId=null is sent when a node updates the state of its
> local partitions and schedules a background cluster notification. In
> contrast, when a partition map exchange happens, it is completed with
> exchId != null.
>
> I need more context on how this message interferes with the exchange, and
> on what differs between the two messages such that the assertion does not
> happen in the regular scenario.
>
> 2018-03-13 20:58 GMT+03:00 Maxim Muzafarov <[email protected]>:
>
> > Hi all,
> >
> > I'm working on [1], the IgniteClientReconnectCacheTest class with the flaky
> > test case testReconnectCacheDestroyedAndCreated (success rate 32.4%).
> >
> > I've left a comment in JIRA [2] with a new test case that reproduces this
> > issue.
> > Basically, when we receive a GridDhtPartitionsSingleMessage with
> > exchId=null at the wrong time, we get this assertion error. The Ignite
> > client instance erases all its caches after reconnect, so it has no
> > information about the cache named 'static-cache' that persists on the
> > server nodes. When it receives this SingleMessage after reconnecting, it
> > fails with 'Failed to reinitialize local partitions (preloading will be
> > stopped)'.
> >
> > Should we perform the clean-up [3] of client caches when the client Ignite
> > instance reconnects?
> > Why should we clean client caches after the node reconnects? I can't see
> > the idea behind it.
> >
> > [1] https://issues.apache.org/jira/browse/IGNITE-7791
> > [2] https://issues.apache.org/jira/browse/IGNITE-7791?focusedCommentId=16391409&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16391409
> > [3] https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/CacheAffinitySharedManager.java#L190
