Re: MTCGA: IGNITE-7791 and GridDhtPartitionsSingleMessage

2018-04-18 Thread Maxim Muzafarov
Igniters,
especially those who participated in the implementation of IEP-4: I need
your advice.

I've prepared a fix for this flaky test. Could you review my changes [1] and
share your thoughts?

In my opinion, we should not remove cache info from the captured
LocalJoinCachesContext state when a reconnect event happens.
Doing so leads to incorrect processing of CacheAffinityChangeMessage after
the client node reconnects.
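
To make the concern concrete, here is a minimal toy model in plain Java. It
is not Ignite code: the ClientJoinStateSketch class, its map of captured
caches, and the onReconnect/onAffinityChange methods are hypothetical
stand-ins. It only illustrates how clearing captured cache info on reconnect
can leave a later affinity-change message without the cache it refers to.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model only: hypothetical names, not the real Ignite classes. */
class ClientJoinStateSketch {
    /** Stand-in for cache info captured at join time (cache name -> cache id). */
    private final Map<String, Integer> capturedCaches = new HashMap<>();

    /** Whether the reconnect handler wipes the captured state. */
    private final boolean clearOnReconnect;

    ClientJoinStateSketch(boolean clearOnReconnect) {
        this.clearOnReconnect = clearOnReconnect;
    }

    /** Records a cache as known to this client. */
    void captureCache(String name, int id) {
        capturedCaches.put(name, id);
    }

    /** Simulates the reconnect event. */
    void onReconnect() {
        if (clearOnReconnect)
            capturedCaches.clear();
    }

    /** Returns true if a message about the given cache can still be processed. */
    boolean onAffinityChange(String cacheName) {
        return capturedCaches.containsKey(cacheName);
    }

    public static void main(String[] args) {
        for (boolean clear : new boolean[] {true, false}) {
            ClientJoinStateSketch client = new ClientJoinStateSketch(clear);

            client.captureCache("static-cache", 1);
            client.onReconnect();

            System.out.println("clearOnReconnect=" + clear
                + " -> message processed: " + client.onAffinityChange("static-cache"));
        }
    }
}
```

In this model the message is processable only when the captured state
survives the reconnect, which is the behaviour the fix aims for.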

I've prepared everything needed for review:
1) PR [1]
2) JIRA issue [2]
3) A 100% reproducer for this case [3] (passes after the changes)
4) Report [4]: testReconnectCacheDestroyedAndCreated run 200 times (passes
OK)
5) IgniteClientNodesTestSuite run [5] (passes OK)
6) Run All build [6] (some failures, but not related to this issue)
7) My view of these fixes in a JIRA comment [7]
8) Upsource review created [8]
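
For anyone who wants to estimate a flaky test's pass rate locally, the
repeated-run idea from item 4 can be sketched in plain Java. This is an
illustration under assumptions, not the TeamCity configuration behind the
report: the RepeatRunSketch class and its passRate helper are hypothetical,
and the test body is any Callable that throws on failure.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper: runs a test body repeatedly and reports the fraction of passing runs. */
class RepeatRunSketch {
    static double passRate(Callable<Void> testBody, int runs) {
        if (runs <= 0)
            throw new IllegalArgumentException("runs must be positive");

        int passed = 0;

        for (int i = 0; i < runs; i++) {
            try {
                testBody.call();
                passed++;
            }
            catch (Exception | AssertionError e) {
                // A thrown exception or assertion error counts as a failed run.
            }
        }

        return (double) passed / runs;
    }

    public static void main(String[] args) {
        // Stand-in for a flaky test: fails on every third invocation.
        int[] counter = {0};

        double rate = passRate(() -> {
            if (++counter[0] % 3 == 0)
                throw new AssertionError("simulated flaky failure");

            return null;
        }, 200);

        System.out.printf("pass rate over 200 runs: %.1f%%%n", rate * 100);
    }
}
```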

[1] https://github.com/apache/ignite/pull/3779
[2] https://issues.apache.org/jira/browse/IGNITE-7791
[3]
https://ci.ignite.apache.org/viewLog.html?currentGroup=test=org.apache.ignite.testsuites.IgniteClientNodesTestSuite%23teamcity%23org.apache.ignite.internal%23teamcity%23IgniteClientReconnectDelayedSpiTest=1=DURATION_DESC=20===IgniteTests24Java8_ClientNodes=1192536=testsInfo
[4]
https://ci.ignite.apache.org/viewLog.html?buildId=1196532=IgniteTests24Java8_CacheFailover1=testsInfo
[5]
https://ci.ignite.apache.org/viewLog.html?buildId=1192536=IgniteTests24Java8_ClientNodes=testsInfo
[6] https://ci.ignite.apache.org/viewLog.html?buildId=1211532
[7]
https://issues.apache.org/jira/browse/IGNITE-7791?focusedCommentId=16437292=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16437292
[8] https://reviews.ignite.apache.org/ignite/review/IGNT-CR-558



Re: MTCGA: IGNITE-7791 and GridDhtPartitionsSingleMessage

2018-03-19 Thread Alexey Goncharuk
Hello Maxim,

SingleMessage with exchId=null is sent when a node updates its local
partitions' state and schedules a background cluster notification. In
contrast, when a partition map exchange happens, it is completed with
exchId != null.
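
The distinction can be pictured with a small plain-Java sketch. It is only
an illustration under assumptions: SingleMessageSketch and its String exchId
field are hypothetical stand-ins for GridDhtPartitionsSingleMessage and its
exchange ID, and the two labels simply restate the paragraph above.

```java
/** Toy stand-in for a partitions single message; not the real Ignite class. */
class SingleMessageSketch {
    /** Null for a background notification, non-null for a message tied to an exchange. */
    private final String exchId;

    SingleMessageSketch(String exchId) {
        this.exchId = exchId;
    }

    /** True when the message is not part of any partition map exchange. */
    boolean isBackgroundNotification() {
        return exchId == null;
    }

    /** Human-readable label for the two cases. */
    String kind() {
        return isBackgroundNotification()
            ? "background partition-state notification"
            : "partition map exchange message (exchId=" + exchId + ")";
    }

    public static void main(String[] args) {
        System.out.println(new SingleMessageSketch(null).kind());
        System.out.println(new SingleMessageSketch("exch-1").kind());
    }
}
```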

I need more context on how this message interferes with the exchange, and
on what differs between the two messages such that the assertion does not
fire in the regular scenario.



Re: MTCGA: IGNITE-7791 and GridDhtPartitionsSingleMessage

2018-03-16 Thread Dmitry Pavlov
Hi Maxim,

I don't know the answer, so I decided to at least provide some general
introductory information. I hope it will be useful for you and for newcomers.

https://cwiki.apache.org/confluence/display/IGNITE/%28Partition+Map%29+Exchange+-+under+the+hood

Sincerely,
Dmitriy Pavlov



Re: MTCGA: IGNITE-7791 and GridDhtPartitionsSingleMessage

2018-03-13 Thread Dmitry Pavlov
Hi Alexey,

Could you help with this question?

I've observed such messages; they were probably sent on a timeout, but I'm
not sure of their purpose.

Sincerely,
Dmitriy Pavlov



MTCGA: IGNITE-7791 and GridDhtPartitionsSingleMessage

2018-03-13 Thread Maxim Muzafarov
Hi all,

I'm working on [1]: the IgniteClientReconnectCacheTest class has a flaky
test case, testReconnectCacheDestroyedAndCreated, with a success rate of
32.4%.

I've left a comment in JIRA [2], along with a new test case that reproduces
this issue.
Basically, when we receive a GridDhtPartitionsSingleMessage with exchId=null
at the wrong time, we get this assertion error. The Ignite client instance
erases all its caches after reconnect, so it has no information about the
cache named 'static-cache' that persists on the server nodes, and when it
receives this SingleMessage after reconnection it fails with 'Failed to
reinitialize local partitions (preloading will be stopped)'.

Should we clean up [3] client caches when a client Ignite instance
reconnects?
Why should we clean client caches after the node reconnects? I can't grasp
the idea behind it.

[1] https://issues.apache.org/jira/browse/IGNITE-7791
[2]
https://issues.apache.org/jira/browse/IGNITE-7791?focusedCommentId=16391409=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16391409
[3]
https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/CacheAffinitySharedManager.java#L190