[ 
https://issues.apache.org/jira/browse/IGNITE-28804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18090585#comment-18090585
 ] 

Aleksandr Chesnokov commented on IGNITE-28804:
----------------------------------------------

[~alex_pl] Thanks for the quick review!

> Flaky GridCacheContinuousQueryMultiNodesFilteringTest#testWithNodeFilter
> ------------------------------------------------------------------------
>
>                 Key: IGNITE-28804
>                 URL: https://issues.apache.org/jira/browse/IGNITE-28804
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Aleksandr Chesnokov
>            Assignee: Aleksandr Chesnokov
>            Priority: Minor
>              Labels: MakeTeamcityGreenAgain, ise
>             Fix For: 2.19
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> GridCacheContinuousQueryMultiNodesFilteringTest#testWithNodeFilter from the 
> Continuous Queries 4 suite is flaky: 
> [https://ci2.ignite.apache.org/test/4649996366513987762?currentProjectId=IgniteTests24Java8&branch=%3Cdefault%3E|https://ci2.ignite.apache.org/test/-3060366423503188690?currentProjectId=IgniteTests24Java8]
> On my local machine it takes about 6 runs to reproduce the bug.
> It fails with "Timeout of waiting for topology map update" on the second 
> {{awaitPartitionMapExchange()}} call.
>  
> UPD: The root cause is that the test's node filter matched on {{ClusterNode.id()}}. 
> This worked for regular test nodes, because their UUIDs end with {{0}}, {{1}}, 
> {{2}}. But during baseline affinity calculation Ignite can also call this filter 
> for a {{DetachedClusterNode}}, a special baseline node representation whose UUID 
> is random. Because of that, the detached representation of {{grid2}} could 
> sometimes pass the filter's regex, even though the real {{grid2}} is filtered 
> out.
> After that, Ignite mapped this detached node back to the real {{grid2}} by 
> {{consistentId}}, so affinity expected {{grid2}} to own some partitions. 
> But the cache was not actually started on the real {{grid2}}, because the 
> real node did not pass the filter. So {{awaitPartitionMapExchange()}} kept 
> seeing {{affNodesCnt=2}} and {{ownersCnt=1}} and timed out.
> The fix is to use the stable {{ATTR_IGNITE_INSTANCE_NAME}} node attribute 
> instead of the runtime node UUID in the filter.
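To illustrate why a UUID-based filter misbehaves for a detached node while a name-based one does not, here is a minimal standalone sketch. It does not use Ignite classes: {{Node}} is a hypothetical stand-in for {{ClusterNode}}, {{INSTANCE_NAME_ATTR}} is a stand-in for {{IgniteNodeAttributes.ATTR_IGNITE_INSTANCE_NAME}}, and the regex {{.*[01]$}} (accept grid0 and grid1, reject grid2) is an assumption, since the test's actual pattern is not quoted here. It also assumes, as the fix implies, that the detached representation carries the same instance name as the real node.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;
import java.util.function.Predicate;
import java.util.regex.Pattern;

public class NodeFilterSketch {
    // Hypothetical stand-in key; the real fix reads IgniteNodeAttributes.ATTR_IGNITE_INSTANCE_NAME.
    static final String INSTANCE_NAME_ATTR = "instance.name";

    // Minimal stand-in for ClusterNode: a runtime UUID plus node attributes.
    static final class Node {
        final UUID id;
        final Map<String, Object> attrs;

        Node(UUID id, String instanceName) {
            this.id = id;
            this.attrs = Collections.singletonMap(INSTANCE_NAME_ATTR, instanceName);
        }

        Object attribute(String name) {
            return attrs.get(name);
        }
    }

    // Assumed old filter: keep grid0 and grid1 by matching the last character of the UUID.
    static final Pattern UUID_SUFFIX = Pattern.compile(".*[01]$");
    static final Predicate<Node> BY_UUID = n -> UUID_SUFFIX.matcher(n.id.toString()).matches();

    // Name-based filter: keep grid0 and grid1 by their stable instance name.
    static final Set<String> CACHE_NODES = new HashSet<>(Arrays.asList("grid0", "grid1"));
    static final Predicate<Node> BY_NAME = n -> CACHE_NODES.contains(n.attribute(INSTANCE_NAME_ATTR));

    // Test-style deterministic IDs whose last digit is the node index.
    static UUID testNodeId(int idx) {
        return UUID.fromString(String.format("00000000-0000-0000-0000-%012d", idx));
    }

    public static void main(String[] args) {
        Node realGrid2 = new Node(testNodeId(2), "grid2");
        System.out.println("real grid2: byUuid=" + BY_UUID.test(realGrid2)
            + " byName=" + BY_NAME.test(realGrid2));

        // A detached stand-in for grid2 keeps the name but gets a random UUID.
        int trials = 100_000, uuidPasses = 0, namePasses = 0;
        for (int i = 0; i < trials; i++) {
            Node detachedGrid2 = new Node(UUID.randomUUID(), "grid2");
            if (BY_UUID.test(detachedGrid2)) uuidPasses++;
            if (BY_NAME.test(detachedGrid2)) namePasses++;
        }
        System.out.printf("detached grid2 passes: byUuid=%.3f byName=%.3f%n",
            (double) uuidPasses / trials, (double) namePasses / trials);
    }
}
```

Both filters reject the real grid2. For the detached stand-in, the UUID filter lets it through whenever the random UUID's last hex digit is 0 or 1 (roughly 2 in 16 under the assumed regex), while the name-based filter never does.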
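The timeout itself comes from a mismatch between the nodes affinity assigns a partition to and the nodes that actually own it. The sketch below is a hypothetical simplification of that check, not Ignite's real {{awaitPartitionMapExchange()}} logic: a partition counts as ready only when every affinity node also has the cache started. The node names and the assignment {{[grid0, grid2]}} are illustrative assumptions.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class AffinityMismatchSketch {
    // Counts how many of a partition's affinity nodes actually have the cache started.
    static long ownersCount(List<String> affinityNodes, Set<String> cacheStartedOn) {
        return affinityNodes.stream().filter(cacheStartedOn::contains).count();
    }

    // Simplified readiness check: every affinity node must also be an owner.
    static boolean partitionReady(List<String> affinityNodes, Set<String> cacheStartedOn) {
        return ownersCount(affinityNodes, cacheStartedOn) == affinityNodes.size();
    }

    public static void main(String[] args) {
        // grid2's detached stand-in passed the filter, so after mapping back by
        // consistentId the affinity function assigned this partition to the real grid2.
        List<String> affinityNodes = Arrays.asList("grid0", "grid2");
        // The real grid2 failed the filter, so the cache was started only on grid0 and grid1.
        Set<String> cacheStartedOn = new HashSet<>(Arrays.asList("grid0", "grid1"));

        System.out.println("affNodesCnt=" + affinityNodes.size()
            + " ownersCnt=" + ownersCount(affinityNodes, cacheStartedOn)
            + " ready=" + partitionReady(affinityNodes, cacheStartedOn));
    }
}
```

This prints {{affNodesCnt=2 ownersCnt=1 ready=false}}. Since no later event changes either side, a waiter polling such a condition can only stop at its timeout.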



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
