[
https://issues.apache.org/jira/browse/IGNITE-28804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18090585#comment-18090585
]
Aleksandr Chesnokov commented on IGNITE-28804:
----------------------------------------------
[~alex_pl] Thanks for the quick review!
> Flaky GridCacheContinuousQueryMultiNodesFilteringTest#testWithNodeFilter
> ------------------------------------------------------------------------
>
> Key: IGNITE-28804
> URL: https://issues.apache.org/jira/browse/IGNITE-28804
> Project: Ignite
> Issue Type: Bug
> Reporter: Aleksandr Chesnokov
> Assignee: Aleksandr Chesnokov
> Priority: Minor
> Labels: MakeTeamcityGreenAgain, ise
> Fix For: 2.19
>
> Time Spent: 1h
> Remaining Estimate: 0h
>
> GridCacheContinuousQueryMultiNodesFilteringTest#testWithNodeFilter from the
> Continuous Queries 4 suite is flaky:
> [https://ci2.ignite.apache.org/test/4649996366513987762?currentProjectId=IgniteTests24Java8&branch=%3Cdefault%3E|https://ci2.ignite.apache.org/test/-3060366423503188690?currentProjectId=IgniteTests24Java8]
> On my local machine it takes about 6 runs to reproduce the bug.
> It fails with "Timeout of waiting for topology map update" on the second
> {{awaitPartitionMapExchange()}} call.
>
> UPD: The root cause is that the test used {{ClusterNode.id()}} in the node
> filter. This worked for the regular test nodes, because their UUIDs end with
> {{0}}, {{1}} and {{2}}. But during baseline affinity calculation Ignite can
> also call this filter for a {{DetachedClusterNode}}. This is a special
> baseline node representation, and its UUID is random. Because of that,
> {{grid2}}, which the filter is meant to exclude, could sometimes pass the
> regex when it was presented as a detached node.
> Ignite then mapped this detached node back to the real {{grid2}} by
> {{consistentId}}, so affinity expected {{grid2}} to own some partitions.
> But the cache was never started on the real {{grid2}}, because the real
> node did not pass the filter. As a result, {{awaitPartitionMapExchange()}}
> kept seeing {{affNodesCnt=2}} and {{ownersCnt=1}} and timed out.
> The fix is to filter on the stable {{ATTR_IGNITE_INSTANCE_NAME}} node
> attribute instead of the runtime node UUID.
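The root cause in the description above can be illustrated with a small self-contained Java model (a sketch, not Ignite code). The actual regex from the test is not quoted in the issue, so the filter here is a hypothetical stand-in that rejects any node whose UUID ends with {{2}}, and {{Node}} is a made-up substitute for {{ClusterNode}}. It shows the same filter giving different answers for the real {{grid2}} and for a detached view of it that carries a random UUID:

```java
import java.util.UUID;
import java.util.function.Predicate;

/** Hypothetical model of the UUID-based node filter; not Ignite code. */
class UuidFilterSketch {
    /** Made-up stand-in for ClusterNode with just the two identifiers that matter here. */
    static final class Node {
        final UUID id;
        final String consistentId;

        Node(UUID id, String consistentId) {
            this.id = id;
            this.consistentId = consistentId;
        }
    }

    /** Hypothetical filter meant to exclude grid2 by looking at the runtime UUID. */
    static final Predicate<Node> UUID_FILTER = n -> !n.id.toString().endsWith("2");

    /** Test-style UUID whose last hex digit is the node index (0..9). */
    static UUID testNodeId(int idx) {
        return UUID.fromString("00000000-0000-0000-0000-00000000000" + idx);
    }

    public static void main(String[] args) {
        Node realGrid2 = new Node(testNodeId(2), "grid2");

        // Detached baseline view of the same node: same consistentId, random UUID.
        Node detachedGrid2 = new Node(UUID.randomUUID(), "grid2");

        // Always false: the real node's UUID ends with '2'.
        System.out.println("real grid2 passes: " + UUID_FILTER.test(realGrid2));

        // Usually true: false only if the random UUID happens to end with '2'.
        System.out.println("detached grid2 passes: " + UUID_FILTER.test(detachedGrid2));
    }
}
```

For this hypothetical filter, the last hex digit of a random version 4 UUID is {{2}} only about 1 time in 16, so the detached view of {{grid2}} usually slips through.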
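The affinity/owner mismatch described above can be sketched with a small self-contained Java model. This is a hypothetical illustration, not Ignite code: partitions are assigned round-robin instead of by Ignite's affinity function, the filter is a made-up UUID check meant to exclude {{grid2}}, and the detached UUID is an arbitrary fixed value chosen so that it passes. Affinity is computed from the baseline view and mapped back by {{consistentId}}, while owners are only the real nodes that actually started the cache:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.UUID;
import java.util.function.Predicate;

/** Hypothetical model of the affinity/owner mismatch; round-robin, not Ignite's affinity function. */
class AffinityMismatchSketch {
    /** Made-up stand-in for ClusterNode / DetachedClusterNode. */
    static final class Node {
        final UUID id;
        final String consistentId;

        Node(UUID id, String consistentId) {
            this.id = id;
            this.consistentId = consistentId;
        }
    }

    /** Hypothetical UUID-based filter, meant to keep the cache off grid2. */
    static final Predicate<Node> UUID_FILTER = n -> !n.id.toString().endsWith("2");

    /** Test-style UUID whose last hex digit is the node index (0..9). */
    static UUID testNodeId(int idx) {
        return UUID.fromString("00000000-0000-0000-0000-00000000000" + idx);
    }

    /** Real nodes: UUIDs end with the node index. */
    static List<Node> realNodes() {
        return Arrays.asList(
            new Node(testNodeId(0), "grid0"),
            new Node(testNodeId(1), "grid1"),
            new Node(testNodeId(2), "grid2"));
    }

    /** Baseline view in which grid2 is a detached node with an arbitrary UUID that passes the filter. */
    static List<Node> baselineView() {
        return Arrays.asList(
            new Node(testNodeId(0), "grid0"),
            new Node(testNodeId(1), "grid1"),
            new Node(UUID.fromString("6f1c9a7e-5b2d-4c8e-9a41-6d0e7b2c5f1a"), "grid2"));
    }

    /** Consistent ids of the nodes accepted by the filter, in input order. */
    static List<String> accepted(List<Node> nodes, Predicate<Node> filter) {
        List<String> res = new ArrayList<>();

        for (Node n : nodes) {
            if (filter.test(n))
                res.add(n.consistentId);
        }

        return res;
    }

    /** Returns {affNodesCnt, ownersCnt} for one partition with the given number of backups. */
    static int[] counts(int part, int backups) {
        // Affinity: filter the baseline view, then identify nodes by consistentId.
        List<String> affCandidates = accepted(baselineView(), UUID_FILTER);

        // The cache is started (and partitions can be owned) only on real nodes that pass the filter.
        List<String> started = accepted(realNodes(), UUID_FILTER);

        int affNodesCnt = 0;
        int ownersCnt = 0;

        // Round-robin: primary at index `part`, backups on the following candidates.
        for (int i = 0; i <= backups && i < affCandidates.size(); i++) {
            String node = affCandidates.get((part + i) % affCandidates.size());
            affNodesCnt++;

            if (started.contains(node))
                ownersCnt++;
        }

        return new int[] {affNodesCnt, ownersCnt};
    }

    public static void main(String[] args) {
        for (int part = 0; part < 3; part++) {
            int[] c = counts(part, 1);
            System.out.println("part=" + part + " affNodesCnt=" + c[0] + " ownersCnt=" + c[1]);
        }
    }
}
```

With one backup, partitions 1 and 2 in this model each get two affinity nodes, one of which is {{grid2}}, but only one owner. That mirrors the {{affNodesCnt=2}} / {{ownersCnt=1}} state reported in the issue, which never converges because {{grid2}} never starts the cache.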
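A sketch of the fixed filter described above, again as a self-contained hypothetical model rather than the actual test code. In Ignite the name would typically be read with {{ClusterNode.attribute(IgniteNodeAttributes.ATTR_IGNITE_INSTANCE_NAME)}}; here a plain map and a made-up key stand in for it, and the instance names {{grid0}}..{{grid2}} are illustrative. The sketch assumes, as the fix relies on, that the detached baseline view carries the same attributes as the real node, so the filter gives the same answer for both regardless of the UUID:

```java
import java.util.Collections;
import java.util.Map;
import java.util.UUID;
import java.util.function.Predicate;

/** Hypothetical model of the attribute-based node filter; not Ignite code. */
class AttributeFilterSketch {
    /** Made-up key standing in for IgniteNodeAttributes.ATTR_IGNITE_INSTANCE_NAME. */
    static final String INSTANCE_NAME_ATTR = "instance.name";

    /** Made-up stand-in for ClusterNode: a runtime UUID plus node attributes. */
    static final class Node {
        final UUID id;
        final Map<String, Object> attrs;

        Node(UUID id, String instanceName) {
            this.id = id;
            this.attrs = Collections.<String, Object>singletonMap(INSTANCE_NAME_ATTR, instanceName);
        }

        /** Mirrors the shape of ClusterNode.attribute(String). */
        Object attribute(String name) {
            return attrs.get(name);
        }
    }

    /** Filter keyed on the stable instance name instead of the runtime UUID. */
    static final Predicate<Node> NAME_FILTER = n -> !"grid2".equals(n.attribute(INSTANCE_NAME_ATTR));

    public static void main(String[] args) {
        Node realGrid2 = new Node(UUID.fromString("00000000-0000-0000-0000-000000000002"), "grid2");

        // Assumption: the detached view keeps the attributes, only the UUID differs.
        Node detachedGrid2 = new Node(UUID.randomUUID(), "grid2");

        System.out.println(NAME_FILTER.test(realGrid2));     // false
        System.out.println(NAME_FILTER.test(detachedGrid2)); // false
    }
}
```

Because the decision no longer depends on the UUID, both representations of {{grid2}} are rejected, so affinity and the set of nodes that start the cache agree.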
--
This message was sent by Atlassian Jira
(v8.20.10#820010)