[ https://issues.apache.org/jira/browse/IGNITE-14474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17335430#comment-17335430 ]
Denis Chudov edited comment on IGNITE-14474 at 4/29/21, 12:45 PM:
------------------------------------------------------------------

[~Smolnikov] LGTM, pls fix typo in test (you can just apply suggested change on github) and proceed to committer's review.

was (Author: denis chudov):
    [~Smolnikov] LGTM, pls fix typo in test (you can just apply suggested change on github) and proceed to core team review.

> Improve error message in case rebalance fails
> ---------------------------------------------
>
>                 Key: IGNITE-14474
>                 URL: https://issues.apache.org/jira/browse/IGNITE-14474
>             Project: Ignite
>          Issue Type: Improvement
>   Affects Versions: 2.5
>            Reporter: Denis Chudov
>            Assignee: Rodion
>            Priority: Major
>             Fix For: 2.9.2
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> Currently we can get a message like this when rebalance fails with an
> exception (examples from Ignite 2.5; the log messages changed in newer
> versions, but the problem is still relevant):
> {code:java}
> 2019-11-27 13:41:14,504[WARN ][utility-#79%xxx%][GridDhtPartitionDemander] Rebalancing from node cancelled [grp=ignite-sys-cache, topVer=AffinityTopologyVersion [topVer=1932, minorTopVer=1], supplier=f014f30a-77f2-4459-aa5b-6c12907a7449, topic=0]. Supply message couldn't be unmarshalled: class o.a.i.IgniteCheckedException: Failed to unmarshal object with optimized marshaller
> 2019-11-27 13:41:14,504[INFO ][utility-#79%xxx%][GridDhtPartitionDemander] Cancelled rebalancing [grp=ignite-sys-cache, supplier=f014f30a-77f2-4459-aa5b-6c12907a7449, topVer=AffinityTopologyVersion [topVer=1932, minorTopVer=1], time=88 ms]
> 2019-11-27 13:41:14,508[WARN ][utility-#76%xxx%][GridDhtPartitionDemander] Rebalancing from node cancelled [grp=ignite-sys-cache, topVer=AffinityTopologyVersion [topVer=1932, minorTopVer=1], supplier=dfa5ee06-48c9-4458-ae55-48cc6ceda998, topic=0]. Supply message couldn't be unmarshalled: class o.a.i.IgniteCheckedException: Failed to unmarshal object with optimized marshaller
> {code}
> In the case above, a marshalling exception leads to a rebalance failure that
> will never be resolved, i.e. the cluster enters an erroneous state.
> We should report issues like this as ERROR. The message should explain that
> the rebalance has failed, data for the cache was not fully copied to the
> node, the backup factor is not recovered and the cluster may not work
> correctly.
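
For illustration only, the kind of ERROR message requested in the description might be assembled roughly as follows. This is a minimal sketch, not Ignite code: the class `RebalanceFailureMessage`, the method `build`, and the exact wording are all hypothetical, and the sketch uses `java.util.logging` rather than Ignite's own logger so that it stays self-contained.

```java
import java.util.UUID;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical helper (not part of Ignite) that formats a rebalance failure report. */
public class RebalanceFailureMessage {
    /**
     * Builds a message stating the four points from the issue description:
     * the rebalance failed, data was not fully copied, the backup factor is
     * not recovered, and the cluster may not work correctly.
     */
    public static String build(String cacheGroup, UUID supplier, long topVer, int minorTopVer, Throwable cause) {
        return "Rebalancing failed [grp=" + cacheGroup
            + ", supplier=" + supplier
            + ", topVer=" + topVer
            + ", minorTopVer=" + minorTopVer + "]. "
            + "Data for the cache group was not fully copied to this node, "
            + "the backup factor is not recovered and the cluster may not work correctly. "
            + "Cause: " + cause;
    }

    public static void main(String[] args) {
        // Values taken from the log excerpt in the issue; the exception is a stand-in.
        Throwable cause = new IllegalStateException("Failed to unmarshal object with optimized marshaller");
        String msg = build("ignite-sys-cache",
            UUID.fromString("f014f30a-77f2-4459-aa5b-6c12907a7449"), 1932, 1, cause);

        // Level.SEVERE is the java.util.logging counterpart of an ERROR-level record.
        Logger.getLogger("rebalance-sketch").log(Level.SEVERE, msg, cause);
    }
}
```

Keeping the message construction separate from the logging call makes the wording easy to assert on in a test, whatever logger is actually used.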