Dmitry,

There are other cases that can result in an inconsistent state of an atomic cache with 2 or more backups.
1. For PRIMARY_SYNC: the primary sends requests to all backups and responds to
the near node... and then the update on one of the backups fails. Will the
primary retry the update operation? I doubt it.

2. For all sync modes: the primary sends the request to the 1st backup and
fails to send to the 2nd backup... and then the near node suddenly dies. No
one will retry, as the near node is gone.

On Tue, Jun 5, 2018 at 7:16 PM, Dmitriy Govorukhin <
dmitriy.govoruk...@gmail.com> wrote:

> Denis,
>
> It seems that you are right; it is a problem.
> I guess in this case the primary node should send a
> CachePartialUpdateException to the near node.
>
> On Tue, Jun 5, 2018 at 6:13 PM, Denis Garus <garus....@gmail.com> wrote:
>
> > Fix formatting
> >
> > Hello Igniters!
> >
> > I have found some confusing behavior of the atomic partitioned cache with
> > the `PRIMARY_SYNC` write synchronization mode.
> > The node with a primary partition sends a message to remote nodes with
> > backup partitions via `GridDhtAtomicAbstractUpdateFuture#sendDhtRequests`.
> > If an error occurs during sending, it will in fact be ignored, see [1]:
> > ```
> > try {
> >     ....
> >
> >     cctx.io().send(req.nodeId(), req, cctx.ioPolicy());
> >
> >     ....
> > }
> > catch (ClusterTopologyCheckedException ignored) {
> >     ....
> >
> >     registerResponse(req.nodeId());
> > }
> > catch (IgniteCheckedException ignored) {
> >     ....
> >
> >     registerResponse(req.nodeId());
> > }
> > ```
> > This behavior results in the primary partition and the backup partitions
> > having different values for a given key.
> >
> > There is a reproducer [2].
> >
> > Should we consider this behavior as valid?
> >
> > [1].
> > https://github.com/dgarus/ignite/blob/d473b507f04e2ec843c1da1066d8908e882396d7/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/distributed/dht/atomic/GridDhtAtomicAbstractUpdateFuture.java#L473
> > [2].
> > https://github.com/apache/ignite/pull/4126/files#diff-5e5bfb73bd917d85f56a05552b1d014aR26

--
Best regards,
Andrey V. Mashenkov
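The swallowed-exception pattern discussed in this thread can be reduced to a minimal, self-contained sketch. This is plain Java, not Ignite code; all class and method names below are hypothetical stand-ins for the primary/backup roles, and `IllegalStateException` stands in for the checked exceptions swallowed in `sendDhtRequests`:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Simulation of the failure mode: a primary replica applies an update,
 * then propagates it to each backup. If the send to a backup throws and
 * the exception is swallowed (the backup is still counted as having
 * "responded", like registerResponse() does), the update completes
 * successfully from the caller's point of view while the replicas diverge.
 */
public class SwallowedSendDemo {
    static class Replica {
        final Map<String, String> store = new HashMap<>();
        final boolean unreachable;

        Replica(boolean unreachable) { this.unreachable = unreachable; }

        void receive(String key, String val) {
            if (unreachable)
                throw new IllegalStateException("send failed");

            store.put(key, val);
        }
    }

    /** Primary applies locally, then replicates; send errors are ignored. */
    static boolean put(Replica primary, List<Replica> backups, String k, String v) {
        primary.store.put(k, v);

        int acked = 0;

        for (Replica b : backups) {
            try {
                b.receive(k, v);
                acked++;
            }
            catch (IllegalStateException ignored) {
                acked++; // counterpart of registerResponse(): failure counted as done
            }
        }

        return acked == backups.size(); // the "future" completes successfully
    }

    public static void main(String[] args) {
        Replica primary = new Replica(false);
        Replica backup1 = new Replica(false);
        Replica backup2 = new Replica(true); // send to this backup fails

        boolean ok = put(primary, List.of(backup1, backup2), "key", "v1");

        System.out.println("update reported ok: " + ok);
        System.out.println("primary:  " + primary.store.get("key"));
        System.out.println("backup-1: " + backup1.store.get("key"));
        System.out.println("backup-2: " + backup2.store.get("key"));
    }
}
```

Running it reports the update as successful even though `backup-2` never stored the value, which is the primary/backup divergence the reproducer [2] demonstrates against a real cluster.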