[ https://issues.apache.org/jira/browse/IGNITE-16621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dmitry Pavlov updated IGNITE-16621:
-----------------------------------
    Labels: ise  (was: )

> AtomicSequence.incrementAndGet() fails intermittently.
> ------------------------------------------------------
>
>                 Key: IGNITE-16621
>                 URL: https://issues.apache.org/jira/browse/IGNITE-16621
>             Project: Ignite
>          Issue Type: Bug
>          Components: data structures
>            Reporter: Vyacheslav Koptilin
>            Assignee: Vyacheslav Koptilin
>            Priority: Major
>              Labels: ise
>             Fix For: 2.13
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Using _IgniteAtomicSequence_ can lead to the following _AssertionError_:
> {noformat}
> java.lang.AssertionError: null
>     at org.apache.ignite.internal.processors.datastructures.GridCacheAtomicSequenceImpl$2.call(GridCacheAtomicSequenceImpl.java:307)
>     at org.apache.ignite.internal.processors.datastructures.GridCacheAtomicSequenceImpl$2.call(GridCacheAtomicSequenceImpl.java:298)
>     at org.apache.ignite.internal.processors.cache.GridCacheUtils.retryTopologySafe(GridCacheUtils.java:1418)
>     at org.apache.ignite.internal.processors.datastructures.GridCacheAtomicSequenceImpl.internalUpdate(GridCacheAtomicSequenceImpl.java:230)
>     at org.apache.ignite.internal.processors.datastructures.GridCacheAtomicSequenceImpl.incrementAndGet(GridCacheAtomicSequenceImpl.java:135)
> {noformat}
> The following code produces this error:
> {code:java}
> private Callable<Long> internalUpdate(final long l, final boolean updated) {
>     return new Callable<Long>() {
>         @Override public Long call() throws Exception {
>             assert distUpdateFreeTop.isHeldByCurrentThread() || distUpdateLockedTop.isHeldByCurrentThread();
>
>             try (GridNearTxLocal tx = CU.txStartInternal(ctx, cacheView, PESSIMISTIC, REPEATABLE_READ)) {
>                 GridCacheAtomicSequenceValue seq = cacheView.get(key);
>
>                 checkRemoved();
>
>                 // This assert can trigger the error if the partition loss policy is IGNORE
>                 // and the corresponding partition has been lost.
>                 assert seq != null;
> {code}
> The root cause of the issue is that, in the in-memory case, the partition loss policy is IGNORE. Therefore, the following read can return null without throwing any exception, which then triggers the AssertionError above.
> {code:java}
> try (GridNearTxLocal tx = CU.txStartInternal(ctx, cacheView, PESSIMISTIC, REPEATABLE_READ)) {
>     GridCacheAtomicSequenceValue seq = cacheView.get(key);
> {code}
> A possible workaround is to configure a reasonable number of backups in AtomicConfiguration. Monitoring lost partitions would be useful as well.
> The proposed solution is straightforward: replace the assertion _assert seq != null;_ with an explicit check and throw a suitable exception when the value is missing. This would allow the user to detect the problem and, for example, re-create the sequence.

-- 
This message was sent by Atlassian Jira
(v8.20.10#820010)
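The proposed fix can be illustrated with a minimal, self-contained sketch. It is not Ignite code. The cache is simulated by a `ConcurrentHashMap`, `synchronized` stands in for the update lock and pessimistic transaction, and `SequenceNullCheckSketch`, `SequenceLostException`, `incrementAndGet` and `incrementOrRecreate` are hypothetical names. The issue does not say which exception type the real fix throws. The sketch shows two things: an explicit null check replacing `assert seq != null;`, and a caller that detects the failure and re-creates the sequence.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch of the proposed fix (not Ignite API): the stored
 * sequence value is read, checked explicitly for null, and incremented.
 */
public class SequenceNullCheckSketch {

    /** Hypothetical exception signalling that the sequence state was lost. */
    static class SequenceLostException extends IllegalStateException {
        SequenceLostException(String msg) {
            super(msg);
        }
    }

    /** Stand-in for the cache holding sequence values; an entry may vanish if its partition is lost. */
    static final Map<String, Long> cache = new ConcurrentHashMap<>();

    /** Increments the stored value, failing explicitly if the entry is missing. */
    static synchronized long incrementAndGet(String key) {
        Long seq = cache.get(key);

        // Explicit check instead of "assert seq != null;": it runs even when the
        // JVM is started without -ea, and it tells the caller what went wrong.
        if (seq == null)
            throw new SequenceLostException("Sequence value is missing (partition may have been lost): " + key);

        long next = seq + 1;
        cache.put(key, next);
        return next;
    }

    /** Caller-side handling: detect the failure and re-create the sequence from a chosen value. */
    static synchronized long incrementOrRecreate(String key, long initial) {
        try {
            return incrementAndGet(key);
        }
        catch (SequenceLostException e) {
            cache.put(key, initial); // hypothetical re-creation of the lost sequence
            return incrementAndGet(key);
        }
    }

    public static void main(String[] args) {
        cache.put("seq", 0L);
        System.out.println(incrementAndGet("seq"));           // prints 1
        cache.remove("seq");                                  // simulate a lost partition
        System.out.println(incrementOrRecreate("seq", 100L)); // prints 101
    }
}
```

The explicit check matters because Java assertions are disabled by default. Without `-ea`, the original code would continue with a null value instead of failing at the assert. When re-creating a lost sequence, the initial value should be chosen so that identifiers already handed out are not reused. The `100L` above is arbitrary.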