[ 
https://issues.apache.org/jira/browse/IGNITE-16621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Pavlov updated IGNITE-16621:
-----------------------------------
    Labels: ise  (was: )

> AtomicSequence.incrementAndGet() fails intermittently.
> ------------------------------------------------------
>
>                 Key: IGNITE-16621
>                 URL: https://issues.apache.org/jira/browse/IGNITE-16621
>             Project: Ignite
>          Issue Type: Bug
>          Components: data structures
>            Reporter: Vyacheslav Koptilin
>            Assignee: Vyacheslav Koptilin
>            Priority: Major
>              Labels: ise
>             Fix For: 2.13
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Using _IgniteAtomicSequence_ can lead to the following _AssertionError_:
> {noformat}
> java.lang.AssertionError: null
> at org.apache.ignite.internal.processors.datastructures.GridCacheAtomicSequenceImpl$2.call(GridCacheAtomicSequenceImpl.java:307)
> at org.apache.ignite.internal.processors.datastructures.GridCacheAtomicSequenceImpl$2.call(GridCacheAtomicSequenceImpl.java:298)
> at org.apache.ignite.internal.processors.cache.GridCacheUtils.retryTopologySafe(GridCacheUtils.java:1418)
> at org.apache.ignite.internal.processors.datastructures.GridCacheAtomicSequenceImpl.internalUpdate(GridCacheAtomicSequenceImpl.java:230)
> at org.apache.ignite.internal.processors.datastructures.GridCacheAtomicSequenceImpl.incrementAndGet(GridCacheAtomicSequenceImpl.java:135)
> {noformat}
> The following code produces the mentioned error:
> {code:java}
> private Callable<Long> internalUpdate(final long l, final boolean updated) {
>     return new Callable<Long>() {
>         @Override public Long call() throws Exception {
>             assert distUpdateFreeTop.isHeldByCurrentThread() || distUpdateLockedTop.isHeldByCurrentThread();
>             try (GridNearTxLocal tx = CU.txStartInternal(ctx, cacheView, PESSIMISTIC, REPEATABLE_READ)) {
>                 GridCacheAtomicSequenceValue seq = cacheView.get(key);
>                 checkRemoved();
>                 // This assert can trigger the error when the partition loss policy
>                 // is IGNORE and the corresponding partition has been lost.
>                 assert seq != null;
>                 // ...
> {code}
> The root cause of the issue is that, in the in-memory case, the partition loss 
> policy is IGNORE. Therefore, the following read can return null without 
> throwing any exception, which triggers the AssertionError shown above.
> {code:java}
> try (GridNearTxLocal tx = CU.txStartInternal(ctx, cacheView, PESSIMISTIC, REPEATABLE_READ)) {
>     GridCacheAtomicSequenceValue seq = cacheView.get(key);
> {code}
> A possible workaround is to set a reasonable number of backups in 
> AtomicConfiguration. Monitoring for lost partitions would help as well.
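
For the backups workaround quoted above, a configuration along these lines might be a starting point. This is only a sketch: the backup count of 2 and the sequence name "exampleSeq" are illustrative assumptions, not values taken from the ticket, and the right number of backups depends on the cluster.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteAtomicSequence;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.configuration.AtomicConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class AtomicBackupsExample {
    public static void main(String[] args) {
        // Configuration shared by atomic data structures (sequences, longs, ...).
        AtomicConfiguration atomicCfg = new AtomicConfiguration();
        atomicCfg.setCacheMode(CacheMode.PARTITIONED);
        // Assumption: 2 backups is only an example value.
        atomicCfg.setBackups(2);

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setAtomicConfiguration(atomicCfg);

        try (Ignite ignite = Ignition.start(cfg)) {
            // "exampleSeq" is a hypothetical name; create the sequence if it is absent.
            IgniteAtomicSequence seq = ignite.atomicSequence("exampleSeq", 0, true);
            System.out.println(seq.incrementAndGet());
        }
    }
}
```

With backups configured, losing a single node should not by itself lose the partition that holds the sequence value, which is what makes the read return null in the first place.
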
> The proposed solution is straightforward: replace the assertion 
> _assert seq != null;_ with an explicit check that throws a suitable exception. 
> This should allow the user to detect the problem and, for example, re-create 
> the sequence.
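
To illustrate the proposed change from an assertion to an explicit check, here is a small standalone sketch. It does not use Ignite at all: `SequenceReadCheck`, `SeqValue`, and the map-backed `cacheView` are hypothetical stand-ins, and `IllegalStateException` is just one assumed choice of "suitable exception", not necessarily what the actual fix throws.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SequenceReadCheck {
    /** Hypothetical stand-in for the value stored under the sequence key. */
    static final class SeqValue {
        long val;

        SeqValue(long val) {
            this.val = val;
        }
    }

    /** Hypothetical stand-in for the cache view; get() returns null if the entry is gone. */
    static final Map<String, SeqValue> cacheView = new ConcurrentHashMap<>();

    static synchronized long incrementAndGet(String key) {
        SeqValue seq = cacheView.get(key);

        // Explicit check instead of `assert seq != null`: it also fires when
        // assertions are disabled, and gives the caller an exception to handle.
        if (seq == null)
            throw new IllegalStateException(
                "Sequence value not found (the partition may have been lost): " + key);

        return ++seq.val;
    }

    public static void main(String[] args) {
        cacheView.put("seq", new SeqValue(0));
        System.out.println(incrementAndGet("seq"));

        // Simulate the lost partition: the stored value silently disappears.
        cacheView.remove("seq");

        try {
            incrementAndGet("seq");
        }
        catch (IllegalStateException e) {
            // The caller can now detect the problem and, for example, re-create the sequence.
            System.out.println("Detected: " + e.getMessage());
        }
    }
}
```

The point of the sketch is only the shape of the check: the caller gets an exception it can catch and react to, rather than an `AssertionError` that appears only when assertions are enabled.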



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
