[ 
https://issues.apache.org/jira/browse/IGNITE-17385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ilya Shishkov updated IGNITE-17385:
-----------------------------------
    Description: 
When you commit a transaction that was _explicitly started over only a single 
cache_, {{GridCacheAdapter#asyncOpRelease}} is called without a matching 
{{GridCacheAdapter#asyncOpAcquire}}. This can lead to continuous growth of the 
permit count in {{GridCacheAdapter#asyncOpsSem}} and eventually to an overflow, 
which fails the node that started the transaction:
{code}
Critical system error detected. Will be handled accordingly to configured 
handler 
[hnd=o.a.i.i.processors.cache.transactions.TxAsyncOpsSemaphorePermitsExeededTest$$Lambda$42/1924582348@7379bebb,
 failureCtx=FailureContext [type=CRITICAL_ERROR, err=java.lang.Error: Maximum 
permit count exceeded]]
{code}
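
The error text above is the one {{java.util.concurrent.Semaphore}} raises when a release would push the permit count past {{Integer.MAX_VALUE}}. A minimal standalone sketch of that mechanism follows. It does not use Ignite code: the class name, the method name and the starting permit count (chosen close to the limit so that the overflow happens immediately) are assumptions made for illustration only.

```java
import java.util.concurrent.Semaphore;

/** Hypothetical demo, not Ignite code: unbalanced release() calls overflow a Semaphore. */
final class PermitOverflowDemo {
    /**
     * Releases permits without ever acquiring them until the semaphore overflows.
     *
     * @param headroom How far below Integer.MAX_VALUE the permit count starts (assumed value).
     * @return Number of releases that succeeded before the overflow.
     */
    static int releasesUntilOverflow(int headroom) {
        // Start near the limit so that the demo finishes instantly.
        Semaphore sem = new Semaphore(Integer.MAX_VALUE - headroom);

        int released = 0;

        try {
            while (true) {
                sem.release(); // No matching acquire(): the permit count only grows.

                released++;
            }
        }
        catch (Error e) {
            // java.util.concurrent.Semaphore throws a plain java.lang.Error here.
            System.out.println("After " + released + " releases: " + e.getMessage());
        }

        return released;
    }

    public static void main(String[] args) {
        releasesUntilOverflow(3);
    }
}
```

Each unmatched {{release()}} adds one permit, so the count can only grow. Once it reaches {{Integer.MAX_VALUE}}, the next release throws {{java.lang.Error}}, which is the error type reported in the failure context above.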

As you can see in [1], in the case of a single cache context the transaction is 
committed by calling {{GridCacheAdapter#commitTxAsync}}, which later invokes 
{{GridCacheAdapter#asyncOpRelease}}. But when multiple caches are affected by 
the transaction, {{GridNearTxLocal#commitNearTxLocalAsync}} is called to commit 
it, and {{GridCacheAdapter#asyncOpRelease}} is never invoked.

So, the greater the load (RPS / TPS) of such single-cache transactions, the 
sooner the node will fail.
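
For a rough sense of scale, the time to overflow can be estimated from the commit rate. The sketch below assumes that every single-cache commit leaks exactly one permit, and that the semaphore starts at 500 permits (believed to be the Ignite default for the maximum number of concurrent async operations, but treat this as an assumption). The class and method names are hypothetical.

```java
import java.time.Duration;

/** Hypothetical estimate, not Ignite code: time until the permit counter overflows. */
final class OverflowEta {
    /**
     * @param initialPermits Permits the semaphore starts with (assumed value).
     * @param commitsPerSecond Rate of single-cache commits, one leaked permit each (assumption).
     * @return Approximate time until the count would exceed Integer.MAX_VALUE.
     */
    static Duration timeToOverflow(int initialPermits, long commitsPerSecond) {
        // Number of extra permits that still fit before the counter overflows.
        long headroom = (long)Integer.MAX_VALUE - initialPermits;

        return Duration.ofSeconds(headroom / commitsPerSecond);
    }

    public static void main(String[] args) {
        System.out.println("At 10 000 TPS: " + timeToOverflow(500, 10_000));
        System.out.println("At 1 000 TPS: " + timeToOverflow(500, 1_000));
    }
}
```

Under these assumptions a node that sustains 10 000 such commits per second would hit the limit after roughly two and a half days, and ten times later at 1 000 commits per second.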

Reproducer of the problem: [^SemaphorePermitsExceeded.patch]. It prints 
additional messages when the semaphore is released or acquired.

Links:
# https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/GridCacheSharedContext.java#L1122


> Frequent commits of single cache transactions can lead to 
> GridCacheAdapter#asyncOpsSem permits overflow
> ----------------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-17385
>                 URL: https://issues.apache.org/jira/browse/IGNITE-17385
>             Project: Ignite
>          Issue Type: Bug
>    Affects Versions: 2.13
>            Reporter: Ilya Shishkov
>            Priority: Major
>              Labels: ise
>         Attachments: SemaphorePermitsExceeded.patch
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
