[jira] [Updated] (IGNITE-21478) OOM crash with unstable topology

Luchnikov Alexander (Jira) Wed, 07 Feb 2024 01:54:03 -0800


     [ 
https://issues.apache.org/jira/browse/IGNITE-21478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Luchnikov Alexander updated IGNITE-21478:
-----------------------------------------
    Description: 
Use cases:
1) A thick client frequently joining and leaving the topology leads to a crash 
of the server node due to OOM.
2) Frequent creation and destruction of caches leads to a server node crash 
due to OOM.

Part of the log before the OOM crash; note the value *topVer=20098*:
{code:java}
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
    ^-- Node [id=f080abcd, uptime=3 days, 09:00:55.274]
    ^-- Cluster [hosts=4, CPUs=6, servers=2, clients=2, topVer=20098, minorTopVer=6]
    ^-- Network [addrs=[192.168.1.2, 127.0.0.1], discoPort=47500, commPort=47100]
    ^-- CPU [CPUs=2, curLoad=86.83%, avgLoad=21.9%, GC=23.9%]
    ^-- Heap [used=867MB, free=15.29%, comm=1024MB]
    ^-- Outbound messages queue [size=0]
    ^-- Public thread pool [active=0, idle=7, qSize=0]
    ^-- System thread pool [active=0, idle=8, qSize=0]
    ^-- Striped thread pool [active=0, idle=8, qSize=0]
{code}
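
To watch how fast the topology version grows, the counters can be pulled out of such log lines. The following is a minimal sketch, not part of Ignite: the class TopologyVersionParser and its TopVer holder are hypothetical names, and the regular expression assumes the "topVer=<n>, minorTopVer=<m>" layout shown in the log above, allowing a line break between the two fields.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not an Ignite class): extracts topology versions from log text. */
final class TopologyVersionParser {
    /** Assumed layout: "topVer=<n>, minorTopVer=<m>", possibly wrapped after the comma. */
    private static final Pattern TOP_VER =
        Pattern.compile("topVer=(\\d+),\\s*minorTopVer=(\\d+)");

    /** Simple holder for the two counters found in a log line. */
    static final class TopVer {
        final long major;
        final int minor;

        TopVer(long major, int minor) {
            this.major = major;
            this.minor = minor;
        }
    }

    /** Returns the first topVer/minorTopVer pair in the text, or empty if there is none. */
    static Optional<TopVer> parse(String logText) {
        Matcher m = TOP_VER.matcher(logText);

        if (!m.find())
            return Optional.empty();

        return Optional.of(new TopVer(Long.parseLong(m.group(1)), Integer.parseInt(m.group(2))));
    }

    public static void main(String[] args) {
        // Sample taken from the metrics output in this issue, including the line wrap.
        String line = "^-- Cluster [hosts=4, CPUs=6, servers=2, clients=2, topVer=20098, \nminorTopVer=6]";

        parse(line).ifPresent(v ->
            System.out.println("topVer=" + v.major + " minorTopVer=" + v.minor));
    }
}
```

Comparing the parsed values from two metrics snapshots gives a rough rate of topology changes, which may help correlate heap growth with client or cache churn.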

Histogram from the heap dump taken after the node failed
 !histo.png! 

*MinorTop example*
{code:java}
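    @Test
    public void testMinorVer() throws Exception {
        // One server node plus one thick client, started via Ignite test framework helpers.
        Ignite server = startGrids(1);
        IgniteEx client = startClientGrid();
        String cacheName = "cacheName";
        // 500 create/destroy pairs: 1000 cache start/stop requests in total.
        for (int i = 0; i < 500; i++) {
            client.getOrCreateCache(cacheName);
            client.destroyCache(cacheName);
        }
        // Keep the JVM alive so that a heap dump can be taken.
        System.err.println("Heap dump time");
        Thread.sleep(1000000);
    }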
{code}

{code:java}
[INFO ][exchange-worker-#149%internal.IgniteOomTest%][GridCachePartitionExchangeManager] AffinityTopologyVersion [topVer=2, minorTopVer=1000], evt=DISCOVERY_CUSTOM_EVT, evtNode=52b4c130-1a01-4858-813a-ebc8a5dabf1e, client=true]
{code}

> OOM crash with unstable topology
> --------------------------------
>
>                 Key: IGNITE-21478
>                 URL: https://issues.apache.org/jira/browse/IGNITE-21478
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Luchnikov Alexander
>            Priority: Minor
>              Labels: ise
>         Attachments: HistoMinorTop.png, histo.png



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
