Alexey Goncharuk created IGNITE-8790:
----------------------------------------

             Summary: JVM crash during memory recovery
                 Key: IGNITE-8790
                 URL: https://issues.apache.org/jira/browse/IGNITE-8790
             Project: Ignite
          Issue Type: Bug
            Reporter: Alexey Goncharuk


I've observed the following JVM crash after one of the node restarts on 
Ignite 2.5 (only the relevant part is kept):

{code}
Stack: [0x00007f16f40b8000,0x00007f16f41b9000],  sp=0x00007f16f41b7308,  free space=1020k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x803675]
J 868  sun.misc.Unsafe.copyMemory(Ljava/lang/Object;JLjava/lang/Object;JJ)V (0 bytes) @ 0x00007f173d351ca1 [0x00007f173d351bc0+0xe1]
J 3023 C1 org.apache.ignite.internal.util.GridUnsafe.copyMemory(Ljava/lang/Object;JLjava/lang/Object;JJ)V (77 bytes) @ 0x00007f173d9e8d64 [0x00007f173d9e8ae0+0x284]
J 2991 C1 org.apache.ignite.internal.pagemem.PageUtils.putBytes(JI[B)V (73 bytes) @ 0x00007f173d9e1dbc [0x00007f173d9e1d00+0xbc]
j  org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.restoreMemory(Lorg/apache/ignite/internal/processors/cache/persistence/GridCacheDatabaseSharedManager$CheckpointStatus;ZLorg/apache/ignite/internal/processors/cache/persistence/pagemem/PageMemoryEx;)Lorg/apache/ignite/internal/pagemem/wal/WALPointer;+568
j  org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.restoreMemory(Lorg/apache/ignite/internal/processors/cache/persistence/GridCacheDatabaseSharedManager$CheckpointStatus;)Lorg/apache/ignite/internal/pagemem/wal/WALPointer;+13
j  org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readCheckpointAndRestoreMemory(Ljava/util/List;)V+173
j  org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onClusterStateChangeRequest(Z)Lorg/apache/ignite/internal/processors/cache/distributed/dht/preloader/GridDhtPartitionsExchangeFuture$ExchangeType;+311
j  org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(Z)V+574
j  org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0()V+547
j  org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body()V+3
j  org.apache.ignite.internal.util.worker.GridWorker.run()V+82
j  java.lang.Thread.run()V+11
v  ~StubRoutines::call_stub
V  [libjvm.so+0x695b96]
V  [libjvm.so+0x6960a1]
V  [libjvm.so+0x696537]
V  [libjvm.so+0x71596e]
V  [libjvm.so+0xa7f243]
V  [libjvm.so+0xa7f38c]
V  [libjvm.so+0x92e0f8]
C  [libpthread.so.0+0x76ba]  start_thread+0xca
{code}

It looks like the issue is caused by a page whose ID was rotated, and the node 
failed before the checkpoint was finished. Then, on the next node restart, the 
page was written to disk, but the node was stopped again before the checkpoint 
marker was written.
Then, on the following restart, we attempt to write-lock the page, but the lock 
fails because the page tag logged to the WAL is different from the one written 
in the store.
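
To make the suspected failure mode concrete, here is a minimal sketch of a 
tag-checked write lock. It is only an illustration under assumptions: 
PageTagSketch, StoredPage, tryWriteLock, applyLoggedUpdate and the NO_LOCK 
sentinel are hypothetical names invented for this example and are not Ignite's 
actual classes or API. The idea it shows is that a restore path should check 
the lock result before writing; if the result is ignored, the write would 
presumably go through an invalid address, which may be what ends up in 
copyMemory in the trace above.

```java
import java.util.Arrays;

/** Hypothetical model of tag-checked page locking; not Ignite's actual classes. */
final class PageTagSketch {
    /** Sentinel meaning "lock not acquired" (an assumption of this sketch). */
    static final long NO_LOCK = 0L;

    /** A page as kept in the store: the tag it was written with, plus its payload. */
    static final class StoredPage {
        final int tag;
        final byte[] data;

        StoredPage(int tag, int size) {
            this.tag = tag;
            this.data = new byte[size];
        }
    }

    /** Returns a non-zero handle only when the expected tag matches the stored tag. */
    static long tryWriteLock(StoredPage page, int expectedTag) {
        return page.tag == expectedTag ? 1L : NO_LOCK;
    }

    /** Applies a logged update only if the lock was acquired; reports whether it was applied. */
    static boolean applyLoggedUpdate(StoredPage page, int walTag, byte[] payload) {
        long handle = tryWriteLock(page, walTag);

        if (handle == NO_LOCK)
            return false; // Tag mismatch: skip instead of writing through an invalid handle.

        // Assumes the payload fits into the page buffer.
        System.arraycopy(payload, 0, page.data, 0, payload.length);

        return true;
    }

    public static void main(String[] args) {
        StoredPage page = new StoredPage(2, 4); // The store holds tag 2.

        // The logged record carries tag 1, so the lock is refused.
        System.out.println(applyLoggedUpdate(page, 1, new byte[] {7, 7}));

        // The tags agree, so the update is applied.
        System.out.println(applyLoggedUpdate(page, 2, new byte[] {7, 7}));

        System.out.println(Arrays.toString(page.data));
    }
}
```

Whether skipping the record is the right recovery action for a mismatched tag 
is a separate question; the sketch only shows that the lock result has to be 
checked before the page is touched.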



