[jira] [Updated] (IGNITE-7731) ClassCastException at restarted node if killed during checkpoint
[ https://issues.apache.org/jira/browse/IGNITE-7731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vladimir Ozerov updated IGNITE-7731:
    Fix Version/s: (was: 2.7)

> ClassCastException at restarted node if killed during checkpoint
>
>                 Key: IGNITE-7731
>                 URL: https://issues.apache.org/jira/browse/IGNITE-7731
>             Project: Ignite
>          Issue Type: Bug
>    Affects Versions: 2.3
>            Reporter: Ksenia Rybakova
>            Priority: Major
>         Attachments: 120728_id11_172.25.1.49_cache-random-benchmark-2-backup.log, 121729_id11-1_172.25.1.49cache-random-benchmark-2-backup.log, ignite-base-load-config.xml, run-load.properties, run-load.xml
>
> During a failover test, the restarted node fails to start with the following exception:
> {noformat}
> [2018-02-15 12:17:46,388][INFO ][exchange-worker-#62][GridCacheDatabaseSharedManager] Read checkpoint status [startMarker=/storage/ssd/krybakova/20181502-120637-2.3.0-SNAPSHOT-failover-756ae8d4-c12-s24-p20-r40-b2-d7200/yardstick/work/db/node00-730721d0-e532-4f3a-b9e9-29277c0b7a9a/cp/1518685946892-39fa4858-66cb-4c88-9a1c-13a8625e1158-START.bin, endMarker=null]
> [2018-02-15 12:17:46,389][INFO ][exchange-worker-#62][GridCacheDatabaseSharedManager] Checking memory state [lastValidPos=FileWALPointer [idx=0, fileOffset=0, len=0, forceFlush=false], lastMarked=FileWALPointer [idx=1, fileOffset=47809760, len=177151, forceFlush=false], lastCheckpointId=39fa4858-66cb-4c88-9a1c-13a8625e1158]
> [2018-02-15 12:17:46,389][WARN ][exchange-worker-#62][GridCacheDatabaseSharedManager] Ignite node stopped in the middle of checkpoint. Will restore memory state and finish checkpoint on node start.
> [2018-02-15 12:17:46,448][ERROR][exchange-worker-#62][GridDhtPartitionsExchangeFuture] Failed to reinitialize local partitions (preloading will be stopped): GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=38, minorTopVer=0], discoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode [id=b8684b5c-29f5-41db-bedc-f4b4ee4cab6b, addrs=[127.0.0.1, 172.25.1.49], sockAddrs=[lab49.gridgain.local/172.25.1.49:47500, /127.0.0.1:47500], discPort=47500, order=38, intOrder=37, lastExchangeTime=1518686253183, loc=true, ver=2.3.0#20180213-sha1:756ae8d4, isClient=false], topVer=38, nodeId8=b8684b5c, msg=null, type=NODE_JOINED, tstamp=1518686266006], nodeId=b8684b5c, evt=NODE_JOINED]
> java.lang.ClassCastException: org.apache.ignite.internal.pagemem.impl.PageMemoryNoStoreImpl cannot be cast to org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryEx
>     at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.getPageMemoryForCacheGroup(GridCacheDatabaseSharedManager.java:1595)
>     at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.restoreMemory(GridCacheDatabaseSharedManager.java:1533)
>     at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readCheckpointAndRestoreMemory(GridCacheDatabaseSharedManager.java:568)
>     at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.initCachesOnLocalJoin(GridDhtPartitionsExchangeFuture.java:724)
>     at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:611)
>     at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2279)
>     at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
>     at java.lang.Thread.run(Thread.java:748)
> {noformat}
> This happens if the node was killed during a checkpoint (it seems only during the first one).
> Load config:
> * Yardstick with CacheRandomOperationBenchmark
> * 12 client nodes, 24 server nodes, 12 hosts (2 per host). The issue is also reproduced when the restarted node is 1 per host.
> * Several caches with different configs: pds/in memory, tx/atomic, with/without eviction etc. No dynamic caches. Complete configs are attached.
> * 1 node is restarted periodically.
> Logs of the restarted node are attached.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
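The top frame of the trace is a failed downcast from PageMemoryNoStoreImpl to PageMemoryEx inside getPageMemoryForCacheGroup. Since the load config mixes pds and in-memory caches, one plausible reading is that memory restore looked up page memory for a cache group that lives in a non-persistent region. The sketch below only illustrates that failure mode and a guarded alternative. Its types are hypothetical stand-ins that borrow the names from the trace (PersistentPageMemory and RestoreCastSketch are invented here); it is not the Ignite code and not necessarily the fix applied to this issue.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-ins that only mirror the names in the stack trace.
// They are NOT the real Ignite classes.
interface PageMemory {}

/** Page memory with checkpoint support (persistent regions). */
interface PageMemoryEx extends PageMemory {}

/** In-memory-only page memory: a PageMemory, but not a PageMemoryEx. */
final class PageMemoryNoStoreImpl implements PageMemory {}

/** Invented name for a persistent page memory implementation. */
final class PersistentPageMemory implements PageMemoryEx {}

final class RestoreCastSketch {
    /** Unchecked downcast: throws ClassCastException for an in-memory group. */
    static PageMemoryEx uncheckedLookup(Map<Integer, PageMemory> byGroup, int grpId) {
        return (PageMemoryEx) byGroup.get(grpId);
    }

    /** Guarded lookup: empty for groups that have no persistent page memory. */
    static Optional<PageMemoryEx> guardedLookup(Map<Integer, PageMemory> byGroup, int grpId) {
        PageMemory mem = byGroup.get(grpId);

        return mem instanceof PageMemoryEx
            ? Optional.of((PageMemoryEx) mem)
            : Optional.empty();
    }

    public static void main(String[] args) {
        // Group ids 1 and 2 are arbitrary illustration values.
        Map<Integer, PageMemory> byGroup = new LinkedHashMap<>();
        byGroup.put(1, new PersistentPageMemory());
        byGroup.put(2, new PageMemoryNoStoreImpl());

        for (int grpId : byGroup.keySet()) {
            boolean persistent = guardedLookup(byGroup, grpId).isPresent();
            System.out.println("group " + grpId + " persistent=" + persistent);
        }

        try {
            uncheckedLookup(byGroup, 2);
        }
        catch (ClassCastException e) {
            System.out.println("unchecked cast failed: " + e.getMessage());
        }
    }
}
```

With the guarded variant, a restore loop could skip records that belong to in-memory groups instead of aborting the exchange on the first one.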
[jira] [Updated] (IGNITE-7731) ClassCastException at restarted node if killed during checkpoint
Nikolay Izhikov updated IGNITE-7731:
    Fix Version/s: (was: 2.8) 2.7
[jira] [Updated] (IGNITE-7731) ClassCastException at restarted node if killed during checkpoint
Nikolay Izhikov updated IGNITE-7731:
    Fix Version/s: (was: 2.7) 2.8
[jira] [Updated] (IGNITE-7731) ClassCastException at restarted node if killed during checkpoint
Dmitriy Pavlov updated IGNITE-7731:
    Fix Version/s: (was: 2.6) 2.7
[jira] [Updated] (IGNITE-7731) ClassCastException at restarted node if killed during checkpoint
Andrey Gura updated IGNITE-7731:
    Fix Version/s: (was: 2.5) 2.6
[jira] [Updated] (IGNITE-7731) ClassCastException at restarted node if killed during checkpoint
Ksenia Rybakova updated IGNITE-7731:
    Fix Version/s: 2.5
[jira] [Updated] (IGNITE-7731) ClassCastException at restarted node if killed during checkpoint
Ksenia Rybakova updated IGNITE-7731:
    Description:
During a failover test, the restarted node fails to start with the following exception:
{noformat}
[2018-02-15 12:17:46,388][INFO ][exchange-worker-#62][GridCacheDatabaseSharedManager] Read checkpoint status [startMarker=/storage/ssd/krybakova/20181502-120637-2.3.0-SNAPSHOT-failover-756ae8d4-c12-s24-p20-r40-b2-d7200/yardstick/work/db/node00-730721d0-e532-4f3a-b9e9-29277c0b7a9a/cp/1518685946892-39fa4858-66cb-4c88-9a1c-13a8625e1158-START.bin, endMarker=null]
[2018-02-15 12:17:46,389][INFO ][exchange-worker-#62][GridCacheDatabaseSharedManager] Checking memory state [lastValidPos=FileWALPointer [idx=0, fileOffset=0, len=0, forceFlush=false], lastMarked=FileWALPointer [idx=1, fileOffset=47809760, len=177151, forceFlush=false], lastCheckpointId=39fa4858-66cb-4c88-9a1c-13a8625e1158]
[2018-02-15 12:17:46,389][WARN ][exchange-worker-#62][GridCacheDatabaseSharedManager] Ignite node stopped in the middle of checkpoint. Will restore memory state and finish checkpoint on node start.
[2018-02-15 12:17:46,448][ERROR][exchange-worker-#62][GridDhtPartitionsExchangeFuture] Failed to reinitialize local partitions (preloading will be stopped): GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=38, minorTopVer=0], discoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode [id=b8684b5c-29f5-41db-bedc-f4b4ee4cab6b, addrs=[127.0.0.1, 172.25.1.49], sockAddrs=[lab49.gridgain.local/172.25.1.49:47500, /127.0.0.1:47500], discPort=47500, order=38, intOrder=37, lastExchangeTime=1518686253183, loc=true, ver=2.3.0#20180213-sha1:756ae8d4, isClient=false], topVer=38, nodeId8=b8684b5c, msg=null, type=NODE_JOINED, tstamp=1518686266006], nodeId=b8684b5c, evt=NODE_JOINED]
java.lang.ClassCastException: org.apache.ignite.internal.pagemem.impl.PageMemoryNoStoreImpl cannot be cast to org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryEx
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.getPageMemoryForCacheGroup(GridCacheDatabaseSharedManager.java:1595)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.restoreMemory(GridCacheDatabaseSharedManager.java:1533)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readCheckpointAndRestoreMemory(GridCacheDatabaseSharedManager.java:568)
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.initCachesOnLocalJoin(GridDhtPartitionsExchangeFuture.java:724)
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:611)
    at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2279)
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
    at java.lang.Thread.run(Thread.java:748)
{noformat}
This happens if the node was killed during a checkpoint (it seems only during the first one).
Load config:
* Yardstick with CacheRandomOperationBenchmark
* 12 client nodes, 24 server nodes, 12 hosts (2 per host). The issue is also reproduced when the restarted node is 1 per host.
* Several caches with different configs: pds/in memory, tx/atomic, with/without eviction etc. No dynamic caches. Complete configs are attached.
* 1 node is restarted periodically.
Logs of the restarted node are attached.

  was:
During a failover test, the restarted node fails to start with the following exception:
{noformat}
[2018-02-15 12:17:46,388][INFO ][exchange-worker-#62][GridCacheDatabaseSharedManager] Read checkpoint status [startMarker=/storage/ssd/krybakova/20181502-120637-2.3.0-SNAPSHOT-failover-756ae8d4-c12-s24-p20-r40-b2-d7200/yardstick/work/db/node00-730721d0-e532-4f3a-b9e9-29277c0b7a9a/cp/1518685946892-39fa4858-66cb-4c88-9a1c-13a8625e1158-START.bin, endMarker=null]
[2018-02-15 12:17:46,389][INFO ][exchange-worker-#62][GridCacheDatabaseSharedManager] Checking memory state [lastValidPos=FileWALPointer [idx=0, fileOffset=0, len=0, forceFlush=false], lastMarked=FileWALPointer [idx=1, fileOffset=47809760, len=177151, forceFlush=false], lastCheckpointId=39fa4858-66cb-4c88-9a1c-13a8625e1158]
[2018-02-15 12:17:46,389][WARN ][exchange-worker-#62][GridCacheDatabaseSharedManager] Ignite node stopped in the middle of checkpoint. Will restore memory state and finish checkpoint on node start.
[2018-02-15 12:17:46,448][ERROR][exchange-worker-#62][GridDhtPartitionsExchangeFuture] Failed to reinitialize local partitions (preloading will be stopped): GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=38, minorTopVer=0], discoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode
[jira] [Updated] (IGNITE-7731) ClassCastException at restarted node if killed during checkpoint
[ https://issues.apache.org/jira/browse/IGNITE-7731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ksenia Rybakova updated IGNITE-7731: Attachment: 121729_id11-1_172.25.1.49cache-random-benchmark-2-backup.log 120728_id11_172.25.1.49_cache-random-benchmark-2-backup.log > ClassCastException at restarted node if killed during checkpoint > > > Key: IGNITE-7731 > URL: https://issues.apache.org/jira/browse/IGNITE-7731 > Project: Ignite > Issue Type: Bug >Affects Versions: 2.3 >Reporter: Ksenia Rybakova >Priority: Major > Attachments: > 120728_id11_172.25.1.49_cache-random-benchmark-2-backup.log, > 121729_id11-1_172.25.1.49cache-random-benchmark-2-backup.log, > ignite-base-load-config.xml, run-load.properties, run-load.xml > > > During failover test restarted node fails to start with the following > exception: > {noformat} > [2018-02-15 12:17:46,388][INFO > ][exchange-worker-#62][GridCacheDatabaseSharedManager] Read checkpoint status > [startMarker=/storage/ssd/krybakova/20181502-120637-2.3.0-SNAPSHOT-failover-756ae8d4-c12-s24-p20-r40-b2-d7200/yardstick/work/db/node00-730721d0-e532-4f3a-b9e9-29277c0b7a9a/cp/1518685946892-39fa4858-66cb-4c88-9a1c-13a8625e1158-START.bin, > endMarker=null] > [2018-02-15 12:17:46,389][INFO > ][exchange-worker-#62][GridCacheDatabaseSharedManager] Checking memory state > [lastValidPos=FileWALPointer [idx=0, fileOffset=0, len=0, forceFlush=false], > lastMarked=FileWALPointer [idx=1, fileOffset=47809760, len=177151, > forceFlush=false], lastCheckpointId=39fa4858-66cb-4c88-9a1c-13a8625e1158] > [2018-02-15 12:17:46,389][WARN > ][exchange-worker-#62][GridCacheDatabaseSharedManager] Ignite node stopped in > the middle of checkpoint. Will restore memory state and finish checkpoint on > node start. 
> [2018-02-15 > 12:17:46,448][ERROR][exchange-worker-#62][GridDhtPartitionsExchangeFuture] > Failed to reinitialize local partitions (preloading will be stopped): > GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=38, > minorTopVer=0], discoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode > [id=b8684b5c-29f5-41db-bedc-f4b4ee4cab6b, addrs=[127.0.0.1, 172.25.1.49], > sockAddrs=[lab49.gridgain.local/172.25.1.49:47500, /127.0.0.1:47500], > discPort=47500, order=38, intOrder=37, lastExchangeTime=1518686253183, > loc=true, ver=2.3.0#20180213-sha1:756ae8d4, isClient=false], topVer=38, > nodeId8=b8684b5c, msg=null, type=NODE_JOINED, tstamp=1518686266006], > nodeId=b8684b5c, evt=NODE_JOINED] > java.lang.ClassCastException: > org.apache.ignite.internal.pagemem.impl.PageMemoryNoStoreImpl cannot be cast > to > org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryEx > at > org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.getPageMemoryForCacheGroup(GridCacheDatabaseSharedManager.java:1595) > at > org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.restoreMemory(GridCacheDatabaseSharedManager.java:1533) > at > org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readCheckpointAndRestoreMemory(GridCacheDatabaseSharedManager.java:568) > at > org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.initCachesOnLocalJoin(GridDhtPartitionsExchangeFuture.java:724) > at > org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:611) > at > org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2279) > at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110) > at java.lang.Thread.run(Thread.java:748) > {noformat} > 
This happens if node was killed during checkpoint (it seems only during the first one).
> Load config:
> * Yardstick with CacheRandomOperationBenchmark
> * 12 client nodes, 24 server nodes, 12 hosts (2 per host). The issue is also reproduced when restarted node is 1 per host.
> * Several caches with different configs: pds/in memory, tx/atomic, with/without eviction etc. No dynamic caches. Complete configs are attached.
> * 1 node is restarted periodically.
> Logs of restarted node are attached.
>
--
This message was sent by Atlassian JIRA (v7.6.3#76005)
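The stack trace in this report shows the cast failing in getPageMemoryForCacheGroup while memory is being restored, and the load config mixes pds and in-memory caches. One possible reading, which is an assumption and is not confirmed anywhere in this report, is that recovery reaches an in-memory cache group and casts its page memory to the persistence-aware type. The self-contained Java sketch below illustrates only that failure mode. All types and names in it (PageMemory, PageMemoryEx, InMemoryOnly, Persistent, the cache names) are hypothetical stand-ins invented for illustration; they are not the Ignite classes, and the guarded variant is not the project's fix.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PageMemoryCastSketch {
    /** Hypothetical stand-in for a generic page memory (NOT the Ignite interface). */
    interface PageMemory {}

    /** Hypothetical stand-in for the persistence-aware extension used during recovery. */
    interface PageMemoryEx extends PageMemory {}

    /** Page memory of a cache group without persistence (assumed shape). */
    static final class InMemoryOnly implements PageMemory {}

    /** Page memory of a cache group with persistence enabled (assumed shape). */
    static final class Persistent implements PageMemoryEx {}

    /** Unchecked variant: throws ClassCastException when the group is in-memory only. */
    static PageMemoryEx unchecked(Map<String, PageMemory> groups, String name) {
        return (PageMemoryEx) groups.get(name);
    }

    /** Guarded variant: collects only the groups whose page memory is persistence-aware. */
    static List<String> recoverable(Map<String, PageMemory> groups) {
        List<String> res = new ArrayList<>();
        for (Map.Entry<String, PageMemory> e : groups.entrySet()) {
            if (e.getValue() instanceof PageMemoryEx)
                res.add(e.getKey());
        }
        return res;
    }

    /** Mixed configuration, loosely mirroring "pds/in memory" caches in the load config. */
    static Map<String, PageMemory> sampleGroups() {
        Map<String, PageMemory> groups = new LinkedHashMap<>();
        groups.put("pdsCache", new Persistent());
        groups.put("inMemoryCache", new InMemoryOnly());
        return groups;
    }

    public static void main(String[] args) {
        Map<String, PageMemory> groups = sampleGroups();
        System.out.println("Recoverable groups: " + recoverable(groups));
        try {
            unchecked(groups, "inMemoryCache");
        } catch (ClassCastException e) {
            System.out.println("Unchecked cast failed: " + e.getMessage());
        }
    }
}
```

The instanceof check is just one way to avoid the exception in this toy model; whether skipping such groups is correct for real recovery depends on details this report does not give.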
[jira] [Updated] (IGNITE-7731) ClassCastException at restarted node if killed during checkpoint
[ https://issues.apache.org/jira/browse/IGNITE-7731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ksenia Rybakova updated IGNITE-7731: Description: During failover test restarted node fails to start with the following exception: {noformat} [2018-02-15 12:17:46,388][INFO ][exchange-worker-#62][GridCacheDatabaseSharedManager] Read checkpoint status [startMarker=/storage/ssd/krybakova/20181502-120637-2.3.0-SNAPSHOT-failover-756ae8d4-c12-s24-p20-r40-b2-d7200/yardstick/work/db/node00-730721d0-e532-4f3a-b9e9-29277c0b7a9a/cp/1518685946892-39fa4858-66cb-4c88-9a1c-13a8625e1158-START.bin, endMarker=null] [2018-02-15 12:17:46,389][INFO ][exchange-worker-#62][GridCacheDatabaseSharedManager] Checking memory state [lastValidPos=FileWALPointer [idx=0, fileOffset=0, len=0, forceFlush=false], lastMarked=FileWALPointer [idx=1, fileOffset=47809760, len=177151, forceFlush=false], lastCheckpointId=39fa4858-66cb-4c88-9a1c-13a8625e1158] [2018-02-15 12:17:46,389][WARN ][exchange-worker-#62][GridCacheDatabaseSharedManager] Ignite node stopped in the middle of checkpoint. Will restore memory state and finish checkpoint on node start. 
[2018-02-15 12:17:46,448][ERROR][exchange-worker-#62][GridDhtPartitionsExchangeFuture] Failed to reinitialize local partitions (preloading will be stopped): GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=38, minorTopVer=0], discoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode [id=b8684b5c-29f5-41db-bedc-f4b4ee4cab6b, addrs=[127.0.0.1, 172.25.1.49], sockAddrs=[lab49.gridgain.local/172.25.1.49:47500, /127.0.0.1:47500], discPort=47500, order=38, intOrder=37, lastExchangeTime=1518686253183, loc=true, ver=2.3.0#20180213-sha1:756ae8d4, isClient=false], topVer=38, nodeId8=b8684b5c, msg=null, type=NODE_JOINED, tstamp=1518686266006], nodeId=b8684b5c, evt=NODE_JOINED] java.lang.ClassCastException: org.apache.ignite.internal.pagemem.impl.PageMemoryNoStoreImpl cannot be cast to org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryEx at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.getPageMemoryForCacheGroup(GridCacheDatabaseSharedManager.java:1595) at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.restoreMemory(GridCacheDatabaseSharedManager.java:1533) at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readCheckpointAndRestoreMemory(GridCacheDatabaseSharedManager.java:568) at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.initCachesOnLocalJoin(GridDhtPartitionsExchangeFuture.java:724) at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:611) at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2279) at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110) at java.lang.Thread.run(Thread.java:748) {noformat} This happens if node was killed during checkpoint (it seems only 
during the first one).
Load config:
* Yardstick with CacheRandomOperationBenchmark
* 12 client nodes, 24 server nodes, 12 hosts (2 per host). The issue is also reproduced when restarted node is 1 per host.
* Several caches with different configs: pds/in memory, tx/atomic, with/without eviction etc. No dynamic caches. Complete configs are attached.
* 1 node is restarted periodically.
was:
During failover test restarted node fails to start with the following exception:
{noformat}
[2018-02-15 12:17:46,388][INFO ][exchange-worker-#62][GridCacheDatabaseSharedManager] Read checkpoint status [startMarker=/storage/ssd/krybakova/20181502-120637-2.3.0-SNAPSHOT-failover-756ae8d4-c12-s24-p20-r40-b2-d7200/yardstick/work/db/node00-730721d0-e532-4f3a-b9e9-29277c0b7a9a/cp/1518685946892-39fa4858-66cb-4c88-9a1c-13a8625e1158-START.bin, endMarker=null]
[2018-02-15 12:17:46,389][INFO ][exchange-worker-#62][GridCacheDatabaseSharedManager] Checking memory state [lastValidPos=FileWALPointer [idx=0, fileOffset=0, len=0, forceFlush=false], lastMarked=FileWALPointer [idx=1, fileOffset=47809760, len=177151, forceFlush=false], lastCheckpointId=39fa4858-66cb-4c88-9a1c-13a8625e1158]
[2018-02-15 12:17:46,389][WARN ][exchange-worker-#62][GridCacheDatabaseSharedManager] Ignite node stopped in the middle of checkpoint. Will restore memory state and finish checkpoint on node start.
[2018-02-15 12:17:46,448][ERROR][exchange-worker-#62][GridDhtPartitionsExchangeFuture] Failed to reinitialize local partitions (preloading will be stopped): GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=38, minorTopVer=0], discoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode [id=b8684b5c-29f5-41db-bedc-f4b4ee4cab6b,
[jira] [Updated] (IGNITE-7731) ClassCastException at restarted node if killed during checkpoint
[ https://issues.apache.org/jira/browse/IGNITE-7731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ksenia Rybakova updated IGNITE-7731: Attachment: run-load.xml run-load.properties ignite-base-load-config.xml > ClassCastException at restarted node if killed during checkpoint > > > Key: IGNITE-7731 > URL: https://issues.apache.org/jira/browse/IGNITE-7731 > Project: Ignite > Issue Type: Bug >Affects Versions: 2.3 >Reporter: Ksenia Rybakova >Priority: Major > Attachments: ignite-base-load-config.xml, run-load.properties, > run-load.xml > > > During failover test restarted node fails to start with the following > exception: > {noformat} > [2018-02-15 12:17:46,388][INFO > ][exchange-worker-#62][GridCacheDatabaseSharedManager] Read checkpoint status > [startMarker=/storage/ssd/krybakova/20181502-120637-2.3.0-SNAPSHOT-failover-756ae8d4-c12-s24-p20-r40-b2-d7200/yardstick/work/db/node00-730721d0-e532-4f3a-b9e9-29277c0b7a9a/cp/1518685946892-39fa4858-66cb-4c88-9a1c-13a8625e1158-START.bin, > endMarker=null] > [2018-02-15 12:17:46,389][INFO > ][exchange-worker-#62][GridCacheDatabaseSharedManager] Checking memory state > [lastValidPos=FileWALPointer [idx=0, fileOffset=0, len=0, forceFlush=false], > lastMarked=FileWALPointer [idx=1, fileOffset=47809760, len=177151, > forceFlush=false], lastCheckpointId=39fa4858-66cb-4c88-9a1c-13a8625e1158] > [2018-02-15 12:17:46,389][WARN > ][exchange-worker-#62][GridCacheDatabaseSharedManager] Ignite node stopped in > the middle of checkpoint. Will restore memory state and finish checkpoint on > node start. 
> [2018-02-15 > 12:17:46,448][ERROR][exchange-worker-#62][GridDhtPartitionsExchangeFuture] > Failed to reinitialize local partitions (preloading will be stopped): > GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=38, > minorTopVer=0], discoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode > [id=b8684b5c-29f5-41db-bedc-f4b4ee4cab6b, addrs=[127.0.0.1, 172.25.1.49], > sockAddrs=[lab49.gridgain.local/172.25.1.49:47500, /127.0.0.1:47500], > discPort=47500, order=38, intOrder=37, lastExchangeTime=1518686253183, > loc=true, ver=2.3.0#20180213-sha1:756ae8d4, isClient=false], topVer=38, > nodeId8=b8684b5c, msg=null, type=NODE_JOINED, tstamp=1518686266006], > nodeId=b8684b5c, evt=NODE_JOINED] > java.lang.ClassCastException: > org.apache.ignite.internal.pagemem.impl.PageMemoryNoStoreImpl cannot be cast > to > org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryEx > at > org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.getPageMemoryForCacheGroup(GridCacheDatabaseSharedManager.java:1595) > at > org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.restoreMemory(GridCacheDatabaseSharedManager.java:1533) > at > org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readCheckpointAndRestoreMemory(GridCacheDatabaseSharedManager.java:568) > at > org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.initCachesOnLocalJoin(GridDhtPartitionsExchangeFuture.java:724) > at > org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:611) > at > org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2279) > at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110) > at java.lang.Thread.run(Thread.java:748) > {noformat} > 
This happens if node was killed during checkpoint (it seems only during the first one).
> Load config:
> * Yardstick with CacheRandomOperationBenchmark
> * 12 client nodes, 23 server nodes, 12 hosts (2 per host, but restarted server is 1 per host)
> * Several caches with different configs: pds/in memory, tx/atomic, with/without eviction etc. No dynamic caches. Complete configs are attached.
> * 1 node is restarted periodically.
[jira] [Updated] (IGNITE-7731) ClassCastException at restarted node if killed during checkpoint
[ https://issues.apache.org/jira/browse/IGNITE-7731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ksenia Rybakova updated IGNITE-7731: Affects Version/s: 2.3 Description: During failover test restarted node fails to start with the following exception: {noformat} [2018-02-15 12:17:46,388][INFO ][exchange-worker-#62][GridCacheDatabaseSharedManager] Read checkpoint status [startMarker=/storage/ssd/krybakova/20181502-120637-2.3.0-SNAPSHOT-failover-756ae8d4-c12-s24-p20-r40-b2-d7200/yardstick/work/db/node00-730721d0-e532-4f3a-b9e9-29277c0b7a9a/cp/1518685946892-39fa4858-66cb-4c88-9a1c-13a8625e1158-START.bin, endMarker=null] [2018-02-15 12:17:46,389][INFO ][exchange-worker-#62][GridCacheDatabaseSharedManager] Checking memory state [lastValidPos=FileWALPointer [idx=0, fileOffset=0, len=0, forceFlush=false], lastMarked=FileWALPointer [idx=1, fileOffset=47809760, len=177151, forceFlush=false], lastCheckpointId=39fa4858-66cb-4c88-9a1c-13a8625e1158] [2018-02-15 12:17:46,389][WARN ][exchange-worker-#62][GridCacheDatabaseSharedManager] Ignite node stopped in the middle of checkpoint. Will restore memory state and finish checkpoint on node start. 
[2018-02-15 12:17:46,448][ERROR][exchange-worker-#62][GridDhtPartitionsExchangeFuture] Failed to reinitialize local partitions (preloading will be stopped): GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=38, minorTopVer=0], discoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode [id=b8684b5c-29f5-41db-bedc-f4b4ee4cab6b, addrs=[127.0.0.1, 172.25.1.49], sockAddrs=[lab49.gridgain.local/172.25.1.49:47500, /127.0.0.1:47500], discPort=47500, order=38, intOrder=37, lastExchangeTime=1518686253183, loc=true, ver=2.3.0#20180213-sha1:756ae8d4, isClient=false], topVer=38, nodeId8=b8684b5c, msg=null, type=NODE_JOINED, tstamp=1518686266006], nodeId=b8684b5c, evt=NODE_JOINED] java.lang.ClassCastException: org.apache.ignite.internal.pagemem.impl.PageMemoryNoStoreImpl cannot be cast to org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryEx at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.getPageMemoryForCacheGroup(GridCacheDatabaseSharedManager.java:1595) at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.restoreMemory(GridCacheDatabaseSharedManager.java:1533) at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readCheckpointAndRestoreMemory(GridCacheDatabaseSharedManager.java:568) at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.initCachesOnLocalJoin(GridDhtPartitionsExchangeFuture.java:724) at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:611) at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2279) at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110) at java.lang.Thread.run(Thread.java:748) {noformat} This happens if node was killed during checkpoint (it seems only 
during the first one).
Load config:
* Yardstick with CacheRandomOperationBenchmark
* 12 client nodes, 23 server nodes, 12 hosts (2 per host, but restarted server is 1 per host)
* Several caches with different configs: pds/in memory, tx/atomic, with/without eviction etc. No dynamic caches. Complete configs are attached.
* 1 node is restarted periodically.
Summary: ClassCastException at restarted node if killed during checkpoint (was: Restarted node can)
> ClassCastException at restarted node if killed during checkpoint
>
> Key: IGNITE-7731
> URL: https://issues.apache.org/jira/browse/IGNITE-7731
> Project: Ignite
> Issue Type: Bug
> Affects Versions: 2.3
> Reporter: Ksenia Rybakova
> Priority: Major
>
> During failover test restarted node fails to start with the following exception:
> {noformat}
> [2018-02-15 12:17:46,388][INFO ][exchange-worker-#62][GridCacheDatabaseSharedManager] Read checkpoint status [startMarker=/storage/ssd/krybakova/20181502-120637-2.3.0-SNAPSHOT-failover-756ae8d4-c12-s24-p20-r40-b2-d7200/yardstick/work/db/node00-730721d0-e532-4f3a-b9e9-29277c0b7a9a/cp/1518685946892-39fa4858-66cb-4c88-9a1c-13a8625e1158-START.bin, endMarker=null]
> [2018-02-15 12:17:46,389][INFO ][exchange-worker-#62][GridCacheDatabaseSharedManager] Checking memory state [lastValidPos=FileWALPointer [idx=0, fileOffset=0, len=0, forceFlush=false], lastMarked=FileWALPointer [idx=1, fileOffset=47809760, len=177151, forceFlush=false], lastCheckpointId=39fa4858-66cb-4c88-9a1c-13a8625e1158]
> [2018-02-15