As a follow-up to this:

We tried removing the WAL files from both the WalStore and WalArchive
directories. The problem is that somewhere there is a checkpoint that says
it's up to WAL index 2414, yet we only have 2413, 2412, etc.
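
To double-check which segment indices are actually on disk, something like the sketch below could be run against the archive directory. It is only a rough diagnostic and rests on an assumption: that archived segments are named with a 16-digit zero-padded absolute index plus ".wal" (or ".wal.zip" when compacted). The function names are made up for illustration. As far as I understand, the files in the WAL work directory are named by slot rather than by absolute index, so this is aimed at the archive only.

```python
import re
from pathlib import Path

# Assumption: archived segments are named "<16-digit absolute index>.wal",
# optionally with a ".zip" suffix when WAL compaction is enabled.
SEGMENT_RE = re.compile(r"^(\d{16})\.wal(\.zip)?$")


def segment_indices(file_names):
    """Return the sorted absolute segment indices found in file_names."""
    found = set()
    for name in file_names:
        match = SEGMENT_RE.match(name)
        if match:
            found.add(int(match.group(1)))
    return sorted(found)


def missing_up_to(indices, wanted):
    """Indices from the lowest one present up to `wanted` that are absent."""
    if not indices:
        return [wanted]
    present = set(indices)
    return [i for i in range(indices[0], wanted + 1) if i not in present]


def scan(directory):
    """List the segment indices in one node's WAL archive directory."""
    return segment_indices(p.name for p in Path(directory).iterdir() if p.is_file())
```

Calling `missing_up_to(scan(archive_dir), 2414)` with `archive_dir` set to the node's WalArchive folder would list the indices the checkpoint expects but the directory does not contain. It only reads file names and changes nothing.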

It seems we need to find where this checkpoint index is stored and
change it.
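
One place to look is the "cp" folder named in the "Read checkpoint status" line of the log quoted below. The marker files there appear to be named "<timestamp in ms>-<checkpoint id>-START.bin" and "...-END.bin". Going only by that naming (an assumption taken from the two file names in the log, not from any documentation), the sketch below pairs START and END markers and reports checkpoints that started but never finished. It does not read or interpret the contents of the marker files, and the helper names are hypothetical.

```python
import re
from datetime import datetime, timezone

# Assumption: marker names look like "<epoch millis>-<UUID>-START.bin" or
# "<epoch millis>-<UUID>-END.bin", as seen in the quoted startup log.
MARKER_RE = re.compile(
    r"^(\d+)-"
    r"([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})-"
    r"(START|END)\.bin$"
)


def parse_marker(name):
    """Split a marker file name into (timestamp_ms, checkpoint_id, kind)."""
    match = MARKER_RE.match(name)
    if match is None:
        return None
    ts_ms, cp_id, kind = match.groups()
    return int(ts_ms), cp_id, kind


def unfinished_checkpoints(file_names):
    """Return (timestamp_ms, checkpoint_id) for START markers with no END."""
    started, ended = {}, set()
    for name in file_names:
        parsed = parse_marker(name)
        if parsed is None:
            continue  # not a checkpoint marker
        ts_ms, cp_id, kind = parsed
        if kind == "START":
            started[cp_id] = ts_ms
        else:
            ended.add(cp_id)
    return sorted((ts, cp_id) for cp_id, ts in started.items() if cp_id not in ended)


def as_utc(ts_ms):
    """Convert a marker timestamp in milliseconds to an aware UTC datetime."""
    return datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc)
```

Feeding it the output of `os.listdir()` on the cp folder should show whether the checkpoint id reported as lastCheckpointId in the log is the one without an END marker.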

On Wed, 19 Jul 2023 at 10:02 AM, Raymond Wilson <raymond_wil...@trimble.com>
wrote:

> Hi Alex,
>
> Here is the log from the Ignite startup. It's fairly short but shows
> everything I think:
>
> 2023-07-17 22:38:55,061 [1] DBG [ImmutableCacheComputeServer]   Starting
> Ignite.NET 2.15.0.23172
> 2023-07-17 22:38:55,065 [1] DBG [ImmutableCacheComputeServer]
> 2023-07-17 22:38:55,068 [1] DBG [ImmutableCacheComputeServer]
> 2023-07-17 22:38:55,070 [1] DBG [ImmutableCacheComputeServer]
> 2023-07-17 22:38:55,070 [1] DBG [ImmutableCacheComputeServer]
> 2023-07-17 22:38:55,073 [1] DBG [ImmutableCacheComputeServer]
> 2023-07-17 22:38:55,471 [1] DBG [ImmutableCacheComputeServer]   JVM
> started.
> 2023-07-17 22:38:56,340 [1] WRN [ImmutableCacheComputeServer]   Consistent
> ID is not set, it is recommended to set consistent ID for production
> clusters (use IgniteConfiguration.setConsistentId property)
> 2023-07-17 22:38:56,382 [1] INF [ImmutableCacheComputeServer]
> >>>    __________  ________________
> >>>   /  _/ ___/ |/ /  _/_  __/ __/
> >>>  _/ // (7 7    // /  / / / _/
> >>> /___/\___/_/|_/___/ /_/ /___/
> >>>
> >>> ver. 2.15.0#20230425-sha1:f98f7f35
> >>> 2023 Copyright(C) Apache Software Foundation
> >>>
> >>> Ignite documentation: https://ignite.apache.org
>
> 2023-07-17 22:38:56,383 [1] INF [ImmutableCacheComputeServer]   Config
> URL: n/a
> 2023-07-17 22:38:56,414 [1] INF [ImmutableCacheComputeServer]
> IgniteConfiguration [igniteInstanceName=TRex-Immutable, pubPoolSize=250,
> svcPoolSize=8, callbackPoolSize=8, stripedPoolSize=8, sysPoolSize=250,
> mgmtPoolSize=4, dataStreamerPoolSize=8, utilityCachePoolSize=8,
> utilityCacheKeepAliveTime=60000, p2pPoolSize=2, qryPoolSize=8,
> buildIdxPoolSize=1, igniteHome=/trex/, igniteWorkDir=/persist/Immutable,
> mbeanSrv=com.sun.jmx.mbeanserver.JmxMBeanServer@6e46d9f4,
> nodeId=4e70ba5e-5829-4b2d-b349-6539918990b5, marsh=BinaryMarshaller [],
> marshLocJobs=false, p2pEnabled=false, netTimeout=5000,
> netCompressionLevel=1, sndRetryDelay=1000, sndRetryCnt=3,
> metricsHistSize=10000, metricsUpdateFreq=2000,
> metricsExpTime=9223372036854775807, discoSpi=TcpDiscoverySpi
> [addrRslvr=null, addressFilter=null, sockTimeout=0, ackTimeout=0,
> marsh=null, reconCnt=10, reconDelay=2000, maxAckTimeout=600000, soLinger=0,
> forceSrvMode=false, clientReconnectDisabled=false, internalLsnr=null,
> skipAddrsRandomization=false], segPlc=USE_FAILURE_HANDLER,
> segResolveAttempts=2, waitForSegOnStart=true, allResolversPassReq=true,
> segChkFreq=10000, commSpi=TcpCommunicationSpi
> [connectGate=org.apache.ignite.spi.communication.tcp.internal.ConnectGateway@5bb3d42d,
> ctxInitLatch=java.util.concurrent.CountDownLatch@5bf61e67[Count = 1],
> stopping=false, clientPool=null, nioSrvWrapper=null, stateProvider=null],
> evtSpi=org.apache.ignite.spi.eventstorage.NoopEventStorageSpi@2c1dc8e,
> colSpi=NoopCollisionSpi [], deploySpi=LocalDeploymentSpi [],
> indexingSpi=org.apache.ignite.spi.indexing.noop.NoopIndexingSpi@61019f59,
> addrRslvr=null,
> encryptionSpi=org.apache.ignite.spi.encryption.noop.NoopEncryptionSpi@62e8f862,
> tracingSpi=org.apache.ignite.spi.tracing.NoopTracingSpi@26f3d90c,
> clientMode=false, rebalanceThreadPoolSize=1, rebalanceTimeout=10000,
> rebalanceBatchesPrefetchCnt=3, rebalanceThrottle=0,
> rebalanceBatchSize=524288, txCfg=TransactionConfiguration
> [txSerEnabled=false, dfltIsolation=REPEATABLE_READ,
> dfltConcurrency=PESSIMISTIC, dfltTxTimeout=0,
> txTimeoutOnPartitionMapExchange=0, deadlockTimeout=10000,
> pessimisticTxLogSize=0, pessimisticTxLogLinger=10000, tmLookupClsName=null,
> txManagerFactory=null, useJtaSync=false], cacheSanityCheckEnabled=true,
> discoStartupDelay=60000, deployMode=SHARED, p2pMissedCacheSize=100,
> locHost=null, timeSrvPortBase=31100, timeSrvPortRange=100,
> failureDetectionTimeout=60000, sysWorkerBlockedTimeout=null,
> clientFailureDetectionTimeout=60000, metricsLogFreq=30000,
> connectorCfg=ConnectorConfiguration [jettyPath=null, host=null, port=11212,
> noDelay=true, directBuf=false, sndBufSize=32768, rcvBufSize=32768,
> idleQryCurTimeout=600000, idleQryCurCheckFreq=60000, sndQueueLimit=0,
> selectorCnt=2, idleTimeout=7000, sslEnabled=false, sslClientAuth=false,
> sslCtxFactory=null, sslFactory=null, portRange=100, threadPoolSize=8,
> msgInterceptor=null], odbcCfg=null, warmupClos=null,
> atomicCfg=AtomicConfiguration [seqReserveSize=1000, cacheMode=PARTITIONED,
> backups=1, aff=null, grpName=null], classLdr=null, sslCtxFactory=null,
> platformCfg=PlatformDotNetConfiguration [binaryCfg=null],
> binaryCfg=BinaryConfiguration [idMapper=null, nameMapper=null,
> serializer=null, compactFooter=true], memCfg=null, pstCfg=null,
> dsCfg=DataStorageConfiguration [pageSize=4096, concLvl=2,
> sysDataRegConf=org.apache.ignite.configuration.SystemDataRegionConfiguration@55a8dc49,
> dfltDataRegConf=DataRegionConfiguration [name=Default-Immutable,
> maxSize=8589934592, initSize=8589934592, swapPath=null,
> pageEvictionMode=DISABLED, pageReplacementMode=CLOCK,
> evictionThreshold=0.9, emptyPagesPoolSize=100, metricsEnabled=false,
> metricsSubIntervalCount=5, metricsRateTimeInterval=60000,
> persistenceEnabled=true, checkpointPageBufSize=0,
> lazyMemoryAllocation=true, warmUpCfg=null, memoryAllocator=null,
> cdcEnabled=false], dataRegions=null,
> storagePath=/persist/Immutable/Persistence, checkpointFreq=30000,
> lockWaitTime=10000, checkpointThreads=4, checkpointWriteOrder=SEQUENTIAL,
> walHistSize=20, maxWalArchiveSize=5368709120, walSegments=10,
> walSegmentSize=536870912, walPath=/persist/Immutable/WalStore,
> walArchivePath=/persist/Immutable/WalArchive, cdcWalPath=db/wal/cdc,
> cdcWalDirMaxSize=0, metricsEnabled=false, walMode=FSYNC, walTlbSize=131072,
> walBuffSize=0, walFlushFreq=2000, walFsyncDelay=1000,
> walRecordIterBuffSize=67108864, alwaysWriteFullPages=false,
> fileIOFactory=org.apache.ignite.internal.processors.cache.persistence.file.AsyncFileIOFactory@545f80bf,
> metricsSubIntervalCnt=5, metricsRateTimeInterval=60000,
> walAutoArchiveAfterInactivity=-1, walForceArchiveTimeout=-1,
> writeThrottlingEnabled=false, walCompactionEnabled=false,
> walCompactionLevel=1, checkpointReadLockTimeout=null,
> walPageCompression=DISABLED, walPageCompressionLevel=null,
> dfltWarmUpCfg=null,
> encCfg=org.apache.ignite.configuration.EncryptionConfiguration@22fa55b2,
> defragmentationThreadPoolSize=4, minWalArchiveSize=-1,
> memoryAllocator=null], snapshotPath=snapshots, snapshotThreadPoolSize=4,
> activeOnStart=true, activeOnStartPropSetFlag=false, autoActivation=true,
> autoActivationPropSetFlag=false, clusterStateOnStart=null, sqlConnCfg=null,
> cliConnCfg=ClientConnectorConfiguration [host=null, port=10800,
> portRange=100, sockSndBufSize=0, sockRcvBufSize=0, tcpNoDelay=true,
> maxOpenCursorsPerConn=128, threadPoolSize=8, selectorCnt=4, idleTimeout=0,
> handshakeTimeout=10000, jdbcEnabled=true, odbcEnabled=true,
> thinCliEnabled=true, sslEnabled=false, useIgniteSslCtxFactory=true,
> sslClientAuth=false, sslCtxFactory=null, thinCliCfg=ThinClientConfiguration
> [maxActiveTxPerConn=100, maxActiveComputeTasksPerConn=0,
> sendServerExcStackTraceToClient=false]], mvccVacuumThreadCnt=2,
> mvccVacuumFreq=5000, authEnabled=false, failureHnd=null,
> commFailureRslvr=null, sqlCfg=SqlConfiguration [longQryWarnTimeout=3000,
> dfltQryTimeout=0, sqlQryHistSize=1000, validationEnabled=false],
> asyncContinuationExecutor=null]
> 2023-07-17 22:38:56,414 [1] INF [ImmutableCacheComputeServer]   OS: Linux
> 5.15.0-1041-azure amd64
> 2023-07-17 22:38:56,415 [1] INF [ImmutableCacheComputeServer]   OS user:
> root
> 2023-07-17 22:38:56,419 [1] INF [ImmutableCacheComputeServer]   PID: 1
> 2023-07-17 22:38:56,420 [1] INF [ImmutableCacheComputeServer]   Language
> runtime: Java Platform API Specification ver. 11
> 2023-07-17 22:38:56,420 [1] INF [ImmutableCacheComputeServer]   VM
> information: OpenJDK Runtime Environment 11.0.19+7-LTS Amazon.com Inc.
> OpenJDK 64-Bit Server VM 11.0.19+7-LTS
> 2023-07-17 22:38:56,420 [1] INF [ImmutableCacheComputeServer]   VM total
> memory: 1.0GB
> 2023-07-17 22:38:56,421 [1] INF [ImmutableCacheComputeServer]   Remote
> Management [restart: off, REST: on, JMX (remote: off)]
> 2023-07-17 22:38:56,421 [1] INF [ImmutableCacheComputeServer]   Logger:
> PlatformLogger [traceEnabled=false, debugEnabled=false, infoEnabled=true,
> isQuiet=false]
> 2023-07-17 22:38:56,421 [1] INF [ImmutableCacheComputeServer]
> IGNITE_HOME=/trex/
> 2023-07-17 22:38:56,422 [1] INF [ImmutableCacheComputeServer]   VM
> arguments: [-DIGNITE_QUIET=false, -Djava.net.preferIPv4Stack=true,
> -XX:+UseG1GC, -Djdk.tls.server.protocols="TLSv1.2",
> -Djdk.tls.client.protocols="TLSv1.2",
> --add-exports=java.base/jdk.internal.misc=ALL-UNNAMED,
> --add-exports=java.base/sun.nio.ch=ALL-UNNAMED,
> --add-exports=java.management/com.sun.jmx.mbeanserver=ALL-UNNAMED,
> --add-exports=jdk.internal.jvmstat/sun.jvmstat.monitor=ALL-UNNAMED,
> --add-exports=java.base/sun.reflect.generics.reflectiveObjects=ALL-UNNAMED,
> --illegal-access=permit,
> -javaagent:./libs/jmx_prometheus_javaagent-0.18.0.jar=8088:prometheusConfig.yaml,
> -Xms1024m, -Xmx1024m, -Dfile.encoding=UTF-8,
> -Djava.util.logging.config.file=/trex/config/java.util.logging.properties,
> --add-exports=java.base/jdk.internal.misc=ALL-UNNAMED,
> --add-exports=java.base/sun.nio.ch=ALL-UNNAMED,
> --add-exports=java.management/com.sun.jmx.mbeanserver=ALL-UNNAMED,
> --add-exports=jdk.internal.jvmstat/sun.jvmstat.monitor=ALL-UNNAMED,
> --add-opens=jdk.management/com.sun.management.internal=ALL-UNNAMED,
> --illegal-access=permit,
> --add-opens=java.base/jdk.internal.misc=ALL-UNNAMED,
> --add-opens=java.base/sun.nio.ch=ALL-UNNAMED,
> --add-opens=java.management/com.sun.jmx.mbeanserver=ALL-UNNAMED,
> --add-opens=jdk.internal.jvmstat/sun.jvmstat.monitor=ALL-UNNAMED,
> --add-opens=java.base/sun.reflect.generics.reflectiveObjects=ALL-UNNAMED,
> --add-opens=java.base/java.io=ALL-UNNAMED,
> --add-opens=java.base/java.nio=ALL-UNNAMED,
> --add-opens=java.base/java.util=ALL-UNNAMED,
> --add-opens=java.base/java.util.concurrent=ALL-UNNAMED,
> --add-opens=java.base/java.util.concurrent.locks=ALL-UNNAMED,
> --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED,
> --add-opens=java.base/java.lang=ALL-UNNAMED,
> --add-opens=java.base/java.lang.invoke=ALL-UNNAMED,
> --add-opens=java.base/java.math=ALL-UNNAMED,
> --add-opens=java.sql/java.sql=ALL-UNNAMED]
> 2023-07-17 22:38:56,422 [1] INF [ImmutableCacheComputeServer]   System
> cache's DataRegion size is configured to 40 MB. Use
> DataStorageConfiguration.systemRegionInitialSize property to change the
> setting.
> 2023-07-17 22:38:56,422 [1] INF [ImmutableCacheComputeServer]   Configured
> caches [in 'sysMemPlc' dataRegion: ['ignite-sys-cache']]
> 2023-07-17 22:38:56,487 [1] INF [ImmutableCacheComputeServer]   Configured
> plugins:
> 2023-07-17 22:38:56,488 [1] INF [ImmutableCacheComputeServer]     ^-- None
> 2023-07-17 22:38:56,488 [1] INF [ImmutableCacheComputeServer]
> 2023-07-17 22:38:56,491 [1] INF [ImmutableCacheComputeServer]   Configured
> failure handler: [hnd=StopNodeOrHaltFailureHandler [tryStop=false,
> timeout=0, super=AbstractFailureHandler
> [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED,
> SYSTEM_CRITICAL_OPERATION_TIMEOUT]]]]
> 2023-07-17 22:38:56,747 [1] INF [ImmutableCacheComputeServer]
> Successfully bound communication NIO server to TCP port [port=47100,
> locHost=0.0.0.0/0.0.0.0, selectorsCnt=2, selectorSpins=0,
> pairedConn=false]
> 2023-07-17 22:38:56,749 [1] WRN [ImmutableCacheComputeServer]   Failure
> detection timeout will be ignored (one of SPI parameters has been set
> explicitly)
> 2023-07-17 22:38:56,769 [1] INF [ImmutableCacheComputeServer]   Collision
> resolution is disabled (all jobs will be activated upon arrival).
> 2023-07-17 22:38:56,836 [1] INF [ImmutableCacheComputeServer]
> Successfully bound to TCP port [port=47500, localHost=0.0.0.0/0.0.0.0,
> locNodeId=4e70ba5e-5829-4b2d-b349-6539918990b5]
> 2023-07-17 22:38:56,869 [1] INF [ImmutableCacheComputeServer]
> Successfully locked persistence storage folder
> [/persist/Immutable/Persistence/node00-03094411-a868-4d96-8ea3-39df7f6f2262]
>
> 2023-07-17 22:38:56,870 [1] INF [ImmutableCacheComputeServer]   Consistent
> ID used for local node is [03094411-a868-4d96-8ea3-39df7f6f2262] according
> to persistence data storage folders
> 2023-07-17 22:38:56,870 [1] INF [ImmutableCacheComputeServer]   Resolved
> store directory for node persistent data:
> /persist/Immutable/Persistence/node00-03094411-a868-4d96-8ea3-39df7f6f2262
> 2023-07-17 22:39:00,359 [1] INF [ImmutableCacheComputeServer]   Resolved
> directory for serialized binary metadata:
> /persist/Immutable/db/binary_meta/node00-03094411-a868-4d96-8ea3-39df7f6f2262
>
> 2023-07-17 22:39:02,289 [1] INF [ImmutableCacheComputeServer]   Resolved
> page store work directory:
> /persist/Immutable/Persistence/node00-03094411-a868-4d96-8ea3-39df7f6f2262
> 2023-07-17 22:39:02,325 [1] INF [ImmutableCacheComputeServer]   Resolved
> page store work directory:
> /persist/Immutable/Persistence/node00-03094411-a868-4d96-8ea3-39df7f6f2262
> 2023-07-17 22:39:02,412 [1] INF [ImmutableCacheComputeServer]   Resolved
> write ahead log work directory:
> /persist/Immutable/WalStore/node00-03094411-a868-4d96-8ea3-39df7f6f2262
> 2023-07-17 22:39:02,417 [1] INF [ImmutableCacheComputeServer]   Resolved
> write ahead log archive directory:
> /persist/Immutable/WalArchive/node00-03094411-a868-4d96-8ea3-39df7f6f2262
> 2023-07-17 22:39:02,568 [1] INF [ImmutableCacheComputeServer]   Configured
> data regions initialized successfully [total=5]
> 2023-07-17 22:39:02,617 [1] INF [ImmutableCacheComputeServer]   Resolved
> snapshot work directory: /persist/Immutable/snapshots
> 2023-07-17 22:39:02,617 [1] INF [ImmutableCacheComputeServer]   Resolved
> temp directory for snapshot creation:
> /persist/Immutable/Persistence/node00-03094411-a868-4d96-8ea3-39df7f6f2262/snp
>
> 2023-07-17 22:39:02,670 [1] WRN [ImmutableCacheComputeServer]
> Serialization of Java objects in H2 was enabled.
> 2023-07-17 22:39:02,798 [1] INF [ImmutableCacheComputeServer]   Client
> connector processor has started on TCP port 10800
> 2023-07-17 22:39:02,845 [1] INF [ImmutableCacheComputeServer]   Command
> protocol successfully started [name=TCP binary, host=0.0.0.0/0.0.0.0,
> port=11212]
> 2023-07-17 22:39:02,868 [1] WRN [ImmutableCacheComputeServer]   Marshaller
> is automatically set to o.a.i.i.binary.BinaryMarshaller (other nodes must
> have the same marshaller type).
> 2023-07-17 22:39:02,885 [1] INF [ImmutableCacheComputeServer]   Configured
> .NET plugins:
> 2023-07-17 22:39:02,885 [1] INF [ImmutableCacheComputeServer]     ^-- None
> 2023-07-17 22:39:02,897 [1] INF [ImmutableCacheComputeServer]
> Non-loopback local IPs: 10.215.104.141
> 2023-07-17 22:39:02,898 [1] INF [ImmutableCacheComputeServer]   Enabled
> local MACs: E636C15B5514
> 2023-07-17 22:39:02,948 [1] INF [ImmutableCacheComputeServer]   Read
> checkpoint status
> [startMarker=/persist/Immutable/Persistence/node00-03094411-a868-4d96-8ea3-39df7f6f2262/cp/1689633469988-4581c48e-4906-41c4-9c63-c798bfb50fc6-START.bin,
> endMarker=/persist/Immutable/Persistence/node00-03094411-a868-4d96-8ea3-39df7f6f2262/cp/1689632606392-6cda821a-488a-40c3-9d24-20d100190181-END.bin]
>
> 2023-07-17 22:39:02,953 [1] INF [ImmutableCacheComputeServer]   Started
> page memory [memoryAllocated=100.0 MiB, pages=24814, tableSize=1.9 MiB,
> replacementSize=3.0 KiB, checkpointBuffer=100.0 MiB]
> 2023-07-17 22:39:02,954 [1] INF [ImmutableCacheComputeServer]   Checking
> memory state [lastValidPos=WALPointer [idx=2431, fileOff=201666668,
> len=60379], lastMarked=WALPointer [idx=2431, fileOff=215985766, len=60379],
> lastCheckpointId=4581c48e-4906-41c4-9c63-c798bfb50fc6]
> 2023-07-17 22:39:03,260 [1] WRN [ImmutableCacheComputeServer]   WAL
> segment tail reached. [idx=2431, isWorkDir=true,
> serVer=org.apache.ignite.internal.processors.cache.persistence.wal.serializer.RecordV2Serializer@1b90fee4,
> actualFilePtr=WALPointer [idx=2431, fileOff=216046145, len=0]]
> 2023-07-17 22:39:03,265 [1] INF [ImmutableCacheComputeServer]   Restoring
> checkpoint after logical recovery, will start physical recovery from back
> pointer: WALPointer [idx=2431, fileOff=209031823, len=29]
> 2023-07-17 22:39:03,352 [30] ERR [ImmutableCacheComputeServer]   Failed to
> apply page delta. rec=[PagesListRemovePageRecord
> [rmvdPageId=0101000100000057, pageId=0101000100000004, grpId=-1476359018,
> super=PageDeltaRecord [grpId=-1476359018, pageId=0101000100000004,
> super=WALRecord [size=41, chainSize=0, pos=WALPointer [idx=2431,
> fileOff=209169155, len=41], type=PAGES_LIST_REMOVE_PAGE]]]]
> 2023-07-17 22:39:03,355 [30] ERR [ImmutableCacheComputeServer]   Failed to
> apply page delta. rec=[PagesListRemovePageRecord
> [rmvdPageId=0101000100000054, pageId=0101000100000004, grpId=-1476359018,
> super=PageDeltaRecord [grpId=-1476359018, pageId=0101000100000004,
> super=WALRecord [size=41, chainSize=0, pos=WALPointer [idx=2431,
> fileOff=209173350, len=41], type=PAGES_LIST_REMOVE_PAGE]]]]
> 2023-07-17 22:39:03,356 [30] ERR [ImmutableCacheComputeServer]   Failed to
> apply page delta. rec=[PagesListRemovePageRecord
> [rmvdPageId=0101000100000048, pageId=0101000100000004, grpId=-1476359018,
> super=PageDeltaRecord [grpId=-1476359018, pageId=0101000100000004,
> super=WALRecord [size=41, chainSize=0, pos=WALPointer [idx=2431,
> fileOff=209177545, len=41], type=PAGES_LIST_REMOVE_PAGE]]]]
> 2023-07-17 22:39:03,356 [30] ERR [ImmutableCacheComputeServer]   Failed to
> apply page delta. rec=[PagesListRemovePageRecord
> [rmvdPageId=0101000100000047, pageId=0101000100000004, grpId=-1476359018,
> super=PageDeltaRecord [grpId=-1476359018, pageId=0101000100000004,
> super=WALRecord [size=41, chainSize=0, pos=WALPointer [idx=2431,
> fileOff=209181740, len=41], type=PAGES_LIST_REMOVE_PAGE]]]]
> 2023-07-17 22:39:03,357 [30] ERR [ImmutableCacheComputeServer]   Failed to
> apply page delta. rec=[PagesListRemovePageRecord
> [rmvdPageId=0101000100000046, pageId=0101000100000004, grpId=-1476359018,
> super=PageDeltaRecord [grpId=-1476359018, pageId=0101000100000004,
> super=WALRecord [size=41, chainSize=0, pos=WALPointer [idx=2431,
> fileOff=209185935, len=41], type=PAGES_LIST_REMOVE_PAGE]]]]
> 2023-07-17 22:39:03,357 [30] ERR [ImmutableCacheComputeServer]   Failed to
> apply page delta. rec=[PagesListRemovePageRecord
> [rmvdPageId=0101000100000045, pageId=0101000100000004, grpId=-1476359018,
> super=PageDeltaRecord [grpId=-1476359018, pageId=0101000100000004,
> super=WALRecord [size=41, chainSize=0, pos=WALPointer [idx=2431,
> fileOff=209190130, len=41], type=PAGES_LIST_REMOVE_PAGE]]]]
> 2023-07-17 22:39:03,358 [30] ERR [ImmutableCacheComputeServer]   Failed to
> apply page delta. rec=[PagesListRemovePageRecord
> [rmvdPageId=0101000100000044, pageId=0101000100000004, grpId=-1476359018,
> super=PageDeltaRecord [grpId=-1476359018, pageId=0101000100000004,
> super=WALRecord [size=41, chainSize=0, pos=WALPointer [idx=2431,
> fileOff=209194325, len=41], type=PAGES_LIST_REMOVE_PAGE]]]]
> 2023-07-17 22:39:03,360 [30] ERR [ImmutableCacheComputeServer]   Failed to
> apply page delta. rec=[PagesListRemovePageRecord
> [rmvdPageId=0101000100000043, pageId=0101000100000004, grpId=-1476359018,
> super=PageDeltaRecord [grpId=-1476359018, pageId=0101000100000004,
> super=WALRecord [size=41, chainSize=0, pos=WALPointer [idx=2431,
> fileOff=209198520, len=41], type=PAGES_LIST_REMOVE_PAGE]]]]
> 2023-07-17 22:39:03,360 [30] ERR [ImmutableCacheComputeServer]   Failed to
> apply page delta. rec=[PagesListRemovePageRecord
> [rmvdPageId=0101000100000042, pageId=0101000100000004, grpId=-1476359018,
> super=PageDeltaRecord [grpId=-1476359018, pageId=0101000100000004,
> super=WALRecord [size=41, chainSize=0, pos=WALPointer [idx=2431,
> fileOff=209202715, len=41], type=PAGES_LIST_REMOVE_PAGE]]]]
> 2023-07-17 22:39:03,362 [1] INF [ImmutableCacheComputeServer]   Cleanup
> cache stores [total=0, left=0, cleanFiles=false]
> 2023-07-17 22:39:03,363 [1] ERR [ImmutableCacheComputeServer]   Exception
> during start processors, node will be stopped and close connections
> 2023-07-17 22:39:03,364 [1] ERR [ImmutableCacheComputeServer]   Got
> exception while starting (will rollback startup routine).
> 2023-07-17 22:39:03,365 [1] WRN [ImmutableCacheComputeServer]   Attempt to
> stop starting grid. This operation cannot be guaranteed to be successful.
> 2023-07-17 22:39:03,369 [1] INF [ImmutableCacheComputeServer]   Command
> protocol successfully stopped: TCP binary
> 2023-07-17 22:39:03,458 [1] INF
> [VSS.TRex.GridFabric.Servers.Compute.ImmutableCacheComputeServer]
> Completed creation of new Ignite node: Exists = False, Factory available =
> True
> 2023-07-17 22:39:03,458 [1] WRN
> [VSS.TRex.GridFabric.Servers.Compute.ImmutableCacheComputeServer]   Unable
> to obtain instance of TRex-Immutable at attempt:1
> Unhandled exception: Apache.Ignite.Core.Common.IgniteException: Failed to
> apply page delta
>  ---> Apache.Ignite.Core.Common.JavaException: class
> org.apache.ignite.IgniteException: Failed to apply page delta
> at
> org.apache.ignite.internal.util.IgniteUtils.convertException(IgniteUtils.java:1150)
> at
> org.apache.ignite.internal.processors.platform.PlatformAbstractBootstrap.start(PlatformAbstractBootstrap.java:48)
> at
> org.apache.ignite.internal.processors.platform.PlatformIgnition.start(PlatformIgnition.java:74)
> Caused by: class org.apache.ignite.IgniteCheckedException: Failed to apply
> page delta
> at
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.lambda$performBinaryMemoryRestore$26(GridCacheDatabaseSharedManager.java:2289)
> at
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.lambda$stripedApplyPage$27(GridCacheDatabaseSharedManager.java:2346)
> at
> org.apache.ignite.internal.processors.cache.persistence.CacheStripedExecutor.lambda$submit$0(CacheStripedExecutor.java:75)
> at
> org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:637)
> at
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:125)
> at java.base/java.lang.Thread.run(Thread.java:829)
> Caused by: java.lang.IllegalStateException: Failed to get page IO instance
> (page content is corrupted)
> at
> org.apache.ignite.internal.processors.cache.persistence.tree.io.IOVersions.forVersion(IOVersions.java:85)
> at
> org.apache.ignite.internal.processors.cache.persistence.tree.io.IOVersions.forPage(IOVersions.java:97)
> at
> org.apache.ignite.internal.pagemem.wal.record.delta.PagesListRemovePageRecord.applyDelta(PagesListRemovePageRecord.java:55)
> at
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.applyPageDelta(GridCacheDatabaseSharedManager.java:2401)
> at
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.lambda$performBinaryMemoryRestore$26(GridCacheDatabaseSharedManager.java:2282)
> ... 5 more
>    at Apache.Ignite.Core.Impl.Unmanaged.Jni.Env.ExceptionCheck()
>    at
> Apache.Ignite.Core.Impl.Unmanaged.Jni.Env.CallStaticVoidMethod(GlobalRef
> cls, IntPtr methodId, Int64* argsPtr)
>    at Apache.Ignite.Core.Impl.Unmanaged.UnmanagedUtils.IgnitionStart(Env
> env, String cfgPath, String gridName, Boolean clientMode, Boolean
> userLogger, Int64 igniteId, Boolean redirectConsole)
>    at Apache.Ignite.Core.Ignition.Start(IgniteConfiguration cfg)
>    --- End of inner exception stack trace ---
>    at Apache.Ignite.Core.Ignition.Start(IgniteConfiguration cfg)
>
> Thanks,
> Raymond.
>
> On Wed, Jul 19, 2023 at 5:43 AM Raymond Wilson <raymond_wil...@trimble.com>
> wrote:
>
>> Hi Alex,
>>
>> We are using Ignite v2.15.
>>
>> I will track down the additional log information and reply on this thread.
>>
>> Raymond.
>>
>>
>> On Wed, Jul 19, 2023 at 2:55 AM Alex Plehanov <plehanov.a...@gmail.com>
>> wrote:
>>
>>> Hello,
>>>
>>> Which Ignite version do you use?
>>> Please share exception details after "Exception during start processors,
>>> node will be stopped and close connections" (there should be a reason in
>>> the log, why the page delta can't be applied).
>>>
>>> On Tue, 18 Jul 2023 at 05:05, Raymond Wilson <raymond_wil...@trimble.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> We run a dev/alpha stack of our application in Azure Kubernetes.
>>>> Persistent storage is contained in Azure Files NAS storage volumes, one per
>>>> server node.
>>>>
>>>> We ran an upgrade of Kubernetes today (from 1.24.9 to 1.26.3). During
>>>> the update various pods were stopped and restarted as is normal for an
>>>> update. This included nodes running the dev/alpha stack.
>>>>
>>>> At least one node (of the four server nodes in the cluster)
>>>> failed to restart after the update, with the following logging:
>>>>
>>>>   2023-07-18 01:23:55.171 [1] INF    Restoring checkpoint after logical
>>>> recovery, will start physical recovery from back pointer: WALPointer
>>>> [idx=2431, fileOff=209031823, len=29]
>>>>  2023-07-18 01:23:55.205  [28] ERR    Failed to apply page delta.
>>>> rec=[PagesListRemovePageRecord [rmvdPageId=0101000100000057,
>>>> pageId=0101000100000004, grpId=-1476359018, super=PageDeltaRecord
>>>> [grpId=-1476359018, pageId=0101000100000004, super=WALRecord [size=41,
>>>> chainSize=0, pos=WALPointer [idx=2431, fileOff=209169155, len=41],
>>>> type=PAGES_LIST_REMOVE_PAGE]]]]
>>>>  2023-07-18 01:23:55.217 [1] INF    Cleanup cache stores [total=0,
>>>> left=0, cleanFiles=false]
>>>>  2023-07-18 01:23:55.218 [1] ERR    Got exception while starting (will
>>>> rollback startup routine).
>>>>  2023-07-18 01:23:55.218 [1] ERR    Exception during start processors,
>>>> node will be stopped and close connections
>>>>
>>>> I know Apache Ignite is very good at surviving 'Big Red Switch'
>>>> scenarios, and we have our data regions configured with the strictest
>>>> update protocol (full sync after each write), however it's possible the NAS
>>>> implementation does something different!
>>>>
>>>> I think if we delete the WAL files from the nodes that won't restart
>>>> then the node may be happy, though we will lose any updates since the last
>>>> checkpoint (but then, it has low use and checkpoints are every 30-45
>>>> seconds or so, so this won't be significant).
>>>>
>>>> Is this an error anyone else has noticed?
>>>> Has anyone else had similar issues with Azure Files when using strict
>>>> update/sync semantics?
>>>>
>>>> Thanks,
>>>> Raymond.
>>>>
>>>> --
>>>> <http://www.trimble.com/>
>>>> Raymond Wilson
>>>> Trimble Distinguished Engineer, Civil Construction Software (CCS)
>>>> 11 Birmingham Drive | Christchurch, New Zealand
>>>> raymond_wil...@trimble.com
>>>>
>>>>
>>>> <https://worksos.trimble.com/?utm_source=Trimble&utm_medium=emailsign&utm_campaign=Launch>
>>>>
>>>
>>
>>
>
>
>
-- 
<http://www.trimble.com/>
Raymond Wilson
Trimble Distinguished Engineer, Civil Construction Software (CCS)
11 Birmingham Drive | Christchurch, New Zealand
raymond_wil...@trimble.com

<https://worksos.trimble.com/?utm_source=Trimble&utm_medium=emailsign&utm_campaign=Launch>
