Re: GridProcessorAdapter fails to start due to failure to initialise WAL segment on Ignite startup
Hello!

It might be safe to remove that file and restart the node.

Regards,
--
Ilya Kasnacheev


On Tue, Dec 4, 2018 at 01:35, Raymond Wilson wrote:

> Hi Ilya,
>
> I checked the state of the WAL file in question (0008.wal) - it is a
> zero-byte WAL file. The only other WAL file present in the same location
> is .wal (65536 KB in size), which seems odd as WAL files 0001.wal through
> 0007.wal are not present.
>
> Thanks,
> Raymond.
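Removing the empty segment by hand is straightforward; as an illustration only (this is not an official Ignite tool, and the `WalCheck`/`quarantineEmptySegments` names are invented for this sketch), the following moves zero-byte `*.wal` files aside instead of deleting them, so they can be restored if something goes wrong. As advised elsewhere in this thread, back up the whole persistence directory first.

```java
import java.io.IOException;
import java.nio.file.*;

public class WalCheck {
    /** Moves zero-byte *.wal segment files out of walDir into backupDir. */
    static int quarantineEmptySegments(Path walDir, Path backupDir) throws IOException {
        Files.createDirectories(backupDir);
        int moved = 0;
        try (DirectoryStream<Path> segs = Files.newDirectoryStream(walDir, "*.wal")) {
            for (Path seg : segs) {
                // A zero-byte segment cannot contain any valid WAL records.
                if (Files.size(seg) == 0) {
                    Files.move(seg, backupDir.resolve(seg.getFileName()));
                    moved++;
                }
            }
        }
        return moved;
    }

    public static void main(String[] args) throws IOException {
        // Demo against a throwaway directory; in practice walDir would point
        // at the node's real WAL directory under its storage path.
        Path walDir = Files.createTempDirectory("wal");
        Files.write(walDir.resolve("0001.wal"), new byte[] {1, 2, 3}); // healthy segment
        Files.createFile(walDir.resolve("0008.wal"));                  // zero-byte segment
        System.out.println("moved=" + quarantineEmptySegments(walDir, walDir.resolve("backup")));
    }
}
```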
Re: GridProcessorAdapter fails to start due to failure to initialise WAL segment on Ignite startup
Hi Ilya,

I checked the state of the WAL file in question (0008.wal) - it is a
zero-byte WAL file. The only other WAL file present in the same location
is .wal (65536 KB in size), which seems odd as WAL files 0001.wal through
0007.wal are not present.

Thanks,
Raymond.

On Tue, Dec 4, 2018 at 6:03 AM Ilya Kasnacheev wrote:

> Hello!
>
> It seems that the WAL file got truncated or something like that.
>
> Can you post this file to some file storage so that we could look?
>
> You can also try to change this node's WAL mode to LOG_ONLY and try to
> start it again (after backing up data, of course). Checks are less strict
> in this case.
>
> Regards,
> --
> Ilya Kasnacheev
Re: GridProcessorAdapter fails to start due to failure to initialise WAL segment on Ignite startup
Hello!

It seems that the WAL file got truncated or something like that.

Can you post this file to some file storage so that we could look?

You can also try to change this node's WAL mode to LOG_ONLY and try to
start it again (after backing up data, of course). Checks are less strict
in this case.

Regards,
--
Ilya Kasnacheev


On Fri, Nov 30, 2018 at 22:33, Raymond Wilson wrote:

> Hi Ilya,
>
> We don’t change the WAL segment size from the default values.
>
> The only activity that occurred was stopping a node, making a minor
> change (not related to persistence) and rerunning the node.
>
> Raymond.
>
> Sent from my iPhone
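Switching the node's WAL mode to LOG_ONLY, as suggested, is a one-line configuration change. A minimal Java sketch of that setting follows (the C# client exposes the analogous `DataStorageConfiguration.WalMode` property); this is a configuration fragment that needs ignite-core on the classpath and the node's real storage settings, not a complete runnable program:

```java
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.configuration.WALMode;

public class RelaxedWalStart {
    public static void main(String[] args) {
        DataStorageConfiguration ds = new DataStorageConfiguration();
        // LOG_ONLY applies less strict WAL checks on startup than FSYNC;
        // back up the persistence directory before retrying.
        ds.setWalMode(WALMode.LOG_ONLY);

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setDataStorageConfiguration(ds);

        Ignition.start(cfg);
    }
}
```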
Re: GridProcessorAdapter fails to start due to failure to initialise WAL segment on Ignite startup
Hi Ilya,

We don’t change the WAL segment size from the default values.

The only activity that occurred was stopping a node, making a minor change
(not related to persistence) and rerunning the node.

Raymond.

Sent from my iPhone

On 1/12/2018, at 2:52 AM, Ilya Kasnacheev wrote:

> Hello!
>
> "WAL segment size change is not supported"
>
> Is there a chance that you have changed the WAL segment size setting
> between launches?
>
> Regards,
> --
> Ilya Kasnacheev
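For reference, the setting being discussed is controlled by `DataStorageConfiguration.setWalSegmentSize`. A sketch of the default, which matches the 65536 KB `.wal` file observed earlier in the thread (configuration fragment only, requires ignite-core):

```java
import org.apache.ignite.configuration.DataStorageConfiguration;

public class WalSegmentSizeExample {
    static DataStorageConfiguration defaultWalSegments() {
        DataStorageConfiguration ds = new DataStorageConfiguration();
        // Default segment size is 64 MB. Changing this between restarts of a
        // node with existing persistence files triggers the
        // "WAL segment size change is not supported" startup error.
        ds.setWalSegmentSize(64 * 1024 * 1024);
        return ds;
    }
}
```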
Re: GridProcessorAdapter fails to start due to failure to initialise WAL segment on Ignite startup
Hello!

"WAL segment size change is not supported"

Is there a chance that you have changed the WAL segment size setting
between launches?

Regards,
--
Ilya Kasnacheev


On Thu, Nov 29, 2018 at 02:39, Raymond Wilson wrote:

> I'm using Ignite 2.6 with the C# client.
>
> I have a running cluster that I was debugging. All requests were read
> only (there were no state mutating operations running in the cluster).
>
> I terminated the one server node in the grid (running in the debugger) to
> make a small code change and re-run it (I do this frequently). The node
> may have been stopped for longer than the partitioning timeout.
>
> On re-running the server node it failed to start. On re-running the
> complete cluster it still failed to start, and all other nodes report
> failure to connect to an inactive grid.
>
> Looking at the log for the server node that is failing I get the
> following log showing an exception while initializing a WAL segment. This
> failure seems permanent and is unexpected, as we are using the strict WAL
> atomicity mode (WalMode.Fsync) for all persisted regions. Is this a
> recoverable error, or does this imply data loss? [NB: This is a dev
> system so no prod data is affected]
GridProcessorAdapter fails to start due to failure to initialise WAL segment on Ignite startup
I'm using Ignite 2.6 with the C# client.

I have a running cluster that I was debugging. All requests were read only
(there were no state mutating operations running in the cluster).

I terminated the one server node in the grid (running in the debugger) to
make a small code change and re-run it (I do this frequently). The node may
have been stopped for longer than the partitioning timeout.

On re-running the server node it failed to start. On re-running the
complete cluster it still failed to start, and all other nodes report
failure to connect to an inactive grid.

Looking at the log for the server node that is failing I get the following
log showing an exception while initializing a WAL segment. This failure
seems permanent and is unexpected, as we are using the strict WAL atomicity
mode (WalMode.Fsync) for all persisted regions. Is this a recoverable
error, or does this imply data loss? [NB: This is a dev system so no prod
data is affected]

2018-11-29 12:26:09,933 [1] INFO ImmutableCacheComputeServer
>>>    __________  ________________
>>>   /  _/ ___/ |/ /  _/_  __/ __/
>>>  _/ // (7 7    // /  / / / _/
>>> /___/\___/_/|_/___/ /_/ /___/
>>>
>>> ver. 2.6.0#20180710-sha1:669feacc
>>> 2018 Copyright(C) Apache Software Foundation
>>>
>>> Ignite documentation: http://ignite.apache.org

2018-11-29 12:26:09,933 [1] INFO ImmutableCacheComputeServer Config URL: n/a
2018-11-29 12:26:09,948 [1] INFO ImmutableCacheComputeServer
IgniteConfiguration [igniteInstanceName=TRex-Immutable, pubPoolSize=50,
svcPoolSize=12, callbackPoolSize=12, stripedPoolSize=12, sysPoolSize=12,
mgmtPoolSize=4, igfsPoolSize=12, dataStreamerPoolSize=12,
utilityCachePoolSize=12, utilityCacheKeepAliveTime=6, p2pPoolSize=2,
qryPoolSize=12, igniteHome=null,
igniteWorkDir=C:\Users\rwilson\AppData\Local\Temp\TRexIgniteData\Immutable,
mbeanSrv=com.sun.jmx.mbeanserver.JmxMBeanServer@6e4784bc,
nodeId=8f32d0a6-539c-40dd-bc42-d044f28bac73,
marsh=org.apache.ignite.internal.binary.BinaryMarshaller@e4487af,
marshLocJobs=false, daemon=false, p2pEnabled=false, netTimeout=5000,
sndRetryDelay=1000, sndRetryCnt=3, metricsHistSize=1,
metricsUpdateFreq=2000, metricsExpTime=9223372036854775807,
discoSpi=TcpDiscoverySpi [addrRslvr=null, sockTimeout=5000,
ackTimeout=5000, marsh=null, reconCnt=10, reconDelay=2000,
maxAckTimeout=60, forceSrvMode=false, clientReconnectDisabled=false,
internalLsnr=null], segPlc=STOP, segResolveAttempts=2,
waitForSegOnStart=true, allResolversPassReq=true, segChkFreq=1,
commSpi=TcpCommunicationSpi [connectGate=null, connPlc=null,
enableForcibleNodeKill=false, enableTroubleshootingLog=false,
srvLsnr=org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$2@10d68fcd,
locAddr=127.0.0.1, locHost=null, locPort=47100, locPortRange=100,
shmemPort=-1, directBuf=true, directSndBuf=false, idleConnTimeout=3,
connTimeout=5000, maxConnTimeout=60, reconCnt=10, sockSndBuf=32768,
sockRcvBuf=32768, msgQueueLimit=1024, slowClientQueueLimit=0, nioSrvr=null,
shmemSrv=null, usePairedConnections=false, connectionsPerNode=1,
tcpNoDelay=true, filterReachableAddresses=false, ackSndThreshold=16,
unackedMsgsBufSize=0, sockWriteTimeout=2000, lsnr=null, boundTcpPort=-1,
boundTcpShmemPort=-1, selectorsCnt=4, selectorSpins=0, addrRslvr=null,
ctxInitLatch=java.util.concurrent.CountDownLatch@117e949d[Count = 1],
stopping=false,
metricsLsnr=org.apache.ignite.spi.communication.tcp.TcpCommunicationMetricsListener@6db9f5a4],
evtSpi=org.apache.ignite.spi.eventstorage.NoopEventStorageSpi@5f8edcc5,
colSpi=NoopCollisionSpi [], deploySpi=LocalDeploymentSpi [lsnr=null],
indexingSpi=org.apache.ignite.spi.indexing.noop.NoopIndexingSpi@7a675056,
addrRslvr=null, clientMode=false, rebalanceThreadPoolSize=1,
txCfg=org.apache.ignite.configuration.TransactionConfiguration@d21a74c,
cacheSanityCheckEnabled=true, discoStartupDelay=6, deployMode=SHARED,
p2pMissedCacheSize=100, locHost=null, timeSrvPortBase=31100,
timeSrvPortRange=100, failureDetectionTimeout=1,
clientFailureDetectionTimeout=3, metricsLogFreq=1, hadoopCfg=null,
connectorCfg=org.apache.ignite.configuration.ConnectorConfiguration@6e509ffa,
odbcCfg=null, warmupClos=null, atomicCfg=AtomicConfiguration
[seqReserveSize=1000, cacheMode=PARTITIONED, backups=1, aff=null,
grpName=null], classLdr=null, sslCtxFactory=null,
platformCfg=PlatformDotNetConfiguration [binaryCfg=null],
binaryCfg=BinaryConfiguration [idMapper=null, nameMapper=null,
serializer=null, compactFooter=true], memCfg=null, pstCfg=null,
dsCfg=DataStorageConfiguration [sysRegionInitSize=41943040,
sysCacheMaxSize=104857600, pageSize=16384, concLvl=0,
dfltDataRegConf=DataRegionConfiguration [name=Default-Immutable,
maxSize=1073741824, initSize=134217728, swapPath=null,
pageEvictionMode=DISABLED, evictionThreshold=0.9, emptyPagesPoolSize=100,
metricsEnabled=false, metricsSubIntervalCount=5, metricsRateTimeInterval=6,
persistenceEnabled=true, checkpointPageBufSize=0],
storagePath=/persist\TRexIgniteData\Imm
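For context, the `dsCfg=DataStorageConfiguration [...]` portion of the dump above corresponds roughly to a configuration like the following Java sketch. The values are copied from the log; the class and method names are invented for illustration, and the original was built through the C# API rather than Java (configuration fragment only, requires ignite-core):

```java
import org.apache.ignite.configuration.DataRegionConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.configuration.WALMode;

public class TRexImmutableConfig {
    static IgniteConfiguration sketch() {
        DataRegionConfiguration region = new DataRegionConfiguration()
            .setName("Default-Immutable")
            .setInitialSize(134217728L)   // initSize from the log (128 MB)
            .setMaxSize(1073741824L)      // maxSize from the log (1 GB)
            .setPersistenceEnabled(true); // persistenceEnabled=true in the log

        DataStorageConfiguration ds = new DataStorageConfiguration()
            .setPageSize(16384)           // pageSize=16384 from the log
            .setWalMode(WALMode.FSYNC)    // the strict mode the report describes
            .setDefaultDataRegionConfiguration(region);

        return new IgniteConfiguration()
            .setIgniteInstanceName("TRex-Immutable")
            .setDataStorageConfiguration(ds);
    }
}
```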