[ https://issues.apache.org/jira/browse/IGNITE-16541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ilya Shishkov updated IGNITE-16541:
-----------------------------------
    Description: 
Currently, if CDC is turned on and {{cdcWalPath}} and {{walArchivePath}} point to different file system partitions, the server node fails on the first attempt to archive a segment rather than during startup.

Because the cluster may be under load at that point, we should implement a fail-fast approach for this case in order to prevent data loss or corruption. The server node should check {{cdcWalPath}} and {{walArchivePath}} during startup and refuse to start if they point to different file system partitions.

{code}
[ERROR]wal-file-archiver%null-#108[] Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, err=class o.a.i.i.processors.cache.persistence.StorageException: Failed to archive WAL segment [srcFile=/ignite/work/db/wal/consistent_Id/0000000000000000.wal, dstFile=/ignite/work/db/wal/archive/consistent_Id/0000000000000000.wal.tmp]]]
org.apache.ignite.internal.processors.cache.persistence.StorageException: Failed to archive WAL segment [srcFile=/ignite/work/db/wal/consistent_Id/0000000000000000.wal, dstFile=/ignite/work/db/wal/archive/consistent_Id/0000000000000000.wal.tmp]
    at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileArchiver.archiveSegment(FileWriteAheadLogManager.java:2074)
    at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileArchiver.body(FileWriteAheadLogManager.java:1934)
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:125)
    at java.lang.Thread.run(Thread.java:829) [?:?]
Caused by: java.nio.file.FileSystemException: /ignite/data/work/db/wal/cdc/consistent_Id/0000000000000000.wal -> /ignite/work/db/wal/archive/consistent_Id/0000000000000000.wal: Invalid cross-device link
    at sun.nio.fs.UnixException.translateToIOException(UnixException.java:100)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
    at sun.nio.fs.UnixFileSystemProvider.createLink(UnixFileSystemProvider.java:481)
    at java.nio.file.Files.createLink(Files.java:1102)
    at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileArchiver.archiveSegment(FileWriteAheadLogManager.java:2058)
    ... 3 more
{code}

  was:
Currently, if CDC is turned on and {{cdcWalPath}} and {{walArchivePath}} point to a different file system partitions, the server node fails on the first attempt to archive a segment rather than during startup.

Because the cluster may be under load at that point, we should implement a fail-fast approach for this case in order to prevent data loss or corruption. The server node should check {{cdcWalPath}} and {{walArchivePath}} during startup and refuse to start if they point to different file system partitions.

{code}
[ERROR]wal-file-archiver%null-#108[] Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, err=class o.a.i.i.processors.cache.persistence.StorageException: Failed to archive WAL segment [srcFile=/ignite/work/db/wal/consistent_Id/0000000000000000.wal, dstFile=/ignite/work/db/wal/archive/consistent_Id/0000000000000000.wal.tmp]]]
org.apache.ignite.internal.processors.cache.persistence.StorageException: Failed to archive WAL segment [srcFile=/ignite/work/db/wal/consistent_Id/0000000000000000.wal, dstFile=/ignite/work/db/wal/archive/consistent_Id/0000000000000000.wal.tmp]
    at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileArchiver.archiveSegment(FileWriteAheadLogManager.java:2074)
    at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileArchiver.body(FileWriteAheadLogManager.java:1934)
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:125)
    at java.lang.Thread.run(Thread.java:829) [?:?]
Caused by: java.nio.file.FileSystemException: /ignite/data/work/db/wal/cdc/consistent_Id/0000000000000000.wal -> /ignite/work/db/wal/archive/consistent_Id/0000000000000000.wal: Invalid cross-device link
    at sun.nio.fs.UnixException.translateToIOException(UnixException.java:100)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
    at sun.nio.fs.UnixFileSystemProvider.createLink(UnixFileSystemProvider.java:481)
    at java.nio.file.Files.createLink(Files.java:1102)
    at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileArchiver.archiveSegment(FileWriteAheadLogManager.java:2058)
    ... 3 more
{code}


> 'Invalid cross-device link error' if cdcWalPath and walArchivePath point to different file system partitions
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-16541
>                 URL: https://issues.apache.org/jira/browse/IGNITE-16541
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Ilya Shishkov
>            Priority: Major
>              Labels: IEP-59
>

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
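
A note on the startup check proposed in the description. The trace shows the failure coming from {{Files.createLink}}, and hard links cannot span file systems, so one way to fail fast is to compare the file stores behind the two directories before the node starts. The following is only a minimal sketch under stated assumptions: the class {{SameFileStoreCheck}} and its methods are hypothetical names and not part of Ignite, and the actual fix may well be implemented differently. It relies on {{FileStore.equals}}, which is only an approximation of "a hard link will succeed"; attempting a probe hard link would be a stricter test.

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper (not an Ignite class): fail-fast check that two directories share a file store. */
public final class SameFileStoreCheck {
    private SameFileStoreCheck() {
        // Static helpers only.
    }

    /** Returns {@code p} itself if it exists, otherwise its closest existing ancestor. */
    public static Path nearestExisting(Path p) {
        Path cur = p.toAbsolutePath().normalize();

        // The directories may not be created yet at startup, so walk up the tree.
        while (cur != null && !Files.exists(cur))
            cur = cur.getParent();

        if (cur == null)
            throw new IllegalArgumentException("No existing ancestor for path: " + p);

        return cur;
    }

    /** Returns {@code true} if both paths resolve to the same file store. */
    public static boolean onSameFileStore(Path a, Path b) throws IOException {
        FileStore storeA = Files.getFileStore(nearestExisting(a));
        FileStore storeB = Files.getFileStore(nearestExisting(b));

        return storeA.equals(storeB);
    }

    /** Throws if the two directories are on different file stores, so startup can be aborted early. */
    public static void validate(Path cdcWalPath, Path walArchivePath) throws IOException {
        if (!onSameFileStore(cdcWalPath, walArchivePath)) {
            throw new IllegalStateException("cdcWalPath and walArchivePath must be on the same file system " +
                "[cdcWalPath=" + cdcWalPath + ", walArchivePath=" + walArchivePath + ']');
        }
    }
}
```

Calling {{SameFileStoreCheck.validate(cdcDir, archiveDir)}} during node startup would turn the late {{Invalid cross-device link}} failure into an immediate configuration error, before any segment needs to be archived.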