[ 
https://issues.apache.org/jira/browse/IGNITE-16541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ilya Shishkov updated IGNITE-16541:
-----------------------------------
    Description: 
Currently, if CDC is enabled and {{cdcWalPath}} and {{walArchivePath}} point 
to different file system partitions, a server node starts successfully and 
fails only on the first attempt to archive a WAL segment. The archiver calls 
{{Files.createLink}}, and a hard link cannot span file systems (see the 
'Invalid cross-device link' error in the log below).

Because the cluster may already be under load at that point, we should 
implement a fail-fast approach for this case to prevent data loss or 
corruption. A server node should check {{cdcWalPath}} and {{walArchivePath}} 
during startup and refuse to start if they point to different file system 
partitions.

{code}
[ERROR]wal-file-archiver%null-#108[] Critical system error detected. Will be 
handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler 
[tryStop=false, timeout=0, super=AbstractFailureHandler 
[ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, 
SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext 
[type=SYSTEM_WORKER_TERMINATION, err=class 
o.a.i.i.processors.cache.persistence.StorageException: Failed to archive WAL 
segment [srcFile=/ignite/work/db/wal/consistent_Id/0000000000000000.wal, 
dstFile=/ignite/work/db/wal/archive/consistent_Id/0000000000000000.wal.tmp]]]
org.apache.ignite.internal.processors.cache.persistence.StorageException: 
Failed to archive WAL segment 
[srcFile=/ignite/work/db/wal/consistent_Id/0000000000000000.wal, 
dstFile=/ignite/work/db/wal/archive/consistent_Id/0000000000000000.wal.tmp]
            at 
org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileArchiver.archiveSegment(FileWriteAheadLogManager.java:2074)
            at 
org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileArchiver.body(FileWriteAheadLogManager.java:1934)
            at 
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:125)
            at java.lang.Thread.run(Thread.java:829) [?:?]
Caused by: java.nio.file.FileSystemException: 
/ignite/data/work/db/wal/cdc/consistent_Id/0000000000000000.wal -> 
/ignite/work/db/wal/archive/consistent_Id/0000000000000000.wal: Invalid 
cross-device link
            at 
sun.nio.fs.UnixException.translateToIOException(UnixException.java:100)
            at 
sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
            at 
sun.nio.fs.UnixFileSystemProvider.createLink(UnixFileSystemProvider.java:481)
            at java.nio.file.Files.createLink(Files.java:1102)
            at 
org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileArchiver.archiveSegment(FileWriteAheadLogManager.java:2058)
            ... 3 more
{code}
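
One possible shape for the proposed startup check is sketched below. It is 
only an illustration: the class {{WalPathCheck}} and the method 
{{ensureHardLinkPossible}} are hypothetical names, not existing Ignite API, 
and the sketch assumes both directories already exist when it runs. It probes 
the same {{Files.createLink}} call that fails in the log above, so the node 
could report the problem before any segment needs archiving.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical startup check; names are illustrative, not Ignite API. */
final class WalPathCheck {
    /**
     * Verifies that a file in {@code walArchiveDir} can be hard-linked from
     * {@code cdcWalDir} by creating, and then removing, a probe file and a
     * probe link.
     *
     * @throws IllegalStateException if the hard link cannot be created.
     */
    static void ensureHardLinkPossible(Path walArchiveDir, Path cdcWalDir) throws IOException {
        // Probe file in the archive directory, with a randomly generated name.
        Path probe = Files.createTempFile(walArchiveDir, "link-probe", ".tmp");
        Path link = cdcWalDir.resolve(probe.getFileName());

        try {
            // Same JDK call that throws "Invalid cross-device link" in the log.
            Files.createLink(link, probe);
        }
        catch (IOException | UnsupportedOperationException e) {
            throw new IllegalStateException(
                "cdcWalPath and walArchivePath must allow hard links between them " +
                "[cdcWalPath=" + cdcWalDir + ", walArchivePath=" + walArchiveDir + ']', e);
        }
        finally {
            // Leave both directories as they were found.
            Files.deleteIfExists(link);
            Files.deleteIfExists(probe);
        }
    }
}
```

Probing with a real link, rather than comparing mount points, also covers 
file systems that do not support hard links at all.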

  was:
Currently, if CDC is turned on and {{cdcWalPath}} and {{walArchivePath}} point 
to different file system partitions, server node fails on a first attempt to 
archive segment, but not during a startup.

Because cluster may be under load, in order to prevent data loss or corruption 
we should implement fail-first approach for this case.  Server node should 
check {{cdcWalPath}} and {{walArchivePath}} during the startup, and prevent 
further starting if they point to different file system partitions.

{code}
[ERROR]wal-file-archiver%null-#108[] Critical system error detected. Will be 
handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler 
[tryStop=false, timeout=0, super=AbstractFailureHandler 
[ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, 
SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext 
[type=SYSTEM_WORKER_TERMINATION, err=class 
o.a.i.i.processors.cache.persistence.StorageException: Failed to archive WAL 
segment [srcFile=/ignite/work/db/wal/consistent_Id/0000000000000000.wal, 
dstFile=/ignite/work/db/wal/archive/consistent_Id/0000000000000000.wal.tmp]]]
org.apache.ignite.internal.processors.cache.persistence.StorageException: 
Failed to archive WAL segment 
[srcFile=/ignite/work/db/wal/consistent_Id/0000000000000000.wal, 
dstFile=/ignite/work/db/wal/archive/consistent_Id/0000000000000000.wal.tmp]
            at 
org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileArchiver.archiveSegment(FileWriteAheadLogManager.java:2074)
            at 
org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileArchiver.body(FileWriteAheadLogManager.java:1934)
            at 
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:125)
            at java.lang.Thread.run(Thread.java:829) [?:?]
Caused by: java.nio.file.FileSystemException: 
/ignite/data/work/db/wal/cdc/consistent_Id/0000000000000000.wal -> 
/ignite/work/db/wal/archive/consistent_Id/0000000000000000.wal: Invalid 
cross-device link
            at 
sun.nio.fs.UnixException.translateToIOException(UnixException.java:100)
            at 
sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
            at 
sun.nio.fs.UnixFileSystemProvider.createLink(UnixFileSystemProvider.java:481)
            at java.nio.file.Files.createLink(Files.java:1102)
            at 
org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileArchiver.archiveSegment(FileWriteAheadLogManager.java:2058)
            ... 3 more
{code}


> 'Invalid cross-device link error' if cdcWalPath and walArchivePath point to 
> different file system partitions
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-16541
>                 URL: https://issues.apache.org/jira/browse/IGNITE-16541
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Ilya Shishkov
>            Priority: Minor
>              Labels: IEP-59, ise
>
> Currently, if CDC is enabled and {{cdcWalPath}} and {{walArchivePath}} 
> point to different file system partitions, a server node starts 
> successfully and fails only on the first attempt to archive a WAL segment.
> Because the cluster may already be under load at that point, we should 
> implement a fail-fast approach for this case to prevent data loss or 
> corruption. A server node should check {{cdcWalPath}} and 
> {{walArchivePath}} during startup and refuse to start if they point to 
> different file system partitions.
> {code}
> [ERROR]wal-file-archiver%null-#108[] Critical system error detected. Will be 
> handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler 
> [tryStop=false, timeout=0, super=AbstractFailureHandler 
> [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, 
> SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext 
> [type=SYSTEM_WORKER_TERMINATION, err=class 
> o.a.i.i.processors.cache.persistence.StorageException: Failed to archive WAL 
> segment [srcFile=/ignite/work/db/wal/consistent_Id/0000000000000000.wal, 
> dstFile=/ignite/work/db/wal/archive/consistent_Id/0000000000000000.wal.tmp]]]
> org.apache.ignite.internal.processors.cache.persistence.StorageException: 
> Failed to archive WAL segment 
> [srcFile=/ignite/work/db/wal/consistent_Id/0000000000000000.wal, 
> dstFile=/ignite/work/db/wal/archive/consistent_Id/0000000000000000.wal.tmp]
>             at 
> org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileArchiver.archiveSegment(FileWriteAheadLogManager.java:2074)
>             at 
> org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileArchiver.body(FileWriteAheadLogManager.java:1934)
>             at 
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:125)
>             at java.lang.Thread.run(Thread.java:829) [?:?]
> Caused by: java.nio.file.FileSystemException: 
> /ignite/data/work/db/wal/cdc/consistent_Id/0000000000000000.wal -> 
> /ignite/work/db/wal/archive/consistent_Id/0000000000000000.wal: Invalid 
> cross-device link
>             at 
> sun.nio.fs.UnixException.translateToIOException(UnixException.java:100)
>             at 
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
>             at 
> sun.nio.fs.UnixFileSystemProvider.createLink(UnixFileSystemProvider.java:481)
>             at java.nio.file.Files.createLink(Files.java:1102)
>             at 
> org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileArchiver.archiveSegment(FileWriteAheadLogManager.java:2058)
>             ... 3 more
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
