[jira] [Commented] (CASSANDRA-16047) Potential race condition in creating hard link when incremental backup is turned on

Brandon Williams (Jira) Thu, 13 Aug 2020 09:22:21 -0700


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-16047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17177127#comment-17177127
 ]


Brandon Williams commented on CASSANDRA-16047:
----------------------------------------------

bq. so it is hard to think the culprit being in the OS and IO layer.

If you had said local disks, I would agree.

> Potential race condition in creating hard link when incremental backup is 
> turned on
> -----------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-16047
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16047
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local/SSTable
>            Reporter: Wei Deng
>            Priority: Urgent
>         Attachments: incremental_backup_hardlink_exception.jpg, 
> incremental_backup_hardlink_exception1.jpg
>
>
> There appears to be a race condition in creating hard links when incremental 
> backup is turned on.
> The following screenshot was captured in a production cluster running 
> Cassandra 3.0.15 after turning on incremental backup. When this 
> {{NoSuchFileException}} happens, the resulting {{FSWriteError}} and the 
> default disk failure policy cause the JVM to shut down, so it's a pretty 
> critical bug.
>  !incremental_backup_hardlink_exception.jpg!
> Because of the risk of production database downtime (if a similar issue 
> occurs on multiple nodes within a short time frame), and because the same 
> exception has already shut down the JVM multiple times, incremental backup 
> had to be turned off for now, but this is not an ideal situation.
> !incremental_backup_hardlink_exception1.jpg!
> The deployment is on a public cloud environment with EBS-like disks that are 
> backed by SSD with decent latency, throughput and IOPS, so it is hard to 
> think the culprit being in the OS and IO layer. Based on the second 
> screenshot above, the affected table is {{system.size_estimates}}, which has 
> low flush traffic, so compaction of the source SSTable doesn't seem to be at 
> play here.
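
For illustration only, here is a minimal Java sketch of one way a {{NoSuchFileException}} can surface while creating a hard link: the source file disappears before the link call runs. This is not Cassandra's backup code and makes no claim about the actual cause in this ticket. The class name {{HardLinkRaceSketch}}, the method {{tryCreateBackupLink}} and the file names are hypothetical; the only real API used is {{java.nio.file.Files}}, and the file system is assumed to support hard links.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

public class HardLinkRaceSketch {

    /**
     * Hypothetical helper: tries to hard-link {@code source} to {@code link}.
     * Returns true if the link was created, false if the link call failed
     * with NoSuchFileException (for example because the source was already gone).
     */
    static boolean tryCreateBackupLink(Path source, Path link) throws IOException {
        try {
            // Files.createLink(link, existing): the new link comes first.
            Files.createLink(link, source);
            return true;
        } catch (NoSuchFileException e) {
            // In the report above, this exception is what ends up as an FSWriteError.
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // Made-up directory layout, loosely resembling a table directory with backups.
        Path dir = Files.createTempDirectory("hardlink-sketch");
        Path sstable = Files.createFile(dir.resolve("example-Data.db"));
        Path backups = Files.createDirectory(dir.resolve("backups"));

        // Source exists: the link is created.
        System.out.println(tryCreateBackupLink(sstable, backups.resolve("first-Data.db")));

        // Simulate another actor removing the source before the link call.
        Files.delete(sstable);
        System.out.println(tryCreateBackupLink(sstable, backups.resolve("second-Data.db")));
    }
}
```

Whether the missing path in the screenshots is the source SSTable or something on the backup side cannot be determined from this sketch; it only shows the failure mode of the link call itself.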



--
This message was sent by Atlassian Jira
(v8.3.4#803005)