[ https://issues.apache.org/jira/browse/CASSANDRA-16047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17177122#comment-17177122 ]
Wei Deng edited comment on CASSANDRA-16047 at 8/13/20, 4:10 PM:
----------------------------------------------------------------

The version was mentioned in the description: 3.0.15.

The deployment is on a public cloud environment with EBS-like disks backed by SSDs with decent latency, throughput and IOPS, so it is hard to believe the culprit lies in the OS or I/O layer.

was (Author: weideng):
The version was mentioned in the description: 3.0.15.

> Potential race condition in creating hard link when incremental backup is turned on
> -----------------------------------------------------------------------------------
>
> Key: CASSANDRA-16047
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16047
> Project: Cassandra
> Issue Type: Bug
> Components: Local/SSTable
> Reporter: Wei Deng
> Priority: Urgent
> Attachments: incremental_backup_hardlink_exception.jpg, incremental_backup_hardlink_exception1.jpg
>
>
> It seems that there is a race condition in creating a hard link when incremental backup is turned on.
>
> The following screenshot was captured in a production cluster running Cassandra 3.0.15 after incremental backup was turned on. When this {{NoSuchFileException}} happens, the resulting {{FSWriteError}}, combined with the default disk failure policy, shuts down the JVM, so this is a critical bug.
>
> !incremental_backup_hardlink_exception.jpg!
>
> Because of the risk of production database downtime (if a similar failure hits multiple nodes within a short time frame), and because the same exception has already caused JVM shutdowns multiple times, incremental backup had to be turned off for now, which is not an ideal situation.
>
> !incremental_backup_hardlink_exception1.jpg!
>
> The deployment is on a public cloud environment with EBS-like disks backed by SSDs with decent latency, throughput and IOPS, so it is hard to believe the culprit lies in the OS or I/O layer.
> Based on the second screenshot above, this is a low-flush-traffic {{system.size_estimates}} table, so compaction of the source SSTable does not seem to be at play here.
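A minimal, hypothetical Java sketch of the suspected failure mode follows. If the source file disappears in the window between an existence check and the {{Files.createLink}} call, the JDK throws {{NoSuchFileException}}, and a helper that wraps every {{IOException}} in a write error turns it into a fatal {{FSWriteError}}. The {{createHardLink}} helper, the {{FSWriteError}} class, the {{betweenCheckAndLink}} hook and the file names are illustrative stand-ins, not Cassandra's actual code; the hook only simulates whatever might remove the file concurrently and does not identify the real culprit.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

public class HardLinkRace {

    // Hypothetical stand-in for Cassandra's FSWriteError; not the real class.
    static class FSWriteError extends RuntimeException {
        FSWriteError(Throwable cause, Path path) {
            super("write error on " + path, cause);
        }
    }

    // Hypothetical helper: check that the source exists, then hard-link it.
    // The Runnable simulates anything that removes the source in the window
    // between the existence check and the link call (a check-then-act race).
    static void createHardLink(Path source, Path link, Runnable betweenCheckAndLink) {
        if (!Files.exists(source)) {
            throw new IllegalStateException("source does not exist: " + source);
        }
        betweenCheckAndLink.run();
        try {
            // Files.createLink takes the new link path first, the existing file second.
            Files.createLink(link, source);
        } catch (IOException e) {
            // Any IOException, including NoSuchFileException, is wrapped and rethrown.
            throw new FSWriteError(e, link);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("hardlink-race");
        // Illustrative file name only.
        Path source = Files.createFile(dir.resolve("mc-1-big-Data.db"));
        Path backups = Files.createDirectory(dir.resolve("backups"));
        Path link = backups.resolve(source.getFileName());

        try {
            createHardLink(source, link, () -> {
                try {
                    Files.delete(source); // simulated concurrent removal of the source
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            System.out.println("link created");
        } catch (FSWriteError e) {
            System.out.println("FSWriteError caused by " + e.getCause().getClass().getSimpleName());
        }
    }
}
```

On a typical local file system, running {{main}} should report an {{FSWriteError}} whose cause is {{NoSuchFileException}}; passing a no-op hook instead lets the link succeed. The point of the sketch is that an existence check before linking does not close the race, so the link call itself must tolerate (or the caller must prevent) concurrent removal of the source.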