[jira] [Commented] (CASSANDRA-8499) Ensure SSTableWriter cleans up properly after failure
[ https://issues.apache.org/jira/browse/CASSANDRA-8499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14382492#comment-14382492 ]

Benedict commented on CASSANDRA-8499:
-------------------------------------

Affected actions are: truncate, major compaction, cleanup, scrub, upgrade. Repair looks to be fine.

> Ensure SSTableWriter cleans up properly after failure
> -----------------------------------------------------
>
>                 Key: CASSANDRA-8499
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8499
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Benedict
>            Assignee: Benedict
>             Fix For: 2.0.12, 2.1.3
>
>         Attachments: 8499-20.txt, 8499-20v2, 8499-21.txt, 8499-21v2, 8499-21v3
>
>
> In 2.0 we do not free a bloom filter, in 2.1 we do not free a small piece of
> offheap memory for writing compression metadata. In both we attempt to flush
> the BF despite having encountered an exception, making the exception slow to
> propagate.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8499) Ensure SSTableWriter cleans up properly after failure
[ https://issues.apache.org/jira/browse/CASSANDRA-8499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14382090#comment-14382090 ]

Nick Bailey commented on CASSANDRA-8499:
----------------------------------------

So would this affect snapshot repairs? Potentially causing an eventual OOM after continually doing snapshot repairs on the cluster?
[jira] [Commented] (CASSANDRA-8499) Ensure SSTableWriter cleans up properly after failure
[ https://issues.apache.org/jira/browse/CASSANDRA-8499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1427#comment-1427 ]

Marcus Eriksson commented on CASSANDRA-8499:
--------------------------------------------

+1 (remove the unused "boolean closed" in SequentialWriter on commit)
[jira] [Commented] (CASSANDRA-8499) Ensure SSTableWriter cleans up properly after failure
[ https://issues.apache.org/jira/browse/CASSANDRA-8499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14266284#comment-14266284 ]

Marcus Eriksson commented on CASSANDRA-8499:
--------------------------------------------

2.1 v2 seems to double-close the files: first when we switch the writer, then when we call abort(). Running SSTableRewriterTest.testNumberOfFiles_abort() outputs this:

WARN 15:43:17 close(81) failed, errno (9).
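For readers unfamiliar with the failure mode: on Linux, errno 9 is EBADF (bad file descriptor), which is what a second close() on an already-closed descriptor reports. One common way to make a second close harmless is an idempotent close guarded by a flag. The sketch below is an illustration only; `CloseOnceSketch`, `descriptorCloses` and `demo` are hypothetical names, and this is not necessarily how the patch resolved the double close.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for a file-backed writer; not Cassandra's SequentialWriter.
final class CloseOnceSketch {
    private final AtomicBoolean closed = new AtomicBoolean(false);
    // Counts how often the underlying descriptor would actually be closed.
    int descriptorCloses = 0;

    // Safe to call repeatedly: only the first call releases the descriptor.
    void close() {
        if (!closed.compareAndSet(false, true)) {
            return; // already closed, e.g. when the writer was switched
        }
        descriptorCloses++; // a real writer would close its file channel here
    }

    // Abort after a failure; reuses close() so an earlier close is harmless.
    void abort() {
        close();
    }

    // Mirrors the sequence described in the comment: close on switch, then abort.
    static int demo() {
        CloseOnceSketch writer = new CloseOnceSketch();
        writer.close();
        writer.abort();
        return writer.descriptorCloses;
    }
}
```

In this sketch `CloseOnceSketch.demo()` returns 1, because the abort finds the writer already closed and does nothing.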
[jira] [Commented] (CASSANDRA-8499) Ensure SSTableWriter cleans up properly after failure
[ https://issues.apache.org/jira/browse/CASSANDRA-8499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14265086#comment-14265086 ]

Benedict commented on CASSANDRA-8499:
-------------------------------------

Good point. I've updated both versions to suppress warnings and ensure every abort step is completed regardless of any exception being thrown. I think it makes most sense for both versions, since we only abort in the face of an error.
[jira] [Commented] (CASSANDRA-8499) Ensure SSTableWriter cleans up properly after failure
[ https://issues.apache.org/jira/browse/CASSANDRA-8499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14255596#comment-14255596 ]

Marcus Eriksson commented on CASSANDRA-8499:
--------------------------------------------

In general LGTM, but at least for the 2.0 patch I think we should mimic current behavior as much as possible (i.e., in SSTableWriter.abort() we used closeQuietly, which only logs an error message if closing fails; now we throw an FSWriteError).

Since we always* propagate the exception that caused abort() to be called, maybe it is better to always just log the exceptions in abort() and let the cause of abort() be thrown all the way out?

(*we should propagate the cause in doAntiCompaction as well)
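The suggestion above (log whatever fails inside abort() and let the original cause propagate) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patch itself: `AbortSketch`, its `logged` list (used in place of a real logger) and `demo` are hypothetical names, and the resources are plain `java.io.Closeable` objects rather than Cassandra classes.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a writer that owns several resources.
final class AbortSketch {
    // Messages are collected here instead of going to a logger.
    final List<String> logged = new ArrayList<>();
    private final List<Closeable> resources;

    AbortSketch(List<Closeable> resources) {
        this.resources = resources;
    }

    // Close every resource; record failures instead of throwing them.
    void abort() {
        for (Closeable resource : resources) {
            try {
                resource.close();
            } catch (IOException | RuntimeException e) {
                logged.add("Failed closing during abort: " + e.getMessage());
            }
        }
    }

    // Run the work; on failure, abort and rethrow the original cause.
    void write(Runnable work) {
        try {
            work.run();
        } catch (RuntimeException cause) {
            abort();
            throw cause;
        }
    }

    static String demo() {
        int[] closed = {0};
        Closeable failing = () -> { throw new IOException("simulated close failure"); };
        Closeable healthy = () -> closed[0]++;
        AbortSketch writer = new AbortSketch(Arrays.asList(failing, healthy));
        try {
            writer.write(() -> { throw new IllegalStateException("simulated write failure"); });
            return "no exception";
        } catch (IllegalStateException e) {
            return e.getMessage() + " | logged=" + writer.logged.size() + " | closed=" + closed[0];
        }
    }
}
```

In this sketch the caller sees the write failure rather than the close failure, and the second resource is still closed even though the first close threw.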
[jira] [Commented] (CASSANDRA-8499) Ensure SSTableWriter cleans up properly after failure
[ https://issues.apache.org/jira/browse/CASSANDRA-8499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14253683#comment-14253683 ]

Benedict commented on CASSANDRA-8499:
-------------------------------------

Every time we fail to complete a flush or compaction we will leak the bloom filter data, so it is probably not _that_ uncommon. It's also a pretty small fix. But I'm pretty agnostic about it, really, since clusters seem to have been surviving with it for years.
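The leak described in this thread, and the slow propagation caused by flushing the bloom filter after a failure, can be illustrated with a small sketch. It assumes a hypothetical `Filter` class that only counts calls; it does not allocate off-heap memory and is not Cassandra's bloom filter API. The filter is flushed only on the success path and freed on both paths.

```java
// Hypothetical illustration; names and structure are assumptions, not the actual patch.
final class FilterCleanupSketch {
    // Stand-in for an off-heap bloom filter; records calls instead of touching memory.
    static final class Filter {
        int flushes = 0;
        boolean freed = false;

        void flush() { flushes++; }
        void free() { freed = true; }
    }

    static String run(boolean failWrite) {
        Filter filter = new Filter();
        String outcome;
        try {
            if (failWrite) {
                throw new IllegalStateException("simulated write failure");
            }
            filter.flush(); // reached only when the write succeeded
            outcome = "ok";
        } catch (IllegalStateException e) {
            // No flush on this path, so the failure surfaces quickly.
            // A real writer would rethrow; the sketch reports the outcome instead.
            outcome = "failed";
        } finally {
            filter.free(); // released on success and on failure
        }
        return outcome + " flushes=" + filter.flushes + " freed=" + filter.freed;
    }
}
```

Here `FilterCleanupSketch.run(true)` returns "failed flushes=0 freed=true" and `FilterCleanupSketch.run(false)` returns "ok flushes=1 freed=true".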