[ 
https://issues.apache.org/jira/browse/CASSANDRA-10538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960321#comment-14960321
 ] 

Stefania commented on CASSANDRA-10538:
--------------------------------------

bq. Yes, it looks like we did. This only matters for abort, since on commit we 
want to throw either way, but we expect to do this in the caller 
(LifecycleTransaction), so catching and returning them in both is most suitable.

This was my conclusion as well, thanks for checking.
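
As a rough illustration of "catching and returning" failures during abort, here is a minimal sketch. The class {{AccumulatingAbortSketch}} and its methods are hypothetical names made up for this example, not the actual Cassandra code; it only assumes that the caller wants every cleanup step attempted and a single merged Throwable handed back to decide what to do with.

```java
import java.util.List;

/** Hypothetical sketch, not Cassandra's implementation. */
final class AccumulatingAbortSketch
{
    /** Merge a new failure into the accumulated one, keeping the first failure as primary. */
    static Throwable merge(Throwable accumulate, Throwable t)
    {
        if (accumulate == null)
            return t;
        accumulate.addSuppressed(t);
        return accumulate;
    }

    /**
     * Run every cleanup step even if earlier ones fail. Failures are
     * collected and returned so the caller can decide whether to throw.
     */
    static Throwable abort(List<Runnable> cleanupSteps, Throwable accumulate)
    {
        for (Runnable step : cleanupSteps)
        {
            try
            {
                step.run();
            }
            catch (Throwable t)
            {
                accumulate = merge(accumulate, t);
            }
        }
        return accumulate;
    }
}
```

With this shape the caller receives null when nothing failed, and otherwise the first failure with any later ones attached as suppressed exceptions.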

bq. Once things quiet down we should really try to introduce fault injection 
tests for this subsystem so we can easily cover this kind of scenario.

Yes, we definitely need tests with fault injection for this component.
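
One possible starting point for such tests, sketched under the assumption that the component's writes can be routed through a wrapper stream: a stream that starts failing once a byte budget is exhausted, to simulate a full disk. {{DiskFullOutputStream}} is a hypothetical helper invented for this illustration; nothing like it is claimed to exist in the code base.

```java
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical test helper: fails every write once the byte budget is used up. */
final class DiskFullOutputStream extends OutputStream
{
    private final OutputStream delegate;
    private long remaining;

    DiskFullOutputStream(OutputStream delegate, long capacityBytes)
    {
        this.delegate = delegate;
        this.remaining = capacityBytes;
    }

    @Override
    public void write(int b) throws IOException
    {
        // Injected fault: behave as if the device ran out of space.
        if (remaining <= 0)
            throw new IOException("No space left on device (injected)");
        delegate.write(b);
        remaining--;
    }
}
```

Because only the single-byte write is overridden, bulk writes go through it byte by byte, so a record can be left partially written, which is the kind of state a disk-full test would want to exercise.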

CI:

http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-10538-3.0-dtest
http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-10538-3.0-testall


> Assertion failed in LogFile when disk is full
> ---------------------------------------------
>
>                 Key: CASSANDRA-10538
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10538
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Stefania
>            Assignee: Stefania
>             Fix For: 3.x
>
>         Attachments: 
> ma_txn_compaction_67311da0-72b4-11e5-9eb9-b14fa4bbe709.log, 
> ma_txn_compaction_696059b0-72b4-11e5-9eb9-b14fa4bbe709.log, 
> ma_txn_compaction_8ac58b70-72b4-11e5-9eb9-b14fa4bbe709.log, 
> ma_txn_compaction_8be24610-72b4-11e5-9eb9-b14fa4bbe709.log, 
> ma_txn_compaction_95500fc0-72b4-11e5-9eb9-b14fa4bbe709.log, 
> ma_txn_compaction_a41caa90-72b4-11e5-9eb9-b14fa4bbe709.log
>
>
> [~carlyeks] was running a stress job which filled up the disk. At the end of 
> the system logs there are several assertion errors:
> {code}
> ERROR [CompactionExecutor:1] 2015-10-14 20:46:55,467 CassandraDaemon.java:195 - Exception in thread Thread[CompactionExecutor:1,1,main]
> java.lang.RuntimeException: Insufficient disk space to write 2097152 bytes
>         at org.apache.cassandra.db.compaction.writers.CompactionAwareWriter.getWriteDirectory(CompactionAwareWriter.java:156) ~[main/:na]
>         at org.apache.cassandra.db.compaction.writers.MaxSSTableSizeWriter.realAppend(MaxSSTableSizeWriter.java:77) ~[main/:na]
>         at org.apache.cassandra.db.compaction.writers.CompactionAwareWriter.append(CompactionAwareWriter.java:110) ~[main/:na]
>         at org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:182) ~[main/:na]
>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[main/:na]
>         at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:78) ~[main/:na]
>         at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:61) ~[main/:na]
>         at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:220) ~[main/:na]
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_40]
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_40]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_40]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_40]
>         at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40]
> INFO  [IndexSummaryManager:1] 2015-10-14 21:10:40,099 IndexSummaryManager.java:257 - Redistributing index summaries
> ERROR [IndexSummaryManager:1] 2015-10-14 21:10:42,275 CassandraDaemon.java:195 - Exception in thread Thread[IndexSummaryManager:1,1,main]
> java.lang.AssertionError: Already completed!
>         at org.apache.cassandra.db.lifecycle.LogFile.abort(LogFile.java:221) ~[main/:na]
>         at org.apache.cassandra.db.lifecycle.LogTransaction.doAbort(LogTransaction.java:376) ~[main/:na]
>         at org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.abort(Transactional.java:144) ~[main/:na]
>         at org.apache.cassandra.db.lifecycle.LifecycleTransaction.doAbort(LifecycleTransaction.java:259) ~[main/:na]
>         at org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.abort(Transactional.java:144) ~[main/:na]
>         at org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.abort(Transactional.java:193) ~[main/:na]
>         at org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.close(Transactional.java:158) ~[main/:na]
>         at org.apache.cassandra.io.sstable.IndexSummaryManager.redistributeSummaries(IndexSummaryManager.java:242) ~[main/:na]
>         at org.apache.cassandra.io.sstable.IndexSummaryManager$1.runMayThrow(IndexSummaryManager.java:134) ~[main/:na]
>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[main/:na]
>         at org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolE
> {code}
> We should not use an assertion for a condition that can occur when the disk 
> is full; a runtime exception would be more appropriate.
> I would also like to understand exactly what triggered the assertion. 
> {{LifecycleTransaction}} can throw at the beginning of the commit method if 
> it cannot write the record to disk, in which case all we have to do is ensure 
> we update the records in memory after writing to disk (currently we update 
> them before). However, I am not sure this is what happened here; it looks 
> more like abort was called twice, which should never happen.
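
The two points in the description above (a runtime exception rather than an assertion, and updating in-memory records only after the disk write succeeds) can be sketched as follows. {{LogFileSketch}} and {{RecordSink}} are hypothetical names for this illustration and do not reflect the real {{LogFile}} implementation or the eventual fix.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical sketch of a transaction log; not Cassandra's LogFile. */
final class LogFileSketch
{
    /** Stand-in for the on-disk log; may fail, for example when the disk is full. */
    interface RecordSink { void append(String record) throws IOException; }

    private final RecordSink disk;
    private final Set<String> records = new LinkedHashSet<>();
    private boolean completed;

    LogFileSketch(RecordSink disk) { this.disk = disk; }

    void commit() { complete("COMMIT"); }
    void abort()  { complete("ABORT"); }
    boolean completed() { return completed; }

    private void complete(String record)
    {
        // A runtime exception instead of an assertion: this check still
        // fires when assertions are disabled.
        if (completed)
            throw new IllegalStateException("Already completed!");
        try
        {
            disk.append(record); // 1. write to disk first
        }
        catch (IOException e)
        {
            // In-memory state is untouched, so a later abort() is still legal.
            throw new UncheckedIOException("Failed to write " + record, e);
        }
        records.add(record);     // 2. only then update memory
        completed = true;
    }
}
```

In this sketch a commit that fails on a full disk leaves the log uncompleted, so the subsequent abort does not trip the "Already completed!" check; a genuine double abort still fails, but with an exception the caller can catch.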


