[ https://issues.apache.org/jira/browse/CASSANDRA-8399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14262498#comment-14262498 ]

Joshua McKenzie commented on CASSANDRA-8399:
--------------------------------------------

While I agree that the Right Thing here seems to be to protect the entire
compaction operation by holding a reference, I'm not sure that the 2.X line is
appropriate for this change at the DataTracker level. Acquiring and releasing
within a single SSTableScanner is a cleanly tied-together RAII operation that
should be an "invisible" change from a logical flow / API perspective. Pushing
that operation into markCompacting and unmarkCompacting, by contrast, changes
an assumption (and contract) for more than 10 upstream callers of those
methods: namely, that markCompacting will return false if it fails to acquire
references on the SSTables in question. Correct me if I'm wrong on that, or if
there's a more appropriate place than the DataTracker to make this change (I
haven't worked much in this section of the code base).
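For illustration only, a minimal sketch of the scoped (RAII-style) pattern described above. RefCountedReader, tryRef(), release() and ScopedScanner are hypothetical stand-ins, not Cassandra's SSTableReader or SSTableScanner API. The reference is taken when the scanner is constructed and dropped exactly once in close(), so callers never see the acquire/release pair:

{code:java}
import java.util.concurrent.atomic.AtomicInteger;

// HYPOTHETICAL stand-in for an sstable reader; not Cassandra's SSTableReader API.
class RefCountedReader {
    final String name;
    // Starts at 1: the reference owned by the tracker while the sstable is live.
    private final AtomicInteger refs = new AtomicInteger(1);

    RefCountedReader(String name) {
        this.name = name;
    }

    // Takes a reference only while the count is positive; once it has
    // reached 0 the reader is considered released and cannot be revived.
    boolean tryRef() {
        for (int n = refs.get(); n > 0; n = refs.get()) {
            if (refs.compareAndSet(n, n + 1)) {
                return true;
            }
        }
        return false;
    }

    // Drops one reference; going below 0 means more releases than acquires.
    void release() {
        int n = refs.decrementAndGet();
        if (n < 0) {
            throw new AssertionError("Reference counter " + n + " for " + name);
        }
        // n == 0 is where a real implementation would schedule file cleanup.
    }

    int count() {
        return refs.get();
    }
}

// HYPOTHETICAL RAII-style scanner: the reference lives exactly as long as
// this object. Assumes a single owning thread (the closed flag is not atomic).
class ScopedScanner implements AutoCloseable {
    private final RefCountedReader reader;
    private boolean closed = false;

    ScopedScanner(RefCountedReader reader) {
        if (!reader.tryRef()) {
            throw new IllegalStateException(reader.name + " was already released");
        }
        this.reader = reader;
    }

    @Override
    public void close() {
        // Idempotent: a second close() must not release a second time.
        if (closed) {
            return;
        }
        closed = true;
        reader.release();
    }
}
{code}

Used with try-with-resources, e.g. try (ScopedScanner s = new ScopedScanner(reader)) { ... }, the pair cannot leak. The idempotent close() is a deliberate choice: releasing a reference twice is one way a counter ends up below zero, the same kind of symptom as the "Reference counter -1" assertion in the report below.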

A naive change in DataTracker.markCompacting leads to infinite loops
(seemingly reached from multiple call sites), so we'd need to go upstream and
adjust the various marking operations to accommodate entries in the
SSTableReader collections being "unmarkable". My preference here would be to
go with _v2, which resolves the ordering problems introduced in CASSANDRA-7932
without introducing a ref count on the read path, and to create a separate
ticket for 3.0 to pursue the more invasive change of reference counting all
compacting sstables.
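A hedged sketch of how an all-or-nothing markCompacting that also takes references can make callers loop forever, assuming (hypothetically) that callers re-select candidates and retry until marking succeeds. SketchTracker, Reader, select and markWithRetry are illustrative names, not Cassandra's DataTracker API, and maxAttempts bounds what would otherwise be an unbounded retry so the example terminates:

{code:java}
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

// HYPOTHETICAL single-threaded tracker sketch; not Cassandra's DataTracker API.
class SketchTracker {
    static final class Reader {
        final String name;
        final AtomicInteger refs = new AtomicInteger(1); // 1 = live reference

        Reader(String name) {
            this.name = name;
        }

        boolean tryRef() {
            for (int n = refs.get(); n > 0; n = refs.get()) {
                if (refs.compareAndSet(n, n + 1)) {
                    return true;
                }
            }
            return false; // count already hit 0: this reader is "unmarkable"
        }

        void release() {
            refs.decrementAndGet();
        }
    }

    final List<Reader> live = new ArrayList<>();
    final Set<Reader> compacting = new HashSet<>();

    // markCompacting that also takes references: all-or-nothing, so a single
    // unreferenceable reader makes the whole call return false.
    boolean markCompacting(Collection<Reader> candidates) {
        for (Reader r : candidates) {
            if (compacting.contains(r)) {
                return false;
            }
        }
        List<Reader> acquired = new ArrayList<>();
        for (Reader r : candidates) {
            if (!r.tryRef()) {
                for (Reader a : acquired) {
                    a.release(); // roll back the refs taken so far
                }
                return false;
            }
            acquired.add(r);
        }
        compacting.addAll(candidates);
        return true;
    }

    // skipDead = true is the upstream accommodation: stop selecting readers
    // that can no longer be referenced. The check is racy, but a reader that
    // dies in between costs one failed attempt and is excluded next time.
    private List<Reader> select(boolean skipDead) {
        List<Reader> out = new ArrayList<>();
        for (Reader r : live) {
            if (!compacting.contains(r) && (!skipDead || r.refs.get() > 0)) {
                out.add(r);
            }
        }
        return out;
    }

    // Assumed caller shape: re-select and retry until marking succeeds.
    // Returns the successful attempt number, or -1 if every attempt failed.
    int markWithRetry(boolean skipDead, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (markCompacting(select(skipDead))) {
                return attempt;
            }
        }
        return -1;
    }
}
{code}

With skipDead = false, a released reader that is still listed as live is picked on every attempt and marking never succeeds; with skipDead = true the caller treats it as unmarkable, which is the kind of upstream change to the marking operations described above.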

As you've mentioned several times, reference counting is tricky to get right.
Promoting it to the DataTracker abstraction for compaction marking strikes me
as a risky change when we already have quite a few failing unit tests and bugs
to resolve on 2.X. That said, I definitely think it's the right thing
long-term.

> Reference Counter exception when dropping user type
> ---------------------------------------------------
>
>                 Key: CASSANDRA-8399
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8399
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Philip Thompson
>            Assignee: Joshua McKenzie
>             Fix For: 2.1.3
>
>         Attachments: 8399_fix_empty_results.txt, 8399_v2.txt, node2.log, ubuntu-8399.log
>
>
> When running the dtest 
> {{user_types_test.py:TestUserTypes.test_type_keyspace_permission_isolation}} 
> against the current 2.1-HEAD code, the following exception is seen very 
> frequently, but not always, when dropping a type:{code}
> ERROR [MigrationStage:1] 2014-12-01 13:54:54,824 CassandraDaemon.java:170 - Exception in thread Thread[MigrationStage:1,5,main]
> java.lang.AssertionError: Reference counter -1 for /var/folders/v3/z4wf_34n1q506_xjdy49gb780000gn/T/dtest-eW2RXj/test/node2/data/system/schema_keyspaces-b0f2235744583cdb9631c43e59ce3676/system-schema_keyspaces-ka-14-Data.db
>         at org.apache.cassandra.io.sstable.SSTableReader.releaseReference(SSTableReader.java:1662) ~[main/:na]
>         at org.apache.cassandra.io.sstable.SSTableScanner.close(SSTableScanner.java:164) ~[main/:na]
>         at org.apache.cassandra.utils.MergeIterator.close(MergeIterator.java:62) ~[main/:na]
>         at org.apache.cassandra.db.ColumnFamilyStore$8.close(ColumnFamilyStore.java:1943) ~[main/:na]
>         at org.apache.cassandra.db.ColumnFamilyStore.filter(ColumnFamilyStore.java:2116) ~[main/:na]
>         at org.apache.cassandra.db.ColumnFamilyStore.getRangeSlice(ColumnFamilyStore.java:2029) ~[main/:na]
>         at org.apache.cassandra.db.ColumnFamilyStore.getRangeSlice(ColumnFamilyStore.java:1963) ~[main/:na]
>         at org.apache.cassandra.db.SystemKeyspace.serializedSchema(SystemKeyspace.java:744) ~[main/:na]
>         at org.apache.cassandra.db.SystemKeyspace.serializedSchema(SystemKeyspace.java:731) ~[main/:na]
>         at org.apache.cassandra.config.Schema.updateVersion(Schema.java:374) ~[main/:na]
>         at org.apache.cassandra.config.Schema.updateVersionAndAnnounce(Schema.java:399) ~[main/:na]
>         at org.apache.cassandra.db.DefsTables.mergeSchema(DefsTables.java:167) ~[main/:na]
>         at org.apache.cassandra.db.DefinitionsUpdateVerbHandler$1.runMayThrow(DefinitionsUpdateVerbHandler.java:49) ~[main/:na]
>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[main/:na]
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_67]
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_67]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_67]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_67]
>         at java.lang.Thread.run(Thread.java:745) [na:1.7.0_67]{code}
> Log of the node with the error is attached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
