[ 
https://issues.apache.org/jira/browse/CASSANDRA-14941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16908395#comment-16908395
 ] 

Marcus Eriksson commented on CASSANDRA-14941:
---------------------------------------------

I'll have a closer look tomorrow or early next week, but I think the problem is 
that we use {{EncodingStats#minTimestamp}} without checking for the epoch here: 
https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/db/Memtable.java#L256

> Expired secondary index sstables are not promptly discarded under TWCS
> ----------------------------------------------------------------------
>
>                 Key: CASSANDRA-14941
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14941
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Feature/2i Index
>            Reporter: Samuel Klock
>            Priority: Normal
>
> We have a table in a cluster running 3.0.17 storing roughly time-series data 
> using TWCS with a secondary index. We've noticed that while expired sstables 
> for the table are discarded mostly when we expect them to be, the expired 
> sstables for the secondary index would linger for weeks longer than expected 
> – essentially indefinitely. Eventually the sstables would fill disks, which 
> would require manual steps (deleting ancient index sstables) to address. We 
> verified with {{sstableexpiredblockers}} that there wasn't anything on disk 
> blocking the expired sstables from being dropped, so this looks like a bug.
> Through some debugging, we traced the problem to the index's memtables, which 
> were consistently (except _just_ after node restarts) reporting a minimum 
> timestamp from September 2015 – much older than any of our live data – which 
> causes {{CompactionController.getFullyExpiredSSTables()}} to consistently 
> return an empty set. The reason that the index sstables report this minimum 
> timestamp is because of how index updates are created, using 
> {{PartitionUpdate.singleRowUpdate()}}:
> {code:java}
>     public static PartitionUpdate singleRowUpdate(CFMetaData metadata, 
> DecoratedKey key, Row row, Row staticRow)
>     {
>         MutableDeletionInfo deletionInfo = MutableDeletionInfo.live();
>         Holder holder = new Holder(
>             new PartitionColumns(
>                 staticRow == null ? Columns.NONE : 
> Columns.from(staticRow.columns()),
>                 row == null ? Columns.NONE : Columns.from(row.columns())
>             ),
>             row == null ? BTree.empty() : BTree.singleton(row),
>             deletionInfo,
>             staticRow == null ? Rows.EMPTY_STATIC_ROW : staticRow,
>             EncodingStats.NO_STATS
>         );
>         return new PartitionUpdate(metadata, key, holder, deletionInfo, 
> false);
>     }
> {code}
> The use of {{EncodingStats.NO_STATS}} makes it appear as though the earliest 
> timestamp in the resulting {{PartitionUpdate}} is from September 2015. That 
> timestamp becomes the minimum for the memtable.
> Modifying this version of {{PartitionUpdate.singleRowUpdate()}} to:
> {code:java}
>     public static PartitionUpdate singleRowUpdate(CFMetaData metadata, 
> DecoratedKey key, Row row, Row staticRow)
>     {
>         MutableDeletionInfo deletionInfo = MutableDeletionInfo.live();
>         staticRow = (staticRow == null ? Rows.EMPTY_STATIC_ROW : staticRow);
>         EncodingStats stats = EncodingStats.Collector.collect(staticRow,
>                                                               (row == null ?
>                                                                
> Collections.emptyIterator() :
>                                                                
> Iterators.singletonIterator(row)),
>                                                               deletionInfo);
>         Holder holder = new Holder(
>             new PartitionColumns(
>                 staticRow == Rows.EMPTY_STATIC_ROW ? Columns.NONE : 
> Columns.from(staticRow.columns()),
>                 row == null ? Columns.NONE : Columns.from(row.columns())
>             ),
>             row == null ? BTree.empty() : BTree.singleton(row),
>             deletionInfo,
>             staticRow,
>             stats
>         );
>         return new PartitionUpdate(metadata, key, holder, deletionInfo, 
> false);
>     }
> {code}
> (i.e., computing an {{EncodingStats}} from the contents of the update) seems 
> to fix the problem. However, we're not certain whether A) there's a 
> functional reason the method was using {{EncodingStats.NO_STATS}} previously 
> or B) whether the {{EncodingStats}} the revised version creates is correct 
> (in particular, the use of {{deletionInfo}} feels a little suspect). We're 
> also not sure whether there's a more appropriate fix (e.g., changing how the 
> memtables compute the minimum timestamp, particularly in the {{NO_STATS}} 
> case).



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

Reply via email to