Samuel Klock created CASSANDRA-14941:
----------------------------------------

             Summary: Expired secondary index sstables are not promptly discarded under TWCS
                 Key: CASSANDRA-14941
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14941
             Project: Cassandra
          Issue Type: Bug
          Components: Secondary Indexes
            Reporter: Samuel Klock


We have a table in a cluster running 3.0.17 that stores roughly time-series data using TWCS with a secondary index. We've noticed that while expired sstables for the table are discarded mostly when we expect them to be, the expired sstables for the secondary index linger for weeks longer than expected, essentially indefinitely. Eventually these sstables fill the disks, which requires manual steps (deleting ancient index sstables) to address. We verified with {{sstableexpiredblockers}} that nothing on disk was blocking the expired sstables from being dropped, so this looks like a bug.

Through some debugging, we traced the problem to the index's memtables, which consistently (except _just_ after node restarts) report a minimum timestamp from September 2015, much older than any of our live data. This causes {{CompactionController.getFullyExpiredSSTables()}} to consistently return an empty set. The index memtables report this minimum timestamp because of how index updates are created, using {{PartitionUpdate.singleRowUpdate()}}:
{code:java}
    public static PartitionUpdate singleRowUpdate(CFMetaData metadata, DecoratedKey key, Row row, Row staticRow)
    {
        MutableDeletionInfo deletionInfo = MutableDeletionInfo.live();
        Holder holder = new Holder(
            new PartitionColumns(
                staticRow == null ? Columns.NONE : Columns.from(staticRow.columns()),
                row == null ? Columns.NONE : Columns.from(row.columns())
            ),
            row == null ? BTree.empty() : BTree.singleton(row),
            deletionInfo,
            staticRow == null ? Rows.EMPTY_STATIC_ROW : staticRow,
            EncodingStats.NO_STATS // fixed placeholder stats, not derived from row/staticRow
        );
        return new PartitionUpdate(metadata, key, holder, deletionInfo, false);
    }
{code}
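The use of {{EncodingStats.NO_STATS}} makes it appear as though the earliest timestamp in the resulting {{PartitionUpdate}} is from September 2015, because {{NO_STATS}} carries a fixed placeholder minimum timestamp rather than one computed from the update's rows. That timestamp becomes the minimum for the memtable.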
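To make the failure mode concrete, here is a deliberately simplified, hypothetical Java model of the kind of check {{CompactionController.getFullyExpiredSSTables()}} performs. The class, field, and method names are illustrative assumptions, not Cassandra's API, and the rule it encodes is our simplified reading: an expired sstable is only dropped when its newest timestamp is older than the oldest timestamp held by any memtable or non-expired sstable. Under that assumption, a memtable reporting a September 2015 minimum blocks every drop.

{code:java}
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified model of a fully-expired-sstable check.
 * Not Cassandra code: names and structure are illustrative only.
 * Timestamps are microseconds since the Unix epoch.
 */
public class ExpiryModel {
    // Illustrative placeholder value: 2015-09-22T00:00:00Z in microseconds.
    static final long SEPT_2015_MICROS = 1_442_880_000_000_000L;

    static final class SSTable {
        final String name;
        final long minTimestamp;
        final long maxTimestamp;
        final boolean expired; // assumed: true if every cell in it has expired

        SSTable(String name, long minTimestamp, long maxTimestamp, boolean expired) {
            this.name = name;
            this.minTimestamp = minTimestamp;
            this.maxTimestamp = maxTimestamp;
            this.expired = expired;
        }
    }

    /** Returns the names of expired sstables that this model considers safe to drop. */
    static List<String> fullyExpired(List<SSTable> sstables, List<Long> memtableMinTimestamps) {
        // The oldest timestamp of anything still live acts as a floor: an expired
        // sstable's tombstones might still shadow live data older than itself.
        long floor = Long.MAX_VALUE;
        for (long t : memtableMinTimestamps) {
            floor = Math.min(floor, t);
        }
        // Non-expired sstables hold live data, so they also lower the floor.
        for (SSTable s : sstables) {
            if (!s.expired) {
                floor = Math.min(floor, s.minTimestamp);
            }
        }
        List<String> droppable = new ArrayList<>();
        for (SSTable s : sstables) {
            // Droppable only if all of its data predates everything still live.
            if (s.expired && s.maxTimestamp < floor) {
                droppable.add(s.name);
            }
        }
        return droppable;
    }

    public static void main(String[] args) {
        List<SSTable> sstables = List.of(
            new SSTable("old-index-sstable", 1_530_000_000_000_000L, 1_540_000_000_000_000L, true));
        long realMemtableMin = 1_545_000_000_000_000L; // a recent write

        // Memtable minimum derived from real data: prints [old-index-sstable]
        System.out.println(fullyExpired(sstables, List.of(realMemtableMin)));
        // Memtable minimum pinned by the placeholder: prints []
        System.out.println(fullyExpired(sstables, List.of(SEPT_2015_MICROS, realMemtableMin)));
    }
}
{code}

Because the floor is a minimum, in this model a single update carrying placeholder stats is enough to pin it for the whole memtable.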

Modifying this version of {{PartitionUpdate.singleRowUpdate()}} to:
{code:java}
    public static PartitionUpdate singleRowUpdate(CFMetaData metadata, DecoratedKey key, Row row, Row staticRow)
    {
        MutableDeletionInfo deletionInfo = MutableDeletionInfo.live();
        staticRow = (staticRow == null ? Rows.EMPTY_STATIC_ROW : staticRow);
        EncodingStats stats = EncodingStats.Collector.collect(staticRow,
                                                              (row == null ? Collections.emptyIterator()
                                                                           : Iterators.singletonIterator(row)),
                                                              deletionInfo);
        Holder holder = new Holder(
            new PartitionColumns(
                staticRow == Rows.EMPTY_STATIC_ROW ? Columns.NONE : Columns.from(staticRow.columns()),
                row == null ? Columns.NONE : Columns.from(row.columns())
            ),
            row == null ? BTree.empty() : BTree.singleton(row),
            deletionInfo,
            staticRow,
            stats
        );
        return new PartitionUpdate(metadata, key, holder, deletionInfo, false);
    }
{code}
(i.e., computing an {{EncodingStats}} from the contents of the update) seems to fix the problem. However, we're not certain A) whether there's a functional reason the method was using {{EncodingStats.NO_STATS}} previously, or B) whether the {{EncodingStats}} that the revised version creates is correct (in particular, the use of {{deletionInfo}} feels a little suspect). We're also not sure whether there's a more appropriate fix (e.g., changing how the memtables compute the minimum timestamp, particularly in the {{NO_STATS}} case).
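The two candidate fixes can be contrasted with a small hypothetical model. Names such as {{Update}}, {{minFromStats}}, {{minFromContents}} and {{PLACEHOLDER_MIN}} are illustrative assumptions, not Cassandra's API, and the real memtable logic is not reproduced here. {{minFromStats}} over updates built with {{Update.collected()}} mirrors the patched {{singleRowUpdate()}}: stats are computed from the update's contents, so trusting them yields a real minimum. {{minFromContents}} sketches the alternative of having the memtable derive its minimum from the cells themselves, which sidesteps placeholder stats at the cost (in this model) of walking every cell timestamp.

{code:java}
import java.util.List;

/**
 * Hypothetical model of memtable minimum-timestamp tracking, contrasting
 * (a) trusting each update's precomputed stats with (b) deriving the minimum
 * from the timestamps actually present in the update. Not Cassandra code.
 */
public class MinTimestampModel {
    // Illustrative stand-in for a NO_STATS-style placeholder (2015-09-22T00:00:00Z, microseconds).
    static final long PLACEHOLDER_MIN = 1_442_880_000_000_000L;

    static final class Update {
        final long statsMinTimestamp;    // minimum claimed by the update's stats
        final List<Long> cellTimestamps; // timestamps actually written

        Update(long statsMinTimestamp, List<Long> cellTimestamps) {
            this.statsMinTimestamp = statsMinTimestamp;
            this.cellTimestamps = cellTimestamps;
        }

        /** An update whose stats were computed from its contents (like the patch). */
        static Update collected(List<Long> cellTimestamps) {
            long min = Long.MAX_VALUE;
            for (long t : cellTimestamps) {
                min = Math.min(min, t);
            }
            return new Update(min, cellTimestamps);
        }

        /** An update carrying placeholder stats (like NO_STATS). */
        static Update placeholder(List<Long> cellTimestamps) {
            return new Update(PLACEHOLDER_MIN, cellTimestamps);
        }
    }

    /** (a) Minimum as reported by the updates' stats. */
    static long minFromStats(List<Update> updates) {
        long min = Long.MAX_VALUE;
        for (Update u : updates) {
            min = Math.min(min, u.statsMinTimestamp);
        }
        return min;
    }

    /** (b) Minimum recomputed from the cells themselves, ignoring stats. */
    static long minFromContents(List<Update> updates) {
        long min = Long.MAX_VALUE;
        for (Update u : updates) {
            for (long t : u.cellTimestamps) {
                min = Math.min(min, t);
            }
        }
        return min;
    }

    public static void main(String[] args) {
        long writeTs = 1_545_000_000_000_000L;
        List<Update> noStats = List.of(Update.placeholder(List.of(writeTs)));
        List<Update> collected = List.of(Update.collected(List.of(writeTs)));

        System.out.println(minFromStats(noStats) == PLACEHOLDER_MIN); // true: stuck in 2015
        System.out.println(minFromStats(collected) == writeTs);       // true: matches the data
        System.out.println(minFromContents(noStats) == writeTs);      // true: placeholder ignored
    }
}
{code}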



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
