[ https://issues.apache.org/jira/browse/CASSANDRA-2589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027543#comment-13027543 ]

Aaron Morton commented on CASSANDRA-2589:
-----------------------------------------

bq. What's supposed to happen is, isRelevant will suppress those columns (which may be in an older sstable). We should never require a read (e.g. to load a list of all-columns-deleted), when doing a write.

I was only thinking about the columns in the memtable.

bq. If you have columns in memory when you do a row deletion, it shouldn't matter whether we write those out or not, as far as correctness is concerned.

Agreed, this was more of a performance issue: e.g. writing a lot of data and deleting it quickly (before the memtable flush) with a row delete takes more disk space than deleting by column path.

CASSANDRA-2590 is where I noticed it breaking correctness.

> row deletes do not remove columns
> ---------------------------------
>
>                 Key: CASSANDRA-2589
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2589
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7.5, 0.8 beta 1
>            Reporter: Aaron Morton
>            Assignee: Aaron Morton
>            Priority: Minor
>
> When a row delete is issued, CF.delete() sets the localDeletionTime and
> markedForDeleteAt values but does not remove columns which have a lower
> timestamp. As a result:
> # Memory which could be freed is held on to (probably not too bad, as it's
> already counted).
> # The deleted columns are serialised to disk, along with the CF info to say
> they are no longer valid.
> # NamesQueryFilter and SliceQueryFilter have to do more work, as they filter
> out the irrelevant columns using QueryFilter.isRelevant().
> # Also, columns written with a lower timestamp after the deletion are added
> to the CF without checking markedForDeleteAt.
> This can cause read repair (RR) to fail; I will create another ticket for
> that and link it. This ticket is for fixing the removal of the columns.
> Two options I could think of:
> # Check for deletion when serialising to SSTable and ignore columns if they
> have a lower timestamp. Otherwise leave as is, so dead columns stay in memory.
> # Ensure at all times that if the CF is deleted, all columns it contains have
> a higher timestamp.
> ## I *think* this would include all column types (DeletedColumn as well), as
> the CF deletion has the same effect. But not sure.
> ## Deleting (potentially) all columns in delete() will take time. Could track
> the highest timestamp in the CF so the normal case of deleting all cols does
> not need to iterate.
>
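
For illustration only, the second option (keep only columns newer than the row tombstone, and track the highest timestamp so the common case does not iterate) could be sketched roughly as below. This is not Cassandra code: SketchColumnFamily, addColumn(), size() and contains() are hypothetical names, column values are left out, and the rule that a timestamp tie goes to the deletion is an assumption.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical, simplified stand-in for a column family; not Cassandra's ColumnFamily. */
class SketchColumnFamily {
    // column name -> write timestamp (column values omitted for brevity)
    private final Map<String, Long> columns = new TreeMap<>();
    // timestamp of the most recent row delete, if any
    private long markedForDeleteAt = Long.MIN_VALUE;
    // upper bound on the timestamps of the columns currently held
    private long maxColumnTimestamp = Long.MIN_VALUE;

    /** Adds a column unless the row tombstone or a newer write shadows it. */
    boolean addColumn(String name, long timestamp) {
        if (timestamp <= markedForDeleteAt) {
            return false; // assumption: a tie goes to the deletion
        }
        Long existing = columns.get(name);
        if (existing != null && existing >= timestamp) {
            return false; // keep the newer write
        }
        columns.put(name, timestamp);
        maxColumnTimestamp = Math.max(maxColumnTimestamp, timestamp);
        return true;
    }

    /** Row delete: afterwards every remaining column is newer than the tombstone. */
    void delete(long timestamp) {
        if (timestamp <= markedForDeleteAt) {
            return; // an equal or newer tombstone is already in place
        }
        markedForDeleteAt = timestamp;
        if (maxColumnTimestamp <= timestamp) {
            // normal case: every column is shadowed, so drop them without iterating
            columns.clear();
            maxColumnTimestamp = Long.MIN_VALUE;
            return;
        }
        // slow path: at least one column is newer, so filter individually;
        // the newest column survives, so maxColumnTimestamp stays valid
        columns.values().removeIf(ts -> ts <= timestamp);
    }

    int size() { return columns.size(); }

    boolean contains(String name) { return columns.containsKey(name); }

    long getMarkedForDeleteAt() { return markedForDeleteAt; }
}
```

In this sketch a write that arrives after the delete with a lower timestamp is rejected in addColumn(), which corresponds to the last point in the "As a result" list. It does not touch anything already flushed to an older sstable; those columns would still be suppressed at read time by QueryFilter.isRelevant().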