[ https://issues.apache.org/jira/browse/CASSANDRA-9195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508427#comment-14508427 ]
Jonathan Ellis commented on CASSANDRA-9195:
-------------------------------------------

Yes, it sounds to me like the restore is being "too clever." Truncate means "I don't want the data before now anymore." PITR shouldn't truncate.

> commitlog replay only actually replays mutation every other time
> ----------------------------------------------------------------
>
>                 Key: CASSANDRA-9195
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9195
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Jon Moses
>            Assignee: Branimir Lambov
>            Priority: Critical
>             Fix For: 2.1.5
>
>         Attachments: loader.py
>
>
> Version: Cassandra 2.1.4.374 | DSE 4.7.0
> The main issue here is that the restore-cycle only replays the mutations
> every other try. On the first try, it will restore the snapshot as expected
> and the cassandra system load will show that it's reading the mutations, but
> they do not actually get replayed, and at the end you're left with only the
> snapshot data (2k records).
> If you re-run the restore-cycle again, the commitlogs are replayed as
> expected, and the data expected is present in the table (4k records, with a
> spot check of record 4500, as it's in the commitlog but not the snapshot).
> Then if you run the cycle again, it will fail. Then again, and it will work.
> The work/not work pattern continues. Even re-running the commitlog replay a
> 2nd time, without reloading the snapshot doesn't work.
> The load process is:
> * Modify commitlog segment to 1mb
> * Archive to directory
> * create keyspace/table
> * insert base data
> * initial snapshot
> * write more data
> * capture timestamp
> * write more data
> * final snapshot
> * copy commitlogs to 2nd location
> * modify cassandra-env to replay only specified keyspace
> * modify commitlog properties to restore from 2nd location, with noted
> timestamp
> The restore cycle is:
> * truncate table
> * sstableload snapshot
> * flush
> * output data status
> * restart to replay commitlogs
> * output data status
> ====
> See attached .py for a mostly automated reproduction scenario. It expects
> DSE (and I found it with DSE 4.7.0-1), rather than "actual" Cassandra, but
> it's not using any DSE specific features. The script looks for the configs
> in the DSE locations, but they're set at the top, and there's only 2 places
> where dse is restarted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
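
For readers following the load process quoted above, the last two steps (replay only the specified keyspace, and restore from the 2nd location with the noted timestamp) might look roughly like the sketch below. This is not taken from the attached loader.py, which is not shown here. The helper names, the `/tmp/commitlog_copy` path, and the `ks`/`tbl` names are hypothetical. The property keys and the `cassandra.replayList` option are the standard Cassandra ones, and the restore point is assumed to be given in GMT in the `yyyy:MM:dd HH:mm:ss` format.

```python
from datetime import datetime, timezone


def restore_properties(restore_dir: str, point_in_time: datetime) -> str:
    """Render the restore-related lines of commitlog_archiving.properties.

    `point_in_time` should be timezone-aware; it is converted to GMT/UTC.
    """
    # Cassandra expects the restore point as yyyy:MM:dd HH:mm:ss in GMT.
    stamp = point_in_time.astimezone(timezone.utc).strftime("%Y:%m:%d %H:%M:%S")
    lines = [
        "restore_command=cp -f %from %to",
        f"restore_directories={restore_dir}",
        f"restore_point_in_time={stamp}",
    ]
    return "\n".join(lines) + "\n"


def replay_list_option(keyspace: str, table: str) -> str:
    """Render a cassandra-env.sh line that limits replay to one table."""
    return f'JVM_OPTS="$JVM_OPTS -Dcassandra.replayList={keyspace}.{table}"'


if __name__ == "__main__":
    # Hypothetical values standing in for the "2nd location" and the
    # timestamp captured between the two batches of writes.
    captured = datetime(2015, 4, 22, 18, 30, 5, tzinfo=timezone.utc)
    print(restore_properties("/tmp/commitlog_copy", captured), end="")
    print(replay_list_option("ks", "tbl"))
```

The helpers only build the text; writing it into the DSE or Cassandra config locations and restarting the node are left to the caller, since those paths differ between installs.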