[ 
https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450366#comment-17450366
 ] 

Stefan Miklosovic edited comment on CASSANDRA-16619 at 11/29/21, 11:13 AM:
---------------------------------------------------------------------------

No, I think I am still right in what I wrote. It breaks point-in-time 
restoration. If you set the property "restore_point_in_time" in 
commitlog_archiving.properties, with this patch it is simply ignored and the 
whole commit log is replayed. So there is no way to replay up to some point in 
time on a completely new node. I still consider this to be a regression.
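
For illustration only, here is a minimal Python sketch of what replaying "to 
some point in time" is expected to mean. It is not Cassandra code. The 
timestamp format (yyyy:MM:dd HH:mm:ss, in GMT) and the "up to and including" 
rule follow the comments in the stock commitlog_archiving.properties as far as 
I know; the function names and the (key, timestamp) mutation shape are 
hypothetical.

```python
from datetime import datetime, timezone

# Format assumed from the comments in commitlog_archiving.properties.
RESTORE_FORMAT = "%Y:%m:%d %H:%M:%S"


def parse_restore_point(value):
    """Return the restore point as microseconds since the epoch (GMT)."""
    dt = datetime.strptime(value, RESTORE_FORMAT).replace(tzinfo=timezone.utc)
    return int(dt.timestamp()) * 1_000_000


def mutations_to_apply(mutations, restore_point_micros):
    """Keep only mutations written at or before the restore point.

    `mutations` is a hypothetical list of (key, timestamp_micros) pairs.
    """
    return [m for m in mutations if m[1] <= restore_point_micros]
```

The regression described here would correspond to every mutation being applied 
regardless of the configured restore point.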

Let's say you have a cluster of 5 nodes. You regularly take snapshots of it, 
along with all commit logs written in between, and all of this is uploaded to 
some backup storage.

Then, on restore, you want to recreate the cluster as it was, with 
point-in-time precision. With the backup, I have also saved the token 
information, so I set the initial_token field in the yaml etc.

So when I put it all in place, the SSTables will match the nodes (because the 
tokens have not changed) and I want to replay the commit logs. But since host 
ids are generated and, in previous versions, I can't set them myself (as shown 
in the other ticket I linked), PIT restore will stop working properly even 
though my tokens and everything else are completely fine and I know what I am 
doing.

This was all possible to do previously. Now I end up with all data being 
replayed even though I do not want that.


> Loss of commit log data possible after sstable ingest
> -----------------------------------------------------
>
>                 Key: CASSANDRA-16619
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16619
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local/Commit Log
>            Reporter: Jacek Lewandowski
>            Assignee: Jacek Lewandowski
>            Priority: Normal
>             Fix For: 3.0.25, 3.11.11, 4.0-rc2, 4.0
>
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> SSTable metadata contains commit log positions of the sstable. These 
> positions are used to filter out mutations from the commit log on restart and 
> only make sense for the node on which the data was flushed.
> If an SSTable is moved between nodes, its positions may cover regions that 
> the receiving node has not yet flushed, and result in valid data being lost 
> should these sections of the commit log need to be replayed.
> Solution:
> The chosen solution introduces a new sstable metadata field (in 
> StatsMetadata), originatingHostId (UUID), which is the local host id of the 
> node on which the sstable was created, or null if not known. Commit log 
> intervals from an sstable are taken into account during commit log replay 
> only when the originatingHostId of the sstable matches the local node's 
> hostId.
> For new sstables the originatingHostId is set according to StorageService's 
> local hostId.
> For compacted sstables the originatingHostId is set according to 
> StorageService's local hostId, and only commit log intervals from local 
> sstables are preserved in the resulting sstable.
> Discovered by [~jakubzytka]
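
The rule in the quoted description can be sketched roughly as below. This is a 
simplified Python model written from the description's wording, not the actual 
Cassandra implementation. SSTableStats, persisted_intervals, compact and the 
(start, end) interval tuples are hypothetical names and shapes chosen for 
illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple
from uuid import UUID

Interval = Tuple[int, int]  # hypothetical (start, end) commit log positions


@dataclass
class SSTableStats:
    """Hypothetical stand-in for the per-sstable StatsMetadata."""
    originating_host_id: Optional[UUID]  # None models "null if not known"
    commit_log_intervals: List[Interval]


def persisted_intervals(sstables, local_host_id):
    """Collect intervals that replay may skip as already flushed.

    Only sstables whose originating host id equals the local host id count;
    a None (unknown) origin never matches.
    """
    intervals = []
    for sstable in sstables:
        if sstable.originating_host_id == local_host_id:
            intervals.extend(sstable.commit_log_intervals)
    return intervals


def compact(inputs, local_host_id):
    """Model a compaction result: it is stamped with the local host id and
    keeps only the intervals that came from local sstables."""
    return SSTableStats(local_host_id, persisted_intervals(inputs, local_host_id))
```

Under this model, a freshly restored node with a newly generated host id would 
match none of its restored sstables, so no commit log interval would be 
skipped on replay. That appears to be the situation the comment above 
describes.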


