[ https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17332479#comment-17332479 ]
Benjamin Lerer edited comment on CASSANDRA-16619 at 4/26/21, 3:06 PM:
----------------------------------------------------------------------

As {{4.0-rc1}} has been released, we need to provide forward compatibility for data. I do not think that is the case with the current patch. Users on {{4.0-rc1}} will have an {{na}} format *without* hostID, whereas once they deploy {{4.0-rc2}} they will have an {{na}} format *with* hostID, which will fail to read the {{na}} SSTables written by {{4.0-rc1}}. To fix that, I think we need to introduce a new {{nb}} version.

Regarding the patches, we should refactor the {{MetadataSerializerTest.testXReadY}} tests to ensure that they test all the possible combinations, not only some of them.

was (Author: blerer):
As {{4.0-rc1}} has been released by consequence we need to provide forward compatibility for data. I do not think it is the case with the current patch. Users using {{4.0-rc1}} will have an {{na}} format *without* hostID whereas when they will deploy {{4.0-rc2}} they will have an {{na}} format *with* hostID which will fail to read the {{na}} SSTables from the {{4.0-rc1}}. To fix that I think that we need to introduce a new {{nb}} version.

Regarding the patches should refactor the {{MetadataSerializerTest.testXReadY}} tests to ensure that they test all the possible combinations not only some of them.

> Loss of commit log data possible after sstable ingest
> -----------------------------------------------------
>
> Key: CASSANDRA-16619
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16619
> Project: Cassandra
> Issue Type: Bug
> Components: Local/Commit Log
> Reporter: Jacek Lewandowski
> Assignee: Jacek Lewandowski
> Priority: Normal
> Fix For: 4.0, 3.0.x, 3.11.x
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> SSTable metadata contains commit log positions of the sstable. These
> positions are used to filter out mutations from the commit log on restart and
> only make sense for the node on which the data was flushed.
> If an SSTable is moved between nodes, its positions may cover regions that
> the receiving node has not yet flushed, and valid data may be lost should
> these sections of the commit log need to be replayed.
> Solution:
> The chosen solution introduces a new sstable metadata field in StatsMetadata,
> originatingHostId (UUID), which is the local host id of the node on which the
> sstable was created, or null if not known. Commit log intervals from an
> sstable are taken into account during commit log replay only when the
> originatingHostId of the sstable matches the local node's hostId.
> For new sstables the originatingHostId is set according to StorageService's
> local hostId.
> For compacted sstables the originatingHostId is set according to
> StorageService's local hostId, and only commit log intervals from local
> sstables are preserved in the resulting sstable.
> Discovered by [~jakubzytka]
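
The replay rule from the description, together with the version concern raised in the comment, can be sketched roughly as follows. This is a minimal illustration and not the code from the patch: {{ReplaySketch}}, {{SSTableInfo}}, {{hasOriginatingHostId}} and {{intervalsToSkip}} are hypothetical names, and the sketch assumes that only format versions from {{nb}} onwards store the host id, so that older sstables are read with a null originatingHostId.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Hypothetical stand-ins for illustration only; these are not Cassandra's real classes.
final class ReplaySketch {

    /** Minimal view of an sstable: format version, origin host id, commit log intervals. */
    static final class SSTableInfo {
        final String version;
        final UUID storedHostId;          // what the file would contain, if the format stores it
        final List<long[]> intervals;     // each entry is {start, end} in the commit log

        SSTableInfo(String version, UUID storedHostId, List<long[]> intervals) {
            this.version = version;
            this.storedHostId = storedHostId;
            this.intervals = intervals;
        }
    }

    /** Assumption: only format versions "nb" and later serialize the host id. */
    static boolean hasOriginatingHostId(String version) {
        return version.compareTo("nb") >= 0;
    }

    /** Host id as a reader would see it: null when the format version does not store one. */
    static UUID readOriginatingHostId(SSTableInfo sstable) {
        return hasOriginatingHostId(sstable.version) ? sstable.storedHostId : null;
    }

    /** Collects the intervals replay may skip: only those from sstables flushed by this node. */
    static List<long[]> intervalsToSkip(List<SSTableInfo> sstables, UUID localHostId) {
        List<long[]> result = new ArrayList<>();
        for (SSTableInfo sstable : sstables) {
            // A null (unknown) origin never matches, so its intervals are ignored.
            if (localHostId.equals(readOriginatingHostId(sstable))) {
                result.addAll(sstable.intervals);
            }
        }
        return result;
    }
}
```

Under these assumptions, an sstable with an unknown or foreign origin contributes no intervals, so the matching commit log sections are replayed rather than skipped. Gating the field on the version string is what would let an {{na}} file written by {{4.0-rc1}} still be read after an upgrade.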