[ 
https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17332479#comment-17332479
 ] 

Benjamin Lerer edited comment on CASSANDRA-16619 at 4/26/21, 3:06 PM:
----------------------------------------------------------------------

As {{4.0-rc1}} has been released, we need to provide forward compatibility for
data. I do not think that is the case with the current patch. Users on
{{4.0-rc1}} will have {{na}} format SSTables *without* hostID, whereas once they
deploy {{4.0-rc2}} they will have an {{na}} format *with* hostID, which will
fail to read the {{na}} SSTables written by {{4.0-rc1}}. To fix that, I think
we need to introduce a new {{nb}} version.
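
To illustrate why a separate {{nb}} version avoids the ambiguity, here is a
minimal sketch of a version-gated read. It is not the actual
MetadataSerializer code: the class VersionedStatsSketch, its methods, the
single placeholder field, and the byte layout are all hypothetical, and it
assumes version strings compare lexicographically ("na" < "nb").

```java
import java.nio.ByteBuffer;
import java.util.UUID;

// Hypothetical sketch only; names and byte layout are assumptions,
// not Cassandra's real SSTable metadata format.
final class VersionedStatsSketch
{
    // Assumption: the host id exists only from version "nb" onwards.
    static boolean hasOriginatingHostId(String version)
    {
        return version.compareTo("nb") >= 0;
    }

    // Writes one placeholder field, then the host id only if the version has it.
    static ByteBuffer serialize(String version, long existingField, UUID hostId)
    {
        ByteBuffer out = ByteBuffer.allocate(8 + 1 + 16);
        out.putLong(existingField);
        if (hasOriginatingHostId(version))
        {
            // Presence marker so that an unknown (null) host id can be stored.
            out.put((byte) (hostId != null ? 1 : 0));
            if (hostId != null)
            {
                out.putLong(hostId.getMostSignificantBits());
                out.putLong(hostId.getLeastSignificantBits());
            }
        }
        out.flip();
        return out;
    }

    // Returns the host id, or null when the version predates it or it is unknown.
    static UUID deserializeHostId(String version, ByteBuffer data)
    {
        ByteBuffer in = data.duplicate(); // leave the caller's position untouched
        in.getLong();                     // skip the placeholder field
        if (!hasOriginatingHostId(version))
            return null;                  // "na" files never carry a host id
        if (in.get() == 0)
            return null;
        long msb = in.getLong();
        long lsb = in.getLong();
        return new UUID(msb, lsb);
    }
}
```

Because the reader decides from the version alone, an {{na}} file written by
{{4.0-rc1}} is never parsed as if it contained a hostID.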

Regarding the patches, we should refactor the
{{MetadataSerializerTest.testXReadY}} tests to ensure that they test all the
possible combinations, not only some of them.


> Loss of commit log data possible after sstable ingest
> -----------------------------------------------------
>
>                 Key: CASSANDRA-16619
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16619
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local/Commit Log
>            Reporter: Jacek Lewandowski
>            Assignee: Jacek Lewandowski
>            Priority: Normal
>             Fix For: 4.0, 3.0.x, 3.11.x
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> SSTable metadata contains commit log positions of the sstable. These 
> positions are used to filter out mutations from the commit log on restart and 
> only make sense for the node on which the data was flushed.
> If an SSTable is moved between nodes, these positions may cover regions that
> the receiving node has not yet flushed, and result in valid data being lost
> should these sections of the commit log need to be replayed.
> Solution:
> The chosen solution introduces a new sstable metadata field (in StatsMetadata),
> originatingHostId (UUID), which is the local host id of the node on which the 
> sstable was created, or null if not known. Commit log intervals from an 
> sstable are taken into account during Commit Log replay only when the 
> originatingHostId of the sstable matches the local node's hostId.
> For new sstables the originatingHostId is set according to StorageService's 
> local hostId.
> For compacted sstables the originatingHostId is set according to
> StorageService's local hostId, and only commit log intervals from local
> sstables are preserved in the resulting sstable.
> discovered by [~jakubzytka]
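
The replay rule in the description above could be sketched roughly as
follows. This is a simplified illustration, not the patch itself: the class
ReplayFilterSketch and its nested types are hypothetical, and commit log
positions are reduced to plain long values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Hypothetical sketch; the types below are stand-ins, not Cassandra classes.
final class ReplayFilterSketch
{
    // A flushed commit log range, simplified to a pair of longs.
    static final class Interval
    {
        final long start;
        final long end;

        Interval(long start, long end)
        {
            this.start = start;
            this.end = end;
        }
    }

    // The parts of sstable metadata that matter for this rule.
    static final class SSTableInfo
    {
        final UUID originatingHostId; // null when the origin is not known
        final List<Interval> intervals;

        SSTableInfo(UUID originatingHostId, List<Interval> intervals)
        {
            this.originatingHostId = originatingHostId;
            this.intervals = intervals;
        }
    }

    // Collects the intervals that replay may treat as already persisted:
    // only those from sstables created on the local node.
    static List<Interval> persistedIntervals(List<SSTableInfo> sstables, UUID localHostId)
    {
        List<Interval> result = new ArrayList<>();
        for (SSTableInfo sstable : sstables)
        {
            // equals(null) is false, so sstables of unknown origin are skipped too.
            if (localHostId.equals(sstable.originatingHostId))
                result.addAll(sstable.intervals);
        }
        return result;
    }
}
```

Intervals from ingested or unknown-origin sstables are left out, so the
matching commit log sections are still replayed on restart.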


