[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest
[ https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17528376#comment-17528376 ]

Doug Whitfield commented on CASSANDRA-16619:
--------------------------------------------

Oops, I meant to search for things since 3.11.9, but I changed this bug instead. I clearly need more coffee. Going to see if I can figure out what it was.

> Loss of commit log data possible after sstable ingest
> -----------------------------------------------------
>
>                 Key: CASSANDRA-16619
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16619
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local/Commit Log
>            Reporter: Jacek Lewandowski
>            Assignee: Jacek Lewandowski
>            Priority: Normal
>             Fix For: 3.0.25, 3.11.11, 4.0-rc2, 4.0
>
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> SSTable metadata contains commit log positions of the sstable. These
> positions are used to filter out mutations from the commit log on restart and
> only make sense for the node on which the data was flushed.
> If an SSTable is moved between nodes, these positions may cover regions that the
> receiving node has not yet flushed, and result in valid data being lost
> should these sections of the commit log need to be replayed.
> Solution:
> The chosen solution introduces a new sstable metadata field (StatsMetadata) -
> originatingHostId (UUID), which is the local host id of the node on which the
> sstable was created, or null if not known. Commit log intervals from an
> sstable are taken into account during Commit Log replay only when the
> originatingHostId of the sstable matches the local node's hostId.
> For new sstables the originatingHostId is set according to StorageService's
> local hostId.
> For compacted sstables the originatingHostId is set according to
> StorageService's local hostId, and only commit log intervals from local
> sstables are preserved in the resulting sstable.
> discovered by [~jakubzytka]

--
This message was sent by Atlassian Jira
(v8.20.7#820007)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
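The rule in the issue description (use an sstable's commit log intervals only when its originatingHostId matches the local hostId) can be sketched roughly as below. This is a minimal illustration, not Cassandra's code: the class, field, and method names are hypothetical, and a commit log position is simplified to a single long.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Illustrative sketch only: these types are hypothetical stand-ins,
// not Cassandra's StatsMetadata or CommitLogReplayer.
final class OriginFilterSketch {

    /** A commit log span [start, end] that an sstable claims to cover (simplified). */
    static final class Interval {
        final long start, end;
        Interval(long start, long end) { this.start = start; this.end = end; }
    }

    /** Minimal sstable metadata: origin host (null if unknown) plus covered intervals. */
    static final class SSTableMeta {
        final UUID originatingHostId;
        final List<Interval> intervals;
        SSTableMeta(UUID originatingHostId, List<Interval> intervals) {
            this.originatingHostId = originatingHostId;
            this.intervals = intervals;
        }
    }

    /** Collect intervals only from sstables whose origin is the local node. */
    static List<Interval> trustedIntervals(UUID localHostId, List<SSTableMeta> sstables) {
        List<Interval> trusted = new ArrayList<>();
        for (SSTableMeta s : sstables) {
            // A null or foreign origin means the positions describe another node's commit log.
            if (localHostId.equals(s.originatingHostId)) {
                trusted.addAll(s.intervals);
            }
        }
        return trusted;
    }

    /** A mutation is skipped during replay only if a trusted interval covers its position. */
    static boolean shouldReplay(long position, List<Interval> trusted) {
        for (Interval i : trusted) {
            if (position >= i.start && position <= i.end) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        UUID local = UUID.randomUUID();
        UUID other = UUID.randomUUID();
        List<SSTableMeta> sstables = List.of(
            new SSTableMeta(local, List.of(new Interval(0, 100))),
            new SSTableMeta(other, List.of(new Interval(100, 200))),
            new SSTableMeta(null, List.of(new Interval(200, 300))));
        List<Interval> trusted = trustedIntervals(local, sstables);
        System.out.println(shouldReplay(50, trusted));   // position covered by a local sstable
        System.out.println(shouldReplay(150, trusted));  // foreign interval is ignored
    }
}
```

In this sketch an ingested sstable can never suppress replay on the receiving node, which is the data-loss case the ticket describes; the cost is that some mutations may be replayed redundantly.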
[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest
[ https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450645#comment-17450645 ]

Stefan Miklosovic commented on CASSANDRA-16619:
-----------------------------------------------

I'll keep an eye on this. I was trying to reproduce it but I can't :D But I am absolutely sure I was getting this, multiple times in a row. I could not wrap my head around it. I am not sure whether it happens non-deterministically or what.
[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest
[ https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450642#comment-17450642 ]

Jeremiah Jordan commented on CASSANDRA-16619:
---------------------------------------------

I would expect someone to see more data replayed, because the intervals would not be ignored, but the end result data-wise should be the same. The data in the commitlogs should be idempotent, so it is safe to replay even if it is already in a given sstable.
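The idempotence argument above can be illustrated with a small sketch, assuming timestamped last-write-wins cells. The names are hypothetical and the tie-breaking is simplified (an equal timestamp keeps the existing value), so this is a toy model rather than Cassandra's reconciliation logic.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model, not Cassandra code: a table is a map from key to a timestamped cell.
final class IdempotentReplaySketch {

    static final class Cell {
        final String value;
        final long timestamp;
        Cell(String value, long timestamp) { this.value = value; this.timestamp = timestamp; }
    }

    /** Last-write-wins: a write only takes effect if it is newer than what is stored. */
    static void apply(Map<String, Cell> table, String key, String value, long timestamp) {
        Cell current = table.get(key);
        if (current == null || timestamp > current.timestamp) {
            table.put(key, new Cell(value, timestamp));
        }
    }

    public static void main(String[] args) {
        Map<String, Cell> table = new HashMap<>();
        apply(table, "k", "v1", 10);   // original write, already flushed to an sstable
        apply(table, "k", "v2", 20);   // newer write
        apply(table, "k", "v1", 10);   // redundant replay of the older mutation
        System.out.println(table.get("k").value);
    }
}
```

Replaying the older mutation again leaves the newer value in place, which is why a redundant replay costs time but does not change the resulting data in this model.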
[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest
[ https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450494#comment-17450494 ]

Branimir Lambov commented on CASSANDRA-16619:
---------------------------------------------

Rechecking the code, the point in time in restores is applied in {{MutationInitiator.initiateMutation}} [here|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/commitlog/CommitLogReplayer.java#L247]. What this patch may change is the filtering by commit log position done [later in the same method|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/commitlog/CommitLogReplayer.java#L265].

I still do not see any evidence that this patch is affecting PIT restore in any way, other than making replay slower because it could be needlessly replaying mutations that are already present in sstables.

Granted, the latter is an issue and may warrant risking correctness by allowing this check to be disabled with a flag, but reusing the same ID as in CASSANDRA-14582 is most probably a better solution.
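The argument in the comment above is that the point-in-time cutoff and the commit log position filter are separate checks, so ignoring sstable intervals should only add redundant replays. A rough sketch of that reasoning follows. It is a hypothetical model, not the code in CommitLogReplayer: positions are single numbers, and it assumes the archived commit log holds all eight mutations of a scenario like the one reported in this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of two independent replay checks; not Cassandra code.
final class ReplayFilterSketch {

    static final class LoggedMutation {
        final long position;   // offset in the commit log (simplified to one number)
        final long timestamp;  // write time of the mutation
        LoggedMutation(long position, long timestamp) {
            this.position = position;
            this.timestamp = timestamp;
        }
    }

    /**
     * restorePointInTime: mutations written after it are dropped.
     * flushedUpTo: positions at or below it are covered by trusted sstables;
     *              pass -1 when sstable intervals are ignored (foreign or unknown origin).
     */
    static List<LoggedMutation> toReplay(List<LoggedMutation> log,
                                         long restorePointInTime,
                                         long flushedUpTo) {
        List<LoggedMutation> out = new ArrayList<>();
        for (LoggedMutation m : log) {
            if (m.timestamp > restorePointInTime) continue; // point-in-time cutoff
            if (m.position <= flushedUpTo) continue;        // already in a trusted sstable
            out.add(m);
        }
        return out;
    }

    public static void main(String[] args) {
        List<LoggedMutation> log = new ArrayList<>();
        for (int i = 1; i <= 8; i++) log.add(new LoggedMutation(i, i * 10L));
        long restorePoint = 70; // falls between the 7th and 8th mutation
        System.out.println(toReplay(log, restorePoint, 6).size());  // intervals trusted
        System.out.println(toReplay(log, restorePoint, -1).size()); // intervals ignored
    }
}
```

Under these assumptions the eighth mutation is excluded in both cases; ignoring the intervals only makes the first six mutations replay again. Whether the real replay path behaves this way in every configuration is exactly what the thread is trying to establish.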
[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest
[ https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450493#comment-17450493 ]

Jacek Lewandowski commented on CASSANDRA-16619:
-----------------------------------------------

[~smiklosovic] you mentioned you have a test that demonstrates the problem. Could you share it?
[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest
[ https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450486#comment-17450486 ]

Stefan Miklosovic commented on CASSANDRA-16619:
-----------------------------------------------

Ok, I am reading this: [https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/commitlog/CommitLogReplayer.java#L290-L313]

After a closer look it seems that it relates to SSTables, not commit logs. So basically it will replay all "commitLogIntervals", whatever that means (I am not familiar with this code), and it does not have anything in common with the restore_point_in_time setting or anything related to that replay path. It is just that the SSTables were created on the other node, so the host ids do not match, so it will replay all commit log intervals, and these commit log intervals may be covering all the intervals of the commit logs I want to replay, right? Because, clearly, there are physically fewer mutations in these sstables than in sstables + commit logs, so when I see all mutations being replayed, the stuff this patch introduced will somehow replay it all ...
[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest
[ https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450478#comment-17450478 ]

Brandon Williams commented on CASSANDRA-16619:
----------------------------------------------

I think we can just add a flag to disable it to get around this.
[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest
[ https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450469#comment-17450469 ]

Stefan Miklosovic commented on CASSANDRA-16619:
-----------------------------------------------

Maybe my wording was not precise enough, so let me fix that and reiterate.

As you said, when it detects that the commit log was not created on the node we try to replay it on, it will effectively replay all mutations. I have a unit test for this. I created 6 mutations and took a snapshot, then created two more mutations without taking a snapshot, but I backed up the commit log. When I replayed the commit logs in such a way that I was expecting 7 mutations to be present (6 from sstables + 1 from logs), I was still getting 8 mutations. After looking into the logs I discovered the message which led me to this ticket.

So in that sense "restore_point_in_time" is effectively ignored when I want to restore to a completely new node, because its host id will be generated in such a way that it will not match the host id of the commit log I want to replay. So it will replay all mutations, but I do not want that.
[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest
[ https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450439#comment-17450439 ]

Branimir Lambov commented on CASSANDRA-16619:
---------------------------------------------

Could you elaborate on why this change would cause the restore point to be ignored?
[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest
[ https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450366#comment-17450366 ]

Stefan Miklosovic commented on CASSANDRA-16619:
-----------------------------------------------

No, I think I am still right in what I wrote. It breaks point-in-time restoration. If you set the property "restore_point_in_time" in commitlog_archiving.properties, with this patch it is just ignored and everything is replayed. So you have no way to replay up to some point in time on a completely new node. I still consider this to be a regression.
[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest
[ https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450328#comment-17450328 ]

Branimir Lambov commented on CASSANDRA-16619:
---------------------------------------------

[~tsteinmaurer], could you please open a separate ticket with your suggestions, so that the fix can be tracked correctly?

{quote}
How am I supposed to do a point-in-time restoration using a commit log from the other node when it gets skipped as shown in the previous comment?
{quote}

The patch ensures that the commit log will _not_ be skipped, even if it matches spans that sstables already cover. The problem you are describing is not a correctness one – the node will ignore intervals and hence replay all commit log data that it finds. Since mutations are idempotent, this will work correctly, albeit slower than before. The difference is that it will not go wrong if the wrong sstable ends up in the restore set for whatever reason.
[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest
[ https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450284#comment-17450284 ]

Stefan Miklosovic commented on CASSANDRA-16619:
-----------------------------------------------

I think this feature is broken in the scenario where I have a commit log I want to replay on a completely different / new node whose UUID does not match the one in the commit log. A workaround will be available in 4.1, as done in https://issues.apache.org/jira/browse/CASSANDRA-14582; however, I am afraid that point-in-time restoration is broken in all other Cassandra versions released since this patch was introduced.
[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest
[ https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17449594#comment-17449594 ]

Stefan Miklosovic commented on CASSANDRA-16619:
-----------------------------------------------

How am I supposed to do a point-in-time restoration using a commit log from the other node when it gets skipped as shown in the previous comment?
[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest
[ https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17441530#comment-17441530 ]
Thomas Steinmaurer commented on CASSANDRA-16619:
Regarding the WARN log introduced by this ticket, e.g.:
{noformat}
WARN [main] 2021-11-08 21:54:06,826 CommitLogReplayer.java:253 - Origin of 1 sstables is unknown or doesn't match the local node; commitLogIntervals for them were ignored
{noformat}
While I understand the intention of guarding against SSTables that have been copied around (or, e.g., restored), the WARN log also seems to appear when Cassandra 3.11.11 reads pre-"*me*" SSTables, e.g. those written by 3.11.10. I understand that the WARN log will eventually go away on its own, or for sure (I guess?) after running "nodetool upgradesstables".
This kind of WARN log has produced quite some confusion and customer interaction for on-premise customer installations.
* Would it be possible to WARN only in the context of a "me" SSTable, to avoid confusion after upgrading from pre-3.11.11?
* Would it be possible to mention an SSTable minor upgrade in e.g. {{NEWS.txt}} (or perhaps I missed it), as there might be tooling out there that counts the number of SSTables per "format" via file name?
Many thanks.
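The behaviour behind the warning quoted in this comment can be sketched roughly as follows. This is a minimal illustration in Python, not Cassandra's actual Java code: the {{SSTableStats}} class, its fields and the {{usable_intervals}} function are hypothetical names chosen for the example. It assumes, as the ticket description says, that the originating host id is null when unknown, which would be the case for SSTables written before the field existed.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple
from uuid import UUID

# Hypothetical (start, end) pair standing in for a commit log interval.
Interval = Tuple[int, int]


@dataclass
class SSTableStats:
    # Hypothetical stand-in for the metadata fields discussed in the ticket.
    version: str                         # e.g. "md", "me"
    originating_host_id: Optional[UUID]  # None when the origin is unknown
    commit_log_intervals: List[Interval]


def usable_intervals(sstables: List[SSTableStats], local_host_id: UUID):
    """Keep intervals only from sstables flushed on this node.

    Returns the kept intervals and the number of sstables whose
    intervals were ignored (unknown or foreign origin).
    """
    kept: List[Interval] = []
    ignored = 0
    for stats in sstables:
        if stats.originating_host_id == local_host_id:
            kept.extend(stats.commit_log_intervals)
        else:
            # Covers both a foreign host id and a missing one (None).
            ignored += 1
    return kept, ignored
```

Under these assumptions an SSTable without a recorded origin is indistinguishable from a foreign one, so it is counted among the ignored SSTables, which would explain why the warning shows up right after an upgrade.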
[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest
[ https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17345976#comment-17345976 ]
Benjamin Lerer commented on CASSANDRA-16619:
Committed into cassandra 4.0 at d35f36cd055419f5ba5b82f2efc047348c71b530 and merged into trunk.
[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest
[ https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17345956#comment-17345956 ]
Benjamin Lerer commented on CASSANDRA-16619:
The patch looks good to me.
[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest
[ https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17344917#comment-17344917 ]
Ekaterina Dimitrova commented on CASSANDRA-16619:
Thank you [~jlewandowski], I will review the patch on Monday.
[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest
[ https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17344513#comment-17344513 ]
Jacek Lewandowski commented on CASSANDRA-16619:
[~e.dimitrova] I missed regenerating the {{nb}} legacy sstables after the change. I have fixed that and rerun the tests: https://jenkins-cm4.apache.org/view/patches/job/Cassandra-devbranch-test/524/
[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest
[ https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17343980#comment-17343980 ]
Ekaterina Dimitrova commented on CASSANDRA-16619:
Jenkins run also submitted [here|https://jenkins-cm4.apache.org/job/Cassandra-devbranch/773/]
[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest
[ https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17343966#comment-17343966 ]
Ekaterina Dimitrova commented on CASSANDRA-16619:
Thank you [~jlewandowski], CircleCI run submitted: [Java 8 | https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/821/workflows/a53aac7b-ecee-4597-9d27-66a9d6b9ab90] | [Java 11 | https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/821/workflows/02eba6a1-5601-45f1-9743-16b3f04beaf8]
[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest
[ https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17340366#comment-17340366 ]
Jacek Lewandowski commented on CASSANDRA-16619:
Thank you [~blerer]
[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest
[ https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17339547#comment-17339547 ]
Benjamin Lerer commented on CASSANDRA-16619:
+1.
[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest
[ https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17337318#comment-17337318 ]
Jacek Lewandowski commented on CASSANDRA-16619:
Started CI tests:
https://ci-cassandra.apache.org/view/patches/job/Cassandra-devbranch/720/
https://ci-cassandra.apache.org/view/patches/job/Cassandra-devbranch/721/
https://ci-cassandra.apache.org/view/patches/job/Cassandra-devbranch/722/
[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest
[ https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17333149#comment-17333149 ]
Jacek Lewandowski commented on CASSANDRA-16619:
Thank you [~blerer], I'll apply the requested changes.
[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest
[ https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17332479#comment-17332479 ]
Benjamin Lerer commented on CASSANDRA-16619:
Since {{4.0-rc1}} has been released, we need to provide forward compatibility for data, and I do not think the current patch does. Users on {{4.0-rc1}} will have an {{na}} format *without* hostID, whereas once they deploy {{4.0-rc2}} they will have an {{na}} format *with* hostID, which will fail to read the {{na}} SSTables written by {{4.0-rc1}}. To fix that, I think we need to introduce a new {{nb}} version.
Regarding the patches, we should refactor the testMcReadMc
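The reasoning for a new {{nb}} version can be illustrated with a version-gated reader. The sketch below is Python and purely illustrative: the byte layout, the {{max_timestamp}} placeholder field and all function names are assumptions made for the example, not Cassandra's real serialization format. The point it tries to show is that the reader decides from the version string alone whether the host id field is present, so files written under the older version stay readable.

```python
import io
import struct
from typing import Optional, Tuple
from uuid import UUID


def has_originating_host_id(version: str) -> bool:
    # Simplifying assumption for this sketch: within the "n" major
    # version, "nb" and later carry the field while "na" does not.
    return version[0] == "n" and version >= "nb"


def write_stats(version: str, max_timestamp: int,
                host_id: Optional[UUID] = None) -> bytes:
    out = io.BytesIO()
    out.write(struct.pack(">q", max_timestamp))  # placeholder older field
    if has_originating_host_id(version):
        if host_id is None:
            out.write(b"\x00")                   # presence flag: absent
        else:
            out.write(b"\x01" + host_id.bytes)   # flag plus 16-byte UUID
    return out.getvalue()


def read_stats(version: str, data: bytes) -> Tuple[int, Optional[UUID]]:
    buf = io.BytesIO(data)
    (max_timestamp,) = struct.unpack(">q", buf.read(8))
    host_id = None
    # The field is only consumed when the file's version declares it.
    if has_originating_host_id(version):
        if buf.read(1) == b"\x01":
            host_id = UUID(bytes=buf.read(16))
    return max_timestamp, host_id
```

If the field were instead added under the existing {{na}} label, the version string could no longer tell the two layouts apart, which is the problem described in the comment.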
[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest
[ https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330342#comment-17330342 ]
Jacek Lewandowski commented on CASSANDRA-16619:
I've started CI builds:
https://ci-cassandra.apache.org/view/patches/job/Cassandra-devbranch/703/
https://ci-cassandra.apache.org/view/patches/job/Cassandra-devbranch/704/
https://ci-cassandra.apache.org/view/patches/job/Cassandra-devbranch/705/
[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest
[ https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17329926#comment-17329926 ]
Jeremiah Jordan commented on CASSANDRA-16619:
There is only a downgrade issue for major version bumps of sstables.
[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest
[ https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17329924#comment-17329924 ]
Jeremiah Jordan commented on CASSANDRA-16619:
Actually, new minor sstable versions that only add new metadata fields are fine. If you downgrade, the previous version should still be able to read the files.
[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest
[ https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17329365#comment-17329365 ]
Benjamin Lerer commented on CASSANDRA-16619:
One problem with a new SSTable version is that if you hit an issue after upgrading, you might not be able to downgrade your cluster without losing some data. I agree that the current solution is probably better; nevertheless, introducing a new SSTable version is not without consequences until we have a mechanism like the one described by [~cscotta] in the roadmap discussion. Would it make sense to use another approach for now and introduce the change later on (either when we have other reasons to introduce a new SSTable format or when we have a safety mechanism)?
> Loss of commit log data possible after sstable ingest
> -----------------------------------------------------
>
> Key: CASSANDRA-16619
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16619
> Project: Cassandra
> Issue Type: Bug
> Components: Local/Commit Log
> Reporter: Jacek Lewandowski
> Assignee: Jacek Lewandowski
> Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0.x
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> SSTable metadata contains commit log positions of the sstable. These
> positions are used to filter out mutations from the commit log on restart and
> only make sense for the node on which the data was flushed.
> If an SSTable is moved between nodes, these positions may cover regions that the
> receiving node has not yet flushed, and result in valid data being lost
> should these sections of the commit log need to be replayed.
> Solution:
> The chosen solution introduces a new sstable metadata (StatsMetadata) -
> originatingHostId (UUID), which is the local host id of the node on which the
> sstable was created, or null if not known. Commit log intervals from an
> sstable are taken into account during Commit Log replay only when the
> originatingHostId of the sstable matches the local node's hostId.
> For new sstables the originatingHostId is set according to StorageService's
> local hostId.
> For compacted sstables the originatingHostId is set according to
> StorageService's local hostId, and only commit log intervals from local
> sstables are preserved in the resulting sstable.
> discovered by [~jakubzytka]
--
This message was sent by Atlassian Jira (v8.3.4#803005)
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
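The replay rule in the issue description above can be illustrated with a small sketch. This is not Cassandra's code: `SSTableStats`, `persisted_intervals`, and `needs_replay` are hypothetical names, and commit log positions are simplified to plain integers with closed intervals. It only shows the idea that intervals count as "already flushed" when the sstable's originating host id matches the local host id, and are ignored when the id differs or is unknown.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple
from uuid import UUID

# Assumption: a commit log interval is simplified to a (start, end) pair of ints.
Interval = Tuple[int, int]


@dataclass
class SSTableStats:
    """Hypothetical stand-in for the relevant part of StatsMetadata."""
    commit_log_intervals: List[Interval]
    originating_host_id: Optional[UUID]  # None when the origin is not known


def persisted_intervals(sstables: List[SSTableStats], local_host_id: UUID) -> List[Interval]:
    """Collect the intervals that replay may treat as already flushed locally."""
    flushed: List[Interval] = []
    for stats in sstables:
        origin = stats.originating_host_id
        # Intervals from sstables written elsewhere (or of unknown origin) are skipped.
        if origin is not None and origin == local_host_id:
            flushed.extend(stats.commit_log_intervals)
    return flushed


def needs_replay(position: int, flushed: List[Interval]) -> bool:
    """A mutation is replayed unless a local interval covers its position."""
    return not any(start <= position <= end for start, end in flushed)
```

With this rule, an sstable streamed or copied from another node cannot cause local commit log segments to be skipped, because its intervals never enter the filter.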
[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest
[ https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17327149#comment-17327149 ]
Branimir Lambov commented on CASSANDRA-16619:
{quote}what do we do with other things such as repair, ancestry, level, etc?{quote}
With this ticket, we _have_ the originating host id, so we have the means to ignore non-relevant information, whether it is in the commit log, compaction, or anywhere else. There's some room to make the interface more generic, i.e. have a mechanism to mark fields as local so that they can be properly combined when doing compaction (which can easily be done in a separate ticket), but this IMHO is a better solution to the problem, as it handles all manner of transfer and also allows correcting errors caused by tables already transferred by the time a bug with local metadata is uncovered.
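As a rough illustration of how node-local metadata could be combined during compaction under the approach described in this thread, here is a minimal sketch. The names `SSTableStats` and `merge_for_compaction` are hypothetical, and positions are simplified to integer pairs; the only behaviour taken from the ticket is that the result carries the local host id and keeps commit log intervals from local inputs only.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple
from uuid import UUID


@dataclass
class SSTableStats:
    """Hypothetical stand-in for the relevant part of StatsMetadata."""
    commit_log_intervals: List[Tuple[int, int]]
    originating_host_id: Optional[UUID]


def merge_for_compaction(inputs: List[SSTableStats], local_host_id: UUID) -> SSTableStats:
    """Build metadata for the compaction result from the input sstables."""
    local_intervals: List[Tuple[int, int]] = []
    for stats in inputs:
        # Only inputs written by this node contribute commit log intervals.
        if stats.originating_host_id == local_host_id:
            local_intervals.extend(stats.commit_log_intervals)
    # The compacted sstable is written locally, so it carries the local host id.
    return SSTableStats(sorted(local_intervals), local_host_id)
```

Dropping the foreign intervals at this point matters because the output is stamped with the local host id; keeping them would make remote positions look local on the next restart.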
[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest
[ https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17326943#comment-17326943 ]
David Capwell commented on CASSANDRA-16619:
It makes sense that a backup/restore process which bypasses nodetool import and directly dumps the files in the CF directory would hit this, but if you go through import I would hope we strip out all the metadata which is no longer relevant (which we are trying to do in import, as commit log position isn't the only thing we need to deal with). If we special-case the commit log, what do we do with other things such as repair, ancestry, level, etc? Since the cases which load SSTables from external writers are few and well known, I feel it makes the most sense to make sure each strips the metadata the same way. Adding a method to MetadataSerializer such as resetCommitLogPosition and calling it in the places which import files would handle this without requiring a format change. (Import allows more flexibility in what we strip out, which backup/restore processes can use, so it is nice to have this method rather than a resetNonLocalMetadata method.)
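The alternative proposed in this comment, stripping node-local metadata at each import path instead of changing the sstable format, might look roughly like the sketch below. Only the name resetCommitLogPosition comes from the comment (rendered here as `reset_commit_log_position`); `SSTableStats`, `import_sstable`, and the `reset_level` / `reset_repaired` switches are hypothetical and do not describe the real MetadataSerializer or importer.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class SSTableStats:
    """Hypothetical subset of sstable metadata that is only meaningful locally."""
    commit_log_intervals: Tuple[Tuple[int, int], ...]
    sstable_level: int
    repaired_at: int


def reset_commit_log_position(stats: SSTableStats) -> SSTableStats:
    """Return a copy whose commit log intervals are cleared."""
    return replace(stats, commit_log_intervals=())


def import_sstable(stats: SSTableStats, *, reset_level: bool = True,
                   reset_repaired: bool = False) -> SSTableStats:
    """Strip metadata on import; the caller chooses what else to reset."""
    stats = reset_commit_log_position(stats)  # always dropped for foreign files
    if reset_level:
        stats = replace(stats, sstable_level=0)
    if reset_repaired:
        stats = replace(stats, repaired_at=0)
    return stats
```

The trade-off raised elsewhere in the thread is that this only helps on paths that rewrite the metadata; files copied into the data directory, or streamed as-is, never pass through such a function.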
[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest
[ https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17326916#comment-17326916 ]
Jeremiah Jordan commented on CASSANDRA-16619:
[~dcapwell] this isn't just about ZCS. Any backup/restore process that copied an SSTable around is also affected. If a given node did not create the CommitLogPosition information then it needs to be ignored when loading the sstable.
[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest
[ https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17326911#comment-17326911 ]
David Capwell commented on CASSANDRA-16619:
bq. a node needs to load the file into memory when receiving from remote
We would already do that when we open the file, so we can still strip this out.
[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest
[ https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17326907#comment-17326907 ]
Yifan Cai commented on CASSANDRA-16619:
ZCS streams a file as-is and w/o loading it into memory, hence fast. To remove a metadata field, a node needs to load the file into memory when receiving from remote. I think this is expected behavior with ZCS. To distinguish, adding the original hostID in the metadata sounds valid.
[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest
[ https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17326906#comment-17326906 ]
David Capwell commented on CASSANDRA-16619:
Looking at the importer, I don't see it cleaning this up: https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/SSTableImporter.java#L365-L380. Ideally we should drop it, and ancestors (also missing).
[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest
[ https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17326899#comment-17326899 ]
David Capwell commented on CASSANDRA-16619:
That sounds like a bug in zero-copy streaming then; ideally we should strip that info out before adding.
[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest
[ https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17326894#comment-17326894 ]
Jacek Lewandowski commented on CASSANDRA-16619:
[~dcapwell] - how is streaming expected to remove this info? I can see that zero-copy streaming moves whole files between nodes, and there is no transformation which removes that information.
[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest
[ https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17325939#comment-17325939 ]
David Capwell commented on CASSANDRA-16619:
bq. If an SSTable is moved between nodes
What method are you using to "move" SSTables? Streaming and nodetool import are expected to remove this info; can you elaborate?