[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest

2022-04-26 Thread Doug Whitfield (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17528376#comment-17528376
 ] 

Doug Whitfield commented on CASSANDRA-16619:


oops, I was wanting to do a search for things since 3.11.9 but I changed this 
bug. I clearly need more coffee. going to see if I can figure out what it was.

> Loss of commit log data possible after sstable ingest
> -
>
> Key: CASSANDRA-16619
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16619
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
>Priority: Normal
> Fix For: 3.0.25, 3.11.11, 4.0-rc2, 4.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> SSTable metadata contains commit log positions of the sstable. These 
> positions are used to filter out mutations from the commit log on restart and 
> only make sense for the node on which the data was flushed.
> If an SSTable is moved between nodes they may cover regions that the 
> receiving node has not yet flushed, and result in valid data being lost 
> should these sections of the commit log need to be replayed.
> Solution:
> The chosen solution introduces a new sstable metadata (StatsMetadata) - 
> originatingHostId (UUID), which is the local host id of the node on which the 
> sstable was created, or null if not known. Commit log intervals from an 
> sstable are taken into account during Commit Log replay only when the 
> originatingHostId of the sstable matches the local node's hostId.
> For new sstables the originatingHostId is set according to StorageService's 
> local hostId.
> For compacted sstables the originatingHostId set according to 
> StorageService's local hostId, and only commit log intervals from local 
> sstables is preserved in the resulting sstable.
> discovered by [~jakubzytka]



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest

2021-11-29 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450645#comment-17450645
 ] 

Stefan Miklosovic commented on CASSANDRA-16619:
---

I ll keep an eye on this, I was trying to reproduce it but I cant :D But I am 
absolutely sure I was getting this, multiple times in a row. I could not wrapy 
my head around this. I am not sure if it is happening non-deterministically or 
what.

> Loss of commit log data possible after sstable ingest
> -
>
> Key: CASSANDRA-16619
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16619
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
>Priority: Normal
> Fix For: 3.0.25, 3.11.11, 4.0-rc2, 4.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> SSTable metadata contains commit log positions of the sstable. These 
> positions are used to filter out mutations from the commit log on restart and 
> only make sense for the node on which the data was flushed.
> If an SSTable is moved between nodes they may cover regions that the 
> receiving node has not yet flushed, and result in valid data being lost 
> should these sections of the commit log need to be replayed.
> Solution:
> The chosen solution introduces a new sstable metadata (StatsMetadata) - 
> originatingHostId (UUID), which is the local host id of the node on which the 
> sstable was created, or null if not known. Commit log intervals from an 
> sstable are taken into account during Commit Log replay only when the 
> originatingHostId of the sstable matches the local node's hostId.
> For new sstables the originatingHostId is set according to StorageService's 
> local hostId.
> For compacted sstables the originatingHostId set according to 
> StorageService's local hostId, and only commit log intervals from local 
> sstables is preserved in the resulting sstable.
> discovered by [~jakubzytka]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest

2021-11-29 Thread Jeremiah Jordan (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450642#comment-17450642
 ] 

Jeremiah Jordan commented on CASSANDRA-16619:
-

I would expect someone to see more data replayed, because the intervals would 
not be ignored, but the end result data wise should be the same. The data in 
the commitlogs should be idempotent so it is safe to replay even if it is 
already in a given sstable.

> Loss of commit log data possible after sstable ingest
> -
>
> Key: CASSANDRA-16619
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16619
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
>Priority: Normal
> Fix For: 3.0.25, 3.11.11, 4.0-rc2, 4.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> SSTable metadata contains commit log positions of the sstable. These 
> positions are used to filter out mutations from the commit log on restart and 
> only make sense for the node on which the data was flushed.
> If an SSTable is moved between nodes they may cover regions that the 
> receiving node has not yet flushed, and result in valid data being lost 
> should these sections of the commit log need to be replayed.
> Solution:
> The chosen solution introduces a new sstable metadata (StatsMetadata) - 
> originatingHostId (UUID), which is the local host id of the node on which the 
> sstable was created, or null if not known. Commit log intervals from an 
> sstable are taken into account during Commit Log replay only when the 
> originatingHostId of the sstable matches the local node's hostId.
> For new sstables the originatingHostId is set according to StorageService's 
> local hostId.
> For compacted sstables the originatingHostId set according to 
> StorageService's local hostId, and only commit log intervals from local 
> sstables is preserved in the resulting sstable.
> discovered by [~jakubzytka]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest

2021-11-29 Thread Branimir Lambov (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450494#comment-17450494
 ] 

Branimir Lambov commented on CASSANDRA-16619:
-

Rechecking the code, the point in time in restores is applied in 
{{MutationInitiator.initiateMutation}} 
[here|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/commitlog/CommitLogReplayer.java#L247].
 What this patch may change is the filtering by commit log position done [later 
in the same 
method|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/commitlog/CommitLogReplayer.java#L265].

I still do not see any evidence that this patch is affecting PIT restore in any 
way, other than making replay slower because it could be needlessly replaying 
mutations that are already present in sstables. Granted, the latter is an issue 
and may warrant risking correctness by flagging this out, but reusing the same 
ID as in CASSANDRA-14582 is most probably a better solution.

> Loss of commit log data possible after sstable ingest
> -
>
> Key: CASSANDRA-16619
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16619
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
>Priority: Normal
> Fix For: 3.0.25, 3.11.11, 4.0-rc2, 4.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> SSTable metadata contains commit log positions of the sstable. These 
> positions are used to filter out mutations from the commit log on restart and 
> only make sense for the node on which the data was flushed.
> If an SSTable is moved between nodes they may cover regions that the 
> receiving node has not yet flushed, and result in valid data being lost 
> should these sections of the commit log need to be replayed.
> Solution:
> The chosen solution introduces a new sstable metadata (StatsMetadata) - 
> originatingHostId (UUID), which is the local host id of the node on which the 
> sstable was created, or null if not known. Commit log intervals from an 
> sstable are taken into account during Commit Log replay only when the 
> originatingHostId of the sstable matches the local node's hostId.
> For new sstables the originatingHostId is set according to StorageService's 
> local hostId.
> For compacted sstables the originatingHostId set according to 
> StorageService's local hostId, and only commit log intervals from local 
> sstables is preserved in the resulting sstable.
> discovered by [~jakubzytka]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest

2021-11-29 Thread Jacek Lewandowski (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450493#comment-17450493
 ] 

Jacek Lewandowski commented on CASSANDRA-16619:
---

[~smiklosovic] you mentioned you have some test to demonstrates the problem - 
could you share it?

> Loss of commit log data possible after sstable ingest
> -
>
> Key: CASSANDRA-16619
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16619
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
>Priority: Normal
> Fix For: 3.0.25, 3.11.11, 4.0-rc2, 4.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> SSTable metadata contains commit log positions of the sstable. These 
> positions are used to filter out mutations from the commit log on restart and 
> only make sense for the node on which the data was flushed.
> If an SSTable is moved between nodes they may cover regions that the 
> receiving node has not yet flushed, and result in valid data being lost 
> should these sections of the commit log need to be replayed.
> Solution:
> The chosen solution introduces a new sstable metadata (StatsMetadata) - 
> originatingHostId (UUID), which is the local host id of the node on which the 
> sstable was created, or null if not known. Commit log intervals from an 
> sstable are taken into account during Commit Log replay only when the 
> originatingHostId of the sstable matches the local node's hostId.
> For new sstables the originatingHostId is set according to StorageService's 
> local hostId.
> For compacted sstables the originatingHostId set according to 
> StorageService's local hostId, and only commit log intervals from local 
> sstables is preserved in the resulting sstable.
> discovered by [~jakubzytka]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest

2021-11-29 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450486#comment-17450486
 ] 

Stefan Miklosovic commented on CASSANDRA-16619:
---

Ok I am reading this: 
[https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/commitlog/CommitLogReplayer.java#L290-L313]

After closer look it seems that it relates to SSTables, not commit logs. So 
basically it will replay all "commitLogIntervals", whatever it means, I am not 
familiar with this code, and it does not have anything in common with 
restore_point_in_time setting or anything related to that replay path. It is 
just because SSTables were created on the other node, host ids do not match, so 
it will replay all commit log intervals and these commit log intervals are 
maybe covering all intervals of commit logs I want to replay, right? Because, 
clearly, there is less mutations, physically, in these sstables then in 
sstables + commit logs so when I see all mutations being replayed, the stuff 
this patch introduced will somehow replay it all ...

> Loss of commit log data possible after sstable ingest
> -
>
> Key: CASSANDRA-16619
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16619
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
>Priority: Normal
> Fix For: 3.0.25, 3.11.11, 4.0-rc2, 4.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> SSTable metadata contains commit log positions of the sstable. These 
> positions are used to filter out mutations from the commit log on restart and 
> only make sense for the node on which the data was flushed.
> If an SSTable is moved between nodes they may cover regions that the 
> receiving node has not yet flushed, and result in valid data being lost 
> should these sections of the commit log need to be replayed.
> Solution:
> The chosen solution introduces a new sstable metadata (StatsMetadata) - 
> originatingHostId (UUID), which is the local host id of the node on which the 
> sstable was created, or null if not known. Commit log intervals from an 
> sstable are taken into account during Commit Log replay only when the 
> originatingHostId of the sstable matches the local node's hostId.
> For new sstables the originatingHostId is set according to StorageService's 
> local hostId.
> For compacted sstables the originatingHostId set according to 
> StorageService's local hostId, and only commit log intervals from local 
> sstables is preserved in the resulting sstable.
> discovered by [~jakubzytka]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest

2021-11-29 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450478#comment-17450478
 ] 

Brandon Williams commented on CASSANDRA-16619:
--

I think we can just add a flag to disable to get around this.

> Loss of commit log data possible after sstable ingest
> -
>
> Key: CASSANDRA-16619
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16619
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
>Priority: Normal
> Fix For: 3.0.25, 3.11.11, 4.0-rc2, 4.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> SSTable metadata contains commit log positions of the sstable. These 
> positions are used to filter out mutations from the commit log on restart and 
> only make sense for the node on which the data was flushed.
> If an SSTable is moved between nodes they may cover regions that the 
> receiving node has not yet flushed, and result in valid data being lost 
> should these sections of the commit log need to be replayed.
> Solution:
> The chosen solution introduces a new sstable metadata (StatsMetadata) - 
> originatingHostId (UUID), which is the local host id of the node on which the 
> sstable was created, or null if not known. Commit log intervals from an 
> sstable are taken into account during Commit Log replay only when the 
> originatingHostId of the sstable matches the local node's hostId.
> For new sstables the originatingHostId is set according to StorageService's 
> local hostId.
> For compacted sstables the originatingHostId set according to 
> StorageService's local hostId, and only commit log intervals from local 
> sstables is preserved in the resulting sstable.
> discovered by [~jakubzytka]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest

2021-11-29 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450469#comment-17450469
 ] 

Stefan Miklosovic commented on CASSANDRA-16619:
---

Maybe my wording was not precise enough so let me fix and reiterate on that.

As you said, when it detects that the commit log was not created on the node we 
try to replay it on, it will effectively replay all mutations. I have a unit 
test for this and I noticed that when I created 6 mutations and I took a 
snapshot of that and I created two more mutations but I have not made a 
snapshot of that but I backed up a commit log, when I replayed commit logs in 
such a way that I was expecting 7 mutations to be present (6 from sstables + 1 
from logs), I was still getting 8 mutations and after looking into the logs I 
discovered that message which lead me to this ticket.

So in that sense - "restore_point_in_time" is effectively ignored in cases when 
I want to restore to a completely new node because its host id will be 
generated in such a way that it will not match the host id of the commit log I 
want to replay - so it will replay all mutations - but I do not want that.

> Loss of commit log data possible after sstable ingest
> -
>
> Key: CASSANDRA-16619
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16619
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
>Priority: Normal
> Fix For: 3.0.25, 3.11.11, 4.0-rc2, 4.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> SSTable metadata contains commit log positions of the sstable. These 
> positions are used to filter out mutations from the commit log on restart and 
> only make sense for the node on which the data was flushed.
> If an SSTable is moved between nodes they may cover regions that the 
> receiving node has not yet flushed, and result in valid data being lost 
> should these sections of the commit log need to be replayed.
> Solution:
> The chosen solution introduces a new sstable metadata (StatsMetadata) - 
> originatingHostId (UUID), which is the local host id of the node on which the 
> sstable was created, or null if not known. Commit log intervals from an 
> sstable are taken into account during Commit Log replay only when the 
> originatingHostId of the sstable matches the local node's hostId.
> For new sstables the originatingHostId is set according to StorageService's 
> local hostId.
> For compacted sstables the originatingHostId set according to 
> StorageService's local hostId, and only commit log intervals from local 
> sstables is preserved in the resulting sstable.
> discovered by [~jakubzytka]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest

2021-11-29 Thread Branimir Lambov (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450439#comment-17450439
 ] 

Branimir Lambov commented on CASSANDRA-16619:
-

Could you elaborate why this change would cause the restore point would be 
ignored?

> Loss of commit log data possible after sstable ingest
> -
>
> Key: CASSANDRA-16619
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16619
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
>Priority: Normal
> Fix For: 3.0.25, 3.11.11, 4.0-rc2, 4.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> SSTable metadata contains commit log positions of the sstable. These 
> positions are used to filter out mutations from the commit log on restart and 
> only make sense for the node on which the data was flushed.
> If an SSTable is moved between nodes they may cover regions that the 
> receiving node has not yet flushed, and result in valid data being lost 
> should these sections of the commit log need to be replayed.
> Solution:
> The chosen solution introduces a new sstable metadata (StatsMetadata) - 
> originatingHostId (UUID), which is the local host id of the node on which the 
> sstable was created, or null if not known. Commit log intervals from an 
> sstable are taken into account during Commit Log replay only when the 
> originatingHostId of the sstable matches the local node's hostId.
> For new sstables the originatingHostId is set according to StorageService's 
> local hostId.
> For compacted sstables the originatingHostId set according to 
> StorageService's local hostId, and only commit log intervals from local 
> sstables is preserved in the resulting sstable.
> discovered by [~jakubzytka]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest

2021-11-29 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450366#comment-17450366
 ] 

Stefan Miklosovic commented on CASSANDRA-16619:
---

No, I think I am still right in what I wrote. It breaks point-in-time 
restoration. If you set property "restore_point_in_time" in 
commitlog_archiving.properties, with this patch it is just ignored and all is 
replayed. So you do not have any possibility to replay to some point in time 
for a completely new node. I still consider this to be a regression.

> Loss of commit log data possible after sstable ingest
> -
>
> Key: CASSANDRA-16619
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16619
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
>Priority: Normal
> Fix For: 3.0.25, 3.11.11, 4.0-rc2, 4.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> SSTable metadata contains commit log positions of the sstable. These 
> positions are used to filter out mutations from the commit log on restart and 
> only make sense for the node on which the data was flushed.
> If an SSTable is moved between nodes they may cover regions that the 
> receiving node has not yet flushed, and result in valid data being lost 
> should these sections of the commit log need to be replayed.
> Solution:
> The chosen solution introduces a new sstable metadata (StatsMetadata) - 
> originatingHostId (UUID), which is the local host id of the node on which the 
> sstable was created, or null if not known. Commit log intervals from an 
> sstable are taken into account during Commit Log replay only when the 
> originatingHostId of the sstable matches the local node's hostId.
> For new sstables the originatingHostId is set according to StorageService's 
> local hostId.
> For compacted sstables the originatingHostId set according to 
> StorageService's local hostId, and only commit log intervals from local 
> sstables is preserved in the resulting sstable.
> discovered by [~jakubzytka]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest

2021-11-29 Thread Branimir Lambov (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450328#comment-17450328
 ] 

Branimir Lambov commented on CASSANDRA-16619:
-

[~tsteinmaurer], could you please open a separate ticket with your suggestions, 
so that the fix can be tracked correctly?
{quote} How am I supposed to do a point-in-time restoration using a commit log 
from the other node when it gets skipped as shown in the previous comment? 
{quote}
The patch ensures that the commit log will _not_ be skipped, even if it matches 
spans that sstables already cover. The problem you are describing is not a 
correctness one – the node will ignore intervals and hence replay all commit 
log data that it finds. Since mutations are idempotent, this will work 
correctly, albeit slower than before. The difference is that it will not mess 
up if the wrong sstable ended up in the restore set for whatever reason.

> Loss of commit log data possible after sstable ingest
> -
>
> Key: CASSANDRA-16619
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16619
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
>Priority: Normal
> Fix For: 3.0.25, 3.11.11, 4.0-rc2, 4.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> SSTable metadata contains commit log positions of the sstable. These 
> positions are used to filter out mutations from the commit log on restart and 
> only make sense for the node on which the data was flushed.
> If an SSTable is moved between nodes they may cover regions that the 
> receiving node has not yet flushed, and result in valid data being lost 
> should these sections of the commit log need to be replayed.
> Solution:
> The chosen solution introduces a new sstable metadata (StatsMetadata) - 
> originatingHostId (UUID), which is the local host id of the node on which the 
> sstable was created, or null if not known. Commit log intervals from an 
> sstable are taken into account during Commit Log replay only when the 
> originatingHostId of the sstable matches the local node's hostId.
> For new sstables the originatingHostId is set according to StorageService's 
> local hostId.
> For compacted sstables the originatingHostId set according to 
> StorageService's local hostId, and only commit log intervals from local 
> sstables is preserved in the resulting sstable.
> discovered by [~jakubzytka]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest

2021-11-29 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450284#comment-17450284
 ] 

Stefan Miklosovic commented on CASSANDRA-16619:
---

I think this feature is broken on the scenario when I have a commit log I want 
to replay for a completely diffent / new node which has UUID not matching the 
one in the commit log. There is a way how to workaround this will be in 4.1 as 
done in https://issues.apache.org/jira/browse/CASSANDRA-14582 however I am 
afraid that point in time restoration of all other Cassandra versions, since 
this patch was introduced, is broken. 

> Loss of commit log data possible after sstable ingest
> -
>
> Key: CASSANDRA-16619
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16619
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
>Priority: Normal
> Fix For: 3.0.25, 3.11.11, 4.0-rc2, 4.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> SSTable metadata contains commit log positions of the sstable. These 
> positions are used to filter out mutations from the commit log on restart and 
> only make sense for the node on which the data was flushed.
> If an SSTable is moved between nodes they may cover regions that the 
> receiving node has not yet flushed, and result in valid data being lost 
> should these sections of the commit log need to be replayed.
> Solution:
> The chosen solution introduces a new sstable metadata (StatsMetadata) - 
> originatingHostId (UUID), which is the local host id of the node on which the 
> sstable was created, or null if not known. Commit log intervals from an 
> sstable are taken into account during Commit Log replay only when the 
> originatingHostId of the sstable matches the local node's hostId.
> For new sstables the originatingHostId is set according to StorageService's 
> local hostId.
> For compacted sstables the originatingHostId set according to 
> StorageService's local hostId, and only commit log intervals from local 
> sstables is preserved in the resulting sstable.
> discovered by [~jakubzytka]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest

2021-11-26 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17449594#comment-17449594
 ] 

Stefan Miklosovic commented on CASSANDRA-16619:
---

How am I supposed to do a point-in-time restoration using a commit log from the 
other node when it gets skipped as show in the previous comment?

> Loss of commit log data possible after sstable ingest
> -
>
> Key: CASSANDRA-16619
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16619
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
>Priority: Normal
> Fix For: 3.0.25, 3.11.11, 4.0-rc2, 4.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> SSTable metadata contains commit log positions of the sstable. These 
> positions are used to filter out mutations from the commit log on restart and 
> only make sense for the node on which the data was flushed.
> If an SSTable is moved between nodes they may cover regions that the 
> receiving node has not yet flushed, and result in valid data being lost 
> should these sections of the commit log need to be replayed.
> Solution:
> The chosen solution introduces a new sstable metadata (StatsMetadata) - 
> originatingHostId (UUID), which is the local host id of the node on which the 
> sstable was created, or null if not known. Commit log intervals from an 
> sstable are taken into account during Commit Log replay only when the 
> originatingHostId of the sstable matches the local node's hostId.
> For new sstables the originatingHostId is set according to StorageService's 
> local hostId.
> For compacted sstables the originatingHostId set according to 
> StorageService's local hostId, and only commit log intervals from local 
> sstables is preserved in the resulting sstable.
> discovered by [~jakubzytka]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest

2021-11-09 Thread Thomas Steinmaurer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17441530#comment-17441530
 ] 

Thomas Steinmaurer commented on CASSANDRA-16619:


Regarding the WARN log, which got introduced by that ticket, e.g.:
{noformat}
WARN  [main] 2021-11-08 21:54:06,826 CommitLogReplayer.java:253 - Origin of 1 
sstables is unknown or doesn't match the local node; commitLogIntervals for 
them were ignored
{noformat}

While I understand the intention to ensure / avoid things when SSTables have 
been copied around (or e.g. due to a restore), the WARN log also seems to 
happen when Cassandra 3.11.11 reads pre-"*me*" SSTables, thus e.g. from 
3.11.10. I understand that the WARN log will go away eventually on its own 
resp. for sure (I guess?) after running "nodetool upgradesstables".

These sort of WARN log has produced quite some confusion and customer 
interaction for on-premise customer installations.
* Would it be possible to WARN only if we are in context of a "me" SSTable to 
avoid confusion after upgrading from pre-3.11.11?
* Would it be possible to mention a SSTable minor upgrade in e.g. {{NEWS.txt}} 
(or perhaps I missed it), as there might be tooling out there which counts 
number of SSTables per "format" via file name

Many thanks.

> Loss of commit log data possible after sstable ingest
> -
>
> Key: CASSANDRA-16619
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16619
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
>Priority: Normal
> Fix For: 3.0.25, 3.11.11, 4.0-rc2, 4.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> SSTable metadata contains commit log positions of the sstable. These 
> positions are used to filter out mutations from the commit log on restart and 
> only make sense for the node on which the data was flushed.
> If an SSTable is moved between nodes they may cover regions that the 
> receiving node has not yet flushed, and result in valid data being lost 
> should these sections of the commit log need to be replayed.
> Solution:
> The chosen solution introduces a new sstable metadata (StatsMetadata) - 
> originatingHostId (UUID), which is the local host id of the node on which the 
> sstable was created, or null if not known. Commit log intervals from an 
> sstable are taken into account during Commit Log replay only when the 
> originatingHostId of the sstable matches the local node's hostId.
> For new sstables the originatingHostId is set according to StorageService's 
> local hostId.
> For compacted sstables the originatingHostId set according to 
> StorageService's local hostId, and only commit log intervals from local 
> sstables is preserved in the resulting sstable.
> discovered by [~jakubzytka]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest

2021-05-17 Thread Benjamin Lerer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17345976#comment-17345976
 ] 

Benjamin Lerer commented on CASSANDRA-16619:


Committed into cassandra 4.0 at d35f36cd055419f5ba5b82f2efc047348c71b530 and 
merged into trunk.

> Loss of commit log data possible after sstable ingest
> -
>
> Key: CASSANDRA-16619
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16619
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
>Priority: Normal
> Fix For: 3.0.25, 3.11.11, 4.0-rc2, 4.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> SSTable metadata contains commit log positions of the sstable. These 
> positions are used to filter out mutations from the commit log on restart and 
> only make sense for the node on which the data was flushed.
> If an SSTable is moved between nodes they may cover regions that the 
> receiving node has not yet flushed, and result in valid data being lost 
> should these sections of the commit log need to be replayed.
> Solution:
> The chosen solution introduces a new sstable metadata (StatsMetadata) - 
> originatingHostId (UUID), which is the local host id of the node on which the 
> sstable was created, or null if not known. Commit log intervals from an 
> sstable are taken into account during Commit Log replay only when the 
> originatingHostId of the sstable matches the local node's hostId.
> For new sstables the originatingHostId is set according to StorageService's 
> local hostId.
> For compacted sstables the originatingHostId set according to 
> StorageService's local hostId, and only commit log intervals from local 
> sstables is preserved in the resulting sstable.
> discovered by [~jakubzytka]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest

2021-05-17 Thread Benjamin Lerer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17345956#comment-17345956
 ] 

Benjamin Lerer commented on CASSANDRA-16619:


The patch looks good to me.


> Loss of commit log data possible after sstable ingest
> -
>
> Key: CASSANDRA-16619
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16619
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
>Priority: Normal
> Fix For: 3.0.25, 3.11.11, 4.0-rc2, 4.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> SSTable metadata contains commit log positions of the sstable. These 
> positions are used to filter out mutations from the commit log on restart and 
> only make sense for the node on which the data was flushed.
> If an SSTable is moved between nodes they may cover regions that the 
> receiving node has not yet flushed, and result in valid data being lost 
> should these sections of the commit log need to be replayed.
> Solution:
> The chosen solution introduces a new sstable metadata (StatsMetadata) - 
> originatingHostId (UUID), which is the local host id of the node on which the 
> sstable was created, or null if not known. Commit log intervals from an 
> sstable are taken into account during Commit Log replay only when the 
> originatingHostId of the sstable matches the local node's hostId.
> For new sstables the originatingHostId is set according to StorageService's 
> local hostId.
> For compacted sstables the originatingHostId set according to 
> StorageService's local hostId, and only commit log intervals from local 
> sstables is preserved in the resulting sstable.
> discovered by [~jakubzytka]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest

2021-05-14 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17344917#comment-17344917
 ] 

Ekaterina Dimitrova commented on CASSANDRA-16619:
-

Thank you [~jlewandowski], I will review the patch on Monday

> Loss of commit log data possible after sstable ingest
> -
>
> Key: CASSANDRA-16619
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16619
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
>Priority: Normal
> Fix For: 3.0.25, 3.11.11, 4.0-rc2, 4.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> SSTable metadata contains commit log positions of the sstable. These 
> positions are used to filter out mutations from the commit log on restart and 
> only make sense for the node on which the data was flushed.
> If an SSTable is moved between nodes they may cover regions that the 
> receiving node has not yet flushed, and result in valid data being lost 
> should these sections of the commit log need to be replayed.
> Solution:
> The chosen solution introduces a new sstable metadata (StatsMetadata) - 
> originatingHostId (UUID), which is the local host id of the node on which the 
> sstable was created, or null if not known. Commit log intervals from an 
> sstable are taken into account during Commit Log replay only when the 
> originatingHostId of the sstable matches the local node's hostId.
> For new sstables the originatingHostId is set according to StorageService's 
> local hostId.
> For compacted sstables the originatingHostId set according to 
> StorageService's local hostId, and only commit log intervals from local 
> sstables is preserved in the resulting sstable.
> discovered by [~jakubzytka]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest

2021-05-14 Thread Jacek Lewandowski (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17344513#comment-17344513
 ] 

Jacek Lewandowski commented on CASSANDRA-16619:
---

[~e.dimitrova] I've missed regenerating {{nb}} in legacy sstables after the 
change - fixed that and rerun the tests: 
https://jenkins-cm4.apache.org/view/patches/job/Cassandra-devbranch-test/524/


> Loss of commit log data possible after sstable ingest
> -
>
> Key: CASSANDRA-16619
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16619
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
>Priority: Normal
> Fix For: 3.0.25, 3.11.11, 4.0-rc2, 4.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> SSTable metadata contains commit log positions of the sstable. These 
> positions are used to filter out mutations from the commit log on restart and 
> only make sense for the node on which the data was flushed.
> If an SSTable is moved between nodes they may cover regions that the 
> receiving node has not yet flushed, and result in valid data being lost 
> should these sections of the commit log need to be replayed.
> Solution:
> The chosen solution introduces a new sstable metadata (StatsMetadata) - 
> originatingHostId (UUID), which is the local host id of the node on which the 
> sstable was created, or null if not known. Commit log intervals from an 
> sstable are taken into account during Commit Log replay only when the 
> originatingHostId of the sstable matches the local node's hostId.
> For new sstables the originatingHostId is set according to StorageService's 
> local hostId.
> For compacted sstables the originatingHostId set according to 
> StorageService's local hostId, and only commit log intervals from local 
> sstables is preserved in the resulting sstable.
> discovered by [~jakubzytka]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest

2021-05-13 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17343980#comment-17343980
 ] 

Ekaterina Dimitrova commented on CASSANDRA-16619:
-

Jenkins run also submitted 
[here|https://jenkins-cm4.apache.org/job/Cassandra-devbranch/773/]

> Loss of commit log data possible after sstable ingest
> -
>
> Key: CASSANDRA-16619
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16619
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
>Priority: Normal
> Fix For: 3.0.25, 3.11.11, 4.0-rc2, 4.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> SSTable metadata contains commit log positions of the sstable. These 
> positions are used to filter out mutations from the commit log on restart and 
> only make sense for the node on which the data was flushed.
> If an SSTable is moved between nodes they may cover regions that the 
> receiving node has not yet flushed, and result in valid data being lost 
> should these sections of the commit log need to be replayed.
> Solution:
> The chosen solution introduces a new sstable metadata (StatsMetadata) - 
> originatingHostId (UUID), which is the local host id of the node on which the 
> sstable was created, or null if not known. Commit log intervals from an 
> sstable are taken into account during Commit Log replay only when the 
> originatingHostId of the sstable matches the local node's hostId.
> For new sstables the originatingHostId is set according to StorageService's 
> local hostId.
> For compacted sstables the originatingHostId set according to 
> StorageService's local hostId, and only commit log intervals from local 
> sstables is preserved in the resulting sstable.
> discovered by [~jakubzytka]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest

2021-05-13 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17343966#comment-17343966
 ] 

Ekaterina Dimitrova commented on CASSANDRA-16619:
-

Thank you [~jlewandowski], CircleCI run submitted:
[Java 8 | 
https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/821/workflows/a53aac7b-ecee-4597-9d27-66a9d6b9ab90]
 | [Java 11 | 
https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/821/workflows/02eba6a1-5601-45f1-9743-16b3f04beaf8]

> Loss of commit log data possible after sstable ingest
> -
>
> Key: CASSANDRA-16619
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16619
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
>Priority: Normal
> Fix For: 3.0.25, 3.11.11, 4.0-rc2, 4.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> SSTable metadata contains commit log positions of the sstable. These 
> positions are used to filter out mutations from the commit log on restart and 
> only make sense for the node on which the data was flushed.
> If an SSTable is moved between nodes they may cover regions that the 
> receiving node has not yet flushed, and result in valid data being lost 
> should these sections of the commit log need to be replayed.
> Solution:
> The chosen solution introduces a new sstable metadata (StatsMetadata) - 
> originatingHostId (UUID), which is the local host id of the node on which the 
> sstable was created, or null if not known. Commit log intervals from an 
> sstable are taken into account during Commit Log replay only when the 
> originatingHostId of the sstable matches the local node's hostId.
> For new sstables the originatingHostId is set according to StorageService's 
> local hostId.
> For compacted sstables the originatingHostId set according to 
> StorageService's local hostId, and only commit log intervals from local 
> sstables is preserved in the resulting sstable.
> discovered by [~jakubzytka]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest

2021-05-06 Thread Jacek Lewandowski (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17340366#comment-17340366
 ] 

Jacek Lewandowski commented on CASSANDRA-16619:
---

thank you [~blerer]

> Loss of commit log data possible after sstable ingest
> -
>
> Key: CASSANDRA-16619
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16619
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
>Priority: Normal
> Fix For: 3.0.25, 3.11.11, 4.0-rc2, 4.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> SSTable metadata contains commit log positions of the sstable. These 
> positions are used to filter out mutations from the commit log on restart and 
> only make sense for the node on which the data was flushed.
> If an SSTable is moved between nodes they may cover regions that the 
> receiving node has not yet flushed, and result in valid data being lost 
> should these sections of the commit log need to be replayed.
> Solution:
> The chosen solution introduces a new sstable metadata (StatsMetadata) - 
> originatingHostId (UUID), which is the local host id of the node on which the 
> sstable was created, or null if not known. Commit log intervals from an 
> sstable are taken into account during Commit Log replay only when the 
> originatingHostId of the sstable matches the local node's hostId.
> For new sstables the originatingHostId is set according to StorageService's 
> local hostId.
> For compacted sstables the originatingHostId set according to 
> StorageService's local hostId, and only commit log intervals from local 
> sstables is preserved in the resulting sstable.
> discovered by [~jakubzytka]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest

2021-05-05 Thread Benjamin Lerer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17339547#comment-17339547
 ] 

Benjamin Lerer commented on CASSANDRA-16619:


+1.

> Loss of commit log data possible after sstable ingest
> -
>
> Key: CASSANDRA-16619
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16619
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
>Priority: Normal
> Fix For: 4.0, 3.0.x, 3.11.x
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> SSTable metadata contains commit log positions of the sstable. These 
> positions are used to filter out mutations from the commit log on restart and 
> only make sense for the node on which the data was flushed.
> If an SSTable is moved between nodes they may cover regions that the 
> receiving node has not yet flushed, and result in valid data being lost 
> should these sections of the commit log need to be replayed.
> Solution:
> The chosen solution introduces a new sstable metadata (StatsMetadata) - 
> originatingHostId (UUID), which is the local host id of the node on which the 
> sstable was created, or null if not known. Commit log intervals from an 
> sstable are taken into account during Commit Log replay only when the 
> originatingHostId of the sstable matches the local node's hostId.
> For new sstables the originatingHostId is set according to StorageService's 
> local hostId.
> For compacted sstables the originatingHostId set according to 
> StorageService's local hostId, and only commit log intervals from local 
> sstables is preserved in the resulting sstable.
> discovered by [~jakubzytka]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest

2021-04-30 Thread Jacek Lewandowski (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17337318#comment-17337318
 ] 

Jacek Lewandowski commented on CASSANDRA-16619:
---

Started CI tests:
https://ci-cassandra.apache.org/view/patches/job/Cassandra-devbranch/720/
https://ci-cassandra.apache.org/view/patches/job/Cassandra-devbranch/721/
https://ci-cassandra.apache.org/view/patches/job/Cassandra-devbranch/722/

> Loss of commit log data possible after sstable ingest
> -
>
> Key: CASSANDRA-16619
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16619
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
>Priority: Normal
> Fix For: 4.0, 3.0.x, 3.11.x
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> SSTable metadata contains commit log positions of the sstable. These 
> positions are used to filter out mutations from the commit log on restart and 
> only make sense for the node on which the data was flushed.
> If an SSTable is moved between nodes they may cover regions that the 
> receiving node has not yet flushed, and result in valid data being lost 
> should these sections of the commit log need to be replayed.
> Solution:
> The chosen solution introduces a new sstable metadata (StatsMetadata) - 
> originatingHostId (UUID), which is the local host id of the node on which the 
> sstable was created, or null if not known. Commit log intervals from an 
> sstable are taken into account during Commit Log replay only when the 
> originatingHostId of the sstable matches the local node's hostId.
> For new sstables the originatingHostId is set according to StorageService's 
> local hostId.
> For compacted sstables the originatingHostId set according to 
> StorageService's local hostId, and only commit log intervals from local 
> sstables is preserved in the resulting sstable.
> discovered by [~jakubzytka]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest

2021-04-27 Thread Jacek Lewandowski (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17333149#comment-17333149
 ] 

Jacek Lewandowski commented on CASSANDRA-16619:
---

Thanks you [~blerer], I'll apply the requested changes

> Loss of commit log data possible after sstable ingest
> -
>
> Key: CASSANDRA-16619
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16619
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
>Priority: Normal
> Fix For: 4.0, 3.0.x, 3.11.x
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> SSTable metadata contains commit log positions of the sstable. These 
> positions are used to filter out mutations from the commit log on restart and 
> only make sense for the node on which the data was flushed.
> If an SSTable is moved between nodes they may cover regions that the 
> receiving node has not yet flushed, and result in valid data being lost 
> should these sections of the commit log need to be replayed.
> Solution:
> The chosen solution introduces a new sstable metadata (StatsMetadata) - 
> originatingHostId (UUID), which is the local host id of the node on which the 
> sstable was created, or null if not known. Commit log intervals from an 
> sstable are taken into account during Commit Log replay only when the 
> originatingHostId of the sstable matches the local node's hostId.
> For new sstables the originatingHostId is set according to StorageService's 
> local hostId.
> For compacted sstables the originatingHostId set according to 
> StorageService's local hostId, and only commit log intervals from local 
> sstables is preserved in the resulting sstable.
> discovered by [~jakubzytka]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest

2021-04-26 Thread Benjamin Lerer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17332479#comment-17332479
 ] 

Benjamin Lerer commented on CASSANDRA-16619:


As {{4.0-rc1}} has been released by consequence we need to provide forward 
compatibility for data. I do not think it is the case with the current patch. 
Users using {{4.0-rc1}} will have an {{na}} format *without* hostID whereas 
when they will deploy {{4.0-rc2}} they will have an {{na}} format *with* hostID 
which will fail to read the {{na}} SSTables from the {{4.0-rc1}}. To fix that I 
think that we need to introduce a new {{nb}} version.

Regarding the patches should refactor the testMcReadMc

> Loss of commit log data possible after sstable ingest
> -
>
> Key: CASSANDRA-16619
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16619
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
>Priority: Normal
> Fix For: 4.0, 3.0.x, 3.11.x
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> SSTable metadata contains commit log positions of the sstable. These 
> positions are used to filter out mutations from the commit log on restart and 
> only make sense for the node on which the data was flushed.
> If an SSTable is moved between nodes they may cover regions that the 
> receiving node has not yet flushed, and result in valid data being lost 
> should these sections of the commit log need to be replayed.
> Solution:
> The chosen solution introduces a new sstable metadata (StatsMetadata) - 
> originatingHostId (UUID), which is the local host id of the node on which the 
> sstable was created, or null if not known. Commit log intervals from an 
> sstable are taken into account during Commit Log replay only when the 
> originatingHostId of the sstable matches the local node's hostId.
> For new sstables the originatingHostId is set according to StorageService's 
> local hostId.
> For compacted sstables the originatingHostId set according to 
> StorageService's local hostId, and only commit log intervals from local 
> sstables is preserved in the resulting sstable.
> discovered by [~jakubzytka]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest

2021-04-23 Thread Jacek Lewandowski (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330342#comment-17330342
 ] 

Jacek Lewandowski commented on CASSANDRA-16619:
---

I've started CI builds:

https://ci-cassandra.apache.org/view/patches/job/Cassandra-devbranch/703/
https://ci-cassandra.apache.org/view/patches/job/Cassandra-devbranch/704/
https://ci-cassandra.apache.org/view/patches/job/Cassandra-devbranch/705/

> Loss of commit log data possible after sstable ingest
> -
>
> Key: CASSANDRA-16619
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16619
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
>Priority: Normal
> Fix For: 4.0, 3.0.x, 3.11.x
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> SSTable metadata contains commit log positions of the sstable. These 
> positions are used to filter out mutations from the commit log on restart and 
> only make sense for the node on which the data was flushed.
> If an SSTable is moved between nodes they may cover regions that the 
> receiving node has not yet flushed, and result in valid data being lost 
> should these sections of the commit log need to be replayed.
> Solution:
> The chosen solution introduces a new sstable metadata (StatsMetadata) - 
> originatingHostId (UUID), which is the local host id of the node on which the 
> sstable was created, or null if not known. Commit log intervals from an 
> sstable are taken into account during Commit Log replay only when the 
> originatingHostId of the sstable matches the local node's hostId.
> For new sstables the originatingHostId is set according to StorageService's 
> local hostId.
> For compacted sstables the originatingHostId set according to 
> StorageService's local hostId, and only commit log intervals from local 
> sstables is preserved in the resulting sstable.
> discovered by [~jakubzytka]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest

2021-04-22 Thread Jeremiah Jordan (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17329926#comment-17329926
 ] 

Jeremiah Jordan commented on CASSANDRA-16619:
-

There is only a downgrade issue for major version bumps of sstables.

> Loss of commit log data possible after sstable ingest
> -
>
> Key: CASSANDRA-16619
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16619
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
>Priority: Normal
> Fix For: 4.0, 3.0.x, 3.11.x
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> SSTable metadata contains commit log positions of the sstable. These 
> positions are used to filter out mutations from the commit log on restart and 
> only make sense for the node on which the data was flushed.
> If an SSTable is moved between nodes they may cover regions that the 
> receiving node has not yet flushed, and result in valid data being lost 
> should these sections of the commit log need to be replayed.
> Solution:
> The chosen solution introduces a new sstable metadata (StatsMetadata) - 
> originatingHostId (UUID), which is the local host id of the node on which the 
> sstable was created, or null if not known. Commit log intervals from an 
> sstable are taken into account during Commit Log replay only when the 
> originatingHostId of the sstable matches the local node's hostId.
> For new sstables the originatingHostId is set according to StorageService's 
> local hostId.
> For compacted sstables the originatingHostId set according to 
> StorageService's local hostId, and only commit log intervals from local 
> sstables is preserved in the resulting sstable.
> discovered by [~jakubzytka]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest

2021-04-22 Thread Jeremiah Jordan (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17329924#comment-17329924
 ] 

Jeremiah Jordan commented on CASSANDRA-16619:
-

Actually new minor sstable versions that only add new metadata fields are fine. 
If you downgrade the previous version should still be able to read the files.

> Loss of commit log data possible after sstable ingest
> -
>
> Key: CASSANDRA-16619
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16619
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
>Priority: Normal
> Fix For: 4.0, 3.0.x, 3.11.x
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> SSTable metadata contains commit log positions of the sstable. These 
> positions are used to filter out mutations from the commit log on restart and 
> only make sense for the node on which the data was flushed.
> If an SSTable is moved between nodes they may cover regions that the 
> receiving node has not yet flushed, and result in valid data being lost 
> should these sections of the commit log need to be replayed.
> Solution:
> The chosen solution introduces a new sstable metadata (StatsMetadata) - 
> originatingHostId (UUID), which is the local host id of the node on which the 
> sstable was created, or null if not known. Commit log intervals from an 
> sstable are taken into account during Commit Log replay only when the 
> originatingHostId of the sstable matches the local node's hostId.
> For new sstables the originatingHostId is set according to StorageService's 
> local hostId.
> For compacted sstables the originatingHostId set according to 
> StorageService's local hostId, and only commit log intervals from local 
> sstables is preserved in the resulting sstable.
> discovered by [~jakubzytka]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest

2021-04-22 Thread Benjamin Lerer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17329365#comment-17329365
 ] 

Benjamin Lerer commented on CASSANDRA-16619:


One problem with a new SSTable version is that if you hit an issue after 
upgrading you might not be able to downgrade your cluster without losing some 
data. I agree that the current solution is probably better, nevertheless 
introducing a new SSTable version is not without consequence until we introduce 
a mechanism as the one descripbe by [~cscotta] in the roadmap discussion.
Would it makes sense to use another approach for now and introduce the change 
later on (either when we have other reasons to introduce a new SSTable format 
or when we have a safety mechanism) ? 

> Loss of commit log data possible after sstable ingest
> -
>
> Key: CASSANDRA-16619
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16619
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0.x
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> SSTable metadata contains commit log positions of the sstable. These 
> positions are used to filter out mutations from the commit log on restart and 
> only make sense for the node on which the data was flushed.
> If an SSTable is moved between nodes they may cover regions that the 
> receiving node has not yet flushed, and result in valid data being lost 
> should these sections of the commit log need to be replayed.
> Solution:
> The chosen solution introduces a new sstable metadata (StatsMetadata) - 
> originatingHostId (UUID), which is the local host id of the node on which the 
> sstable was created, or null if not known. Commit log intervals from an 
> sstable are taken into account during Commit Log replay only when the 
> originatingHostId of the sstable matches the local node's hostId.
> For new sstables the originatingHostId is set according to StorageService's 
> local hostId.
> For compacted sstables the originatingHostId set according to 
> StorageService's local hostId, and only commit log intervals from local 
> sstables is preserved in the resulting sstable.
> discovered by [~jakubzytka]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest

2021-04-22 Thread Branimir Lambov (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17327149#comment-17327149
 ] 

Branimir Lambov commented on CASSANDRA-16619:
-

{quote}what do we do with other things such as repair, ancestry, level, etc?
{quote}
With this ticket, we _have_ the originating host id, so we have the means to 
ignore non-relevant information, whether it is in commit log, compaction or 
anywhere.

There's some room to make the interface more generic, i.e. have a mechanism to 
mark fields as local so that they can be properly combined when doing 
compaction (which can easily be done in a separate ticket), but this IMHO is a 
better solution to the problem as it handles all manners of transfer and also 
allows correcting errors caused by tables already transferred by the time a bug 
with local metadata is uncovered.

> Loss of commit log data possible after sstable ingest
> -
>
> Key: CASSANDRA-16619
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16619
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0.x
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> SSTable metadata contains commit log positions of the sstable. These 
> positions are used to filter out mutations from the commit log on restart and 
> only make sense for the node on which the data was flushed.
> If an SSTable is moved between nodes they may cover regions that the 
> receiving node has not yet flushed, and result in valid data being lost 
> should these sections of the commit log need to be replayed.
> Solution:
> The chosen solution introduces a new sstable metadata (StatsMetadata) - 
> originatingHostId (UUID), which is the local host id of the node on which the 
> sstable was created, or null if not known. Commit log intervals from an 
> sstable are taken into account during Commit Log replay only when the 
> originatingHostId of the sstable matches the local node's hostId.
> For new sstables the originatingHostId is set according to StorageService's 
> local hostId.
> For compacted sstables the originatingHostId set according to 
> StorageService's local hostId, and only commit log intervals from local 
> sstables is preserved in the resulting sstable.
> discovered by [~jakubzytka]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest

2021-04-21 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17326943#comment-17326943
 ] 

David Capwell commented on CASSANDRA-16619:
---

A backup/restore process which bypasses nodetool import and directly dumps the 
files in the CF directory makes sense to hit this, but if you go through import 
I would hope we strip out all the metadata which is no longer relevant (which 
we are trying to do in import as commit log position isn't the only thing we 
need to deal with).  If we special case commit log, what do we do with other 
things such as repair, ancestry, level, etc?  

Since the cases which load SStables from external writers are few and well 
known, I feel it makes the most sense to make sure each strips the metadata the 
same way. Adding a method to MetadataSerializer such as resetCommitLogPosition 
and calling it in the places which import files would handle this without 
requiring a format change (import allows more flexibility in what we strip out, 
which backup/restore processes can use.  So nice to have this method rather 
than a resetNonLocalMetadata method).

> Loss of commit log data possible after sstable ingest
> -
>
> Key: CASSANDRA-16619
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16619
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0.x
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> SSTable metadata contains commit log positions of the sstable. These 
> positions are used to filter out mutations from the commit log on restart and 
> only make sense for the node on which the data was flushed.
> If an SSTable is moved between nodes they may cover regions that the 
> receiving node has not yet flushed, and result in valid data being lost 
> should these sections of the commit log need to be replayed.
> Solution:
> The chosen solution introduces a new sstable metadata (StatsMetadata) - 
> originatingHostId (UUID), which is the local host id of the node on which the 
> sstable was created, or null if not known. Commit log intervals from an 
> sstable are taken into account during Commit Log replay only when the 
> originatingHostId of the sstable matches the local node's hostId.
> For new sstables the originatingHostId is set according to StorageService's 
> local hostId.
> For compacted sstables the originatingHostId set according to 
> StorageService's local hostId, and only commit log intervals from local 
> sstables is preserved in the resulting sstable.
> discovered by [~jakubzytka]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest

2021-04-21 Thread Jeremiah Jordan (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17326916#comment-17326916
 ] 

Jeremiah Jordan commented on CASSANDRA-16619:
-

[~dcapwell] this isn't just about ZCS.  Any backup/restore process that copied 
an SSTable around is also affected.  If a given node did not create the 
CommitLogPosition information then it needs to be ignored when loading the 
sstable.

> Loss of commit log data possible after sstable ingest
> -
>
> Key: CASSANDRA-16619
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16619
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0.x
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> SSTable metadata contains commit log positions of the sstable. These 
> positions are used to filter out mutations from the commit log on restart and 
> only make sense for the node on which the data was flushed.
> If an SSTable is moved between nodes they may cover regions that the 
> receiving node has not yet flushed, and result in valid data being lost 
> should these sections of the commit log need to be replayed.
> Solution:
> The chosen solution introduces a new sstable metadata (StatsMetadata) - 
> originatingHostId (UUID), which is the local host id of the node on which the 
> sstable was created, or null if not known. Commit log intervals from an 
> sstable are taken into account during Commit Log replay only when the 
> originatingHostId of the sstable matches the local node's hostId.
> For new sstables the originatingHostId is set according to StorageService's 
> local hostId.
> For compacted sstables the originatingHostId set according to 
> StorageService's local hostId, and only commit log intervals from local 
> sstables is preserved in the resulting sstable.
> discovered by [~jakubzytka]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest

2021-04-21 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17326911#comment-17326911
 ] 

David Capwell commented on CASSANDRA-16619:
---

bq. a node needs to load the file into memory when receiving from remote

we would already when we open the file, so can strip this out still

> Loss of commit log data possible after sstable ingest
> -
>
> Key: CASSANDRA-16619
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16619
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0.x
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> SSTable metadata contains commit log positions of the sstable. These 
> positions are used to filter out mutations from the commit log on restart and 
> only make sense for the node on which the data was flushed.
> If an SSTable is moved between nodes they may cover regions that the 
> receiving node has not yet flushed, and result in valid data being lost 
> should these sections of the commit log need to be replayed.
> Solution:
> The chosen solution introduces a new sstable metadata (StatsMetadata) - 
> originatingHostId (UUID), which is the local host id of the node on which the 
> sstable was created, or null if not known. Commit log intervals from an 
> sstable are taken into account during Commit Log replay only when the 
> originatingHostId of the sstable matches the local node's hostId.
> For new sstables the originatingHostId is set according to StorageService's 
> local hostId.
> For compacted sstables the originatingHostId set according to 
> StorageService's local hostId, and only commit log intervals from local 
> sstables is preserved in the resulting sstable.
> discovered by [~jakubzytka]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest

2021-04-21 Thread Yifan Cai (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17326907#comment-17326907
 ] 

Yifan Cai commented on CASSANDRA-16619:
---

ZCS streams a file as-is and w/o loading it into memory, hence fast. To remove 
a field metadata, a node needs to load the file into memory when receiving from 
remote. 

I think it is an expected behavior with ZCS. 

To distinguish, adding the original hostID in the metadata sounds valid.

> Loss of commit log data possible after sstable ingest
> -
>
> Key: CASSANDRA-16619
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16619
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0.x
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> SSTable metadata contains commit log positions of the sstable. These 
> positions are used to filter out mutations from the commit log on restart and 
> only make sense for the node on which the data was flushed.
> If an SSTable is moved between nodes they may cover regions that the 
> receiving node has not yet flushed, and result in valid data being lost 
> should these sections of the commit log need to be replayed.
> Solution:
> The chosen solution introduces a new sstable metadata (StatsMetadata) - 
> originatingHostId (UUID), which is the local host id of the node on which the 
> sstable was created, or null if not known. Commit log intervals from an 
> sstable are taken into account during Commit Log replay only when the 
> originatingHostId of the sstable matches the local node's hostId.
> For new sstables the originatingHostId is set according to StorageService's 
> local hostId.
> For compacted sstables the originatingHostId set according to 
> StorageService's local hostId, and only commit log intervals from local 
> sstables is preserved in the resulting sstable.
> discovered by [~jakubzytka]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest

2021-04-21 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17326906#comment-17326906
 ] 

David Capwell commented on CASSANDRA-16619:
---

Looking at importer I don't see it cleaning up 
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/SSTableImporter.java#L365-L380.
 Ideally we should drop it, and ancestors (also missing)

> Loss of commit log data possible after sstable ingest
> -
>
> Key: CASSANDRA-16619
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16619
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0.x
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> SSTable metadata contains commit log positions of the sstable. These 
> positions are used to filter out mutations from the commit log on restart and 
> only make sense for the node on which the data was flushed.
> If an SSTable is moved between nodes they may cover regions that the 
> receiving node has not yet flushed, and result in valid data being lost 
> should these sections of the commit log need to be replayed.
> Solution:
> The chosen solution introduces a new sstable metadata (StatsMetadata) - 
> originatingHostId (UUID), which is the local host id of the node on which the 
> sstable was created, or null if not known. Commit log intervals from an 
> sstable are taken into account during Commit Log replay only when the 
> originatingHostId of the sstable matches the local node's hostId.
> For new sstables the originatingHostId is set according to StorageService's 
> local hostId.
> For compacted sstables the originatingHostId set according to 
> StorageService's local hostId, and only commit log intervals from local 
> sstables is preserved in the resulting sstable.
> discovered by [~jakubzytka]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest

2021-04-21 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17326899#comment-17326899
 ] 

David Capwell commented on CASSANDRA-16619:
---

that sounds like a bug in zero-copy streaming then, ideally we should strip 
that info out before adding

> Loss of commit log data possible after sstable ingest
> -
>
> Key: CASSANDRA-16619
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16619
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0.x
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> SSTable metadata contains commit log positions of the sstable. These 
> positions are used to filter out mutations from the commit log on restart and 
> only make sense for the node on which the data was flushed.
> If an SSTable is moved between nodes they may cover regions that the 
> receiving node has not yet flushed, and result in valid data being lost 
> should these sections of the commit log need to be replayed.
> Solution:
> The chosen solution introduces a new sstable metadata (StatsMetadata) - 
> originatingHostId (UUID), which is the local host id of the node on which the 
> sstable was created, or null if not known. Commit log intervals from an 
> sstable are taken into account during Commit Log replay only when the 
> originatingHostId of the sstable matches the local node's hostId.
> For new sstables the originatingHostId is set according to StorageService's 
> local hostId.
> For compacted sstables the originatingHostId set according to 
> StorageService's local hostId, and only commit log intervals from local 
> sstables is preserved in the resulting sstable.
> discovered by [~jakubzytka]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest

2021-04-21 Thread Jacek Lewandowski (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17326894#comment-17326894
 ] 

Jacek Lewandowski commented on CASSANDRA-16619:
---

[~dcapwell] - how the streaming is expected to remove this info? I can see that 
zero copy streaming moves the whole files between the nodes and there is no 
transformation which removes that information.

> Loss of commit log data possible after sstable ingest
> -
>
> Key: CASSANDRA-16619
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16619
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0.x
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> SSTable metadata contains commit log positions of the sstable. These 
> positions are used to filter out mutations from the commit log on restart and 
> only make sense for the node on which the data was flushed.
> If an SSTable is moved between nodes they may cover regions that the 
> receiving node has not yet flushed, and result in valid data being lost 
> should these sections of the commit log need to be replayed.
> Solution:
> The chosen solution introduces a new sstable metadata (StatsMetadata) - 
> originatingHostId (UUID), which is the local host id of the node on which the 
> sstable was created, or null if not known. Commit log intervals from an 
> sstable are taken into account during Commit Log replay only when the 
> originatingHostId of the sstable matches the local node's hostId.
> For new sstables the originatingHostId is set according to StorageService's 
> local hostId.
> For compacted sstables the originatingHostId set according to 
> StorageService's local hostId, and only commit log intervals from local 
> sstables is preserved in the resulting sstable.
> discovered by [~jakubzytka]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest

2021-04-20 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17325939#comment-17325939
 ] 

David Capwell commented on CASSANDRA-16619:
---

bq. If an SSTable is moved between nodes

What method are you using to "move" SSTables?  Streaming and nodetool import 
are expected to remove this info; can you elaborate?

> Loss of commit log data possible after sstable ingest
> -
>
> Key: CASSANDRA-16619
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16619
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
>Priority: Normal
> Fix For: 4.0, 3.0.x, 3.11.x
>
>
> SSTable metadata contains commit log positions of the sstable. These 
> positions are used to filter out mutations from the commit log on restart and 
> only make sense for the node on which the data was flushed.
> If an SSTable is moved between nodes they may cover regions that the 
> receiving node has not yet flushed, and result in valid data being lost 
> should these sections of the commit log need to be replayed.
> Solution:
> The chosen solution introduces a new sstable metadata (StatsMetadata) - 
> originatingHostId (UUID), which is the local host id of the node on which the 
> sstable was created, or null if not known. Commit log intervals from an 
> sstable are taken into account during Commit Log replay only when the 
> originatingHostId of the sstable matches the local node's hostId.
> For new sstables the originatingHostId is set according to StorageService's 
> local hostId.
> For compacted sstables the originatingHostId set according to 
> StorageService's local hostId, and only commit log intervals from local 
> sstables is preserved in the resulting sstable.
> discovered by [~jakubzytka]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org