[jira] [Commented] (CASSANDRA-12497) COPY ... TO STDOUT regression in 2.2.7
[ https://issues.apache.org/jira/browse/CASSANDRA-12497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16196594#comment-16196594 ]

ASF GitHub Bot commented on CASSANDRA-12497:
--------------------------------------------

Github user salomvary closed the pull request at:

    https://github.com/apache/cassandra/pull/92

> COPY ... TO STDOUT regression in 2.2.7
> --------------------------------------
>
>                 Key: CASSANDRA-12497
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12497
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Tools
>            Reporter: Max Bowsher
>            Assignee: Márton Salomváry
>             Fix For: 2.2.10, 3.0.12, 3.11.0, 4.0
>
> Cassandra 2.2.7 introduces a regression over 2.2.6 that breaks COPY ... TO STDOUT.
> In pylib/cqlshlib/copyutil.py, in CopyTask.__init__, self.printmsg is
> conditionally bound to EITHER a module-level function accepting arguments
> (msg, eol=, encoding=), OR a lambda accepting only (_, eol=).
> Consequently, when the lambda is in use (which is the case for COPY ... TO STDOUT
> without --debug), any attempt to call CopyTask.printmsg with an encoding
> parameter raises an exception.
> This occurs in ExportTask.run, rendering all COPY ... TO STDOUT without
> --debug broken.
> The fix is to update the lambda's arguments to include encoding, or better,
> to replace it with a module-level function defined next to printmsg, so that
> readers realize the two argument lists must be kept in sync.
> The regression was introduced in this commit:
> commit 5de9de1f5832f2a0e92783e2f4412874423e6e15
> Author: Tyler Hobbs
> Date: Thu May 5 11:33:35 2016 -0500
>     cqlsh: Handle non-ascii chars in error messages
>     Patch by Tyler Hobbs; reviewed by Paulo Motta for CASSANDRA-11626

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
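The signature mismatch described in the report can be sketched as follows. This is a simplified model, not the actual cqlshlib code; the function bodies and the name `swallowmsg` are illustrative, following the reporter's suggested fix of a module-level no-op defined next to printmsg.

```python
# Simplified sketch of the CASSANDRA-12497 signature mismatch
# (placeholder bodies, not the real cqlshlib/copyutil.py implementation).
import sys

def printmsg(msg, eol='\n', encoding='utf8'):
    # Module-level helper used when --debug is on.
    sys.stdout.write(msg + eol)

# Buggy variant: bound when COPY ... TO STDOUT runs without --debug.
# It accepts no 'encoding', but callers such as ExportTask.run pass one.
printmsg_noop_buggy = lambda _, eol='\n': None

# Suggested fix: a no-op that mirrors printmsg's full argument list, so the
# two signatures stay in sync no matter which one CopyTask binds.
def swallowmsg(msg, eol='', encoding=''):
    """No-op printer with the same argument list as printmsg."""

try:
    printmsg_noop_buggy("Exported 10 rows", encoding='utf8')
except TypeError as exc:
    print("buggy lambda:", exc)   # unexpected keyword argument 'encoding'

swallowmsg("Exported 10 rows", encoding='utf8')  # fixed no-op accepts the call
```

Calling the buggy lambda with `encoding=` raises `TypeError`, which is exactly the failure ExportTask.run triggered; the fixed version accepts the same calls as printmsg.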
[jira] [Commented] (CASSANDRA-13930) Avoid grabbing the read lock when checking if compaction strategy should do defragmentation
[ https://issues.apache.org/jira/browse/CASSANDRA-13930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16196660#comment-16196660 ]

Marcus Eriksson commented on CASSANDRA-13930:
---------------------------------------------

bq. What do you think about a similar fix for fanout?

Makes sense; at least it doesn't hurt (until we want to have LCS change fanout dynamically or something). Pushed up a new commit to the same branch and rerunning tests:

https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/363/
https://circleci.com/gh/krummas/cassandra/148

> Avoid grabbing the read lock when checking if compaction strategy should do
> defragmentation
> ---------------------------------------------------------------------------
>
>                 Key: CASSANDRA-13930
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13930
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Compaction
>            Reporter: Marcus Eriksson
>            Assignee: Marcus Eriksson
>             Fix For: 3.11.x, 4.x
>
> We grab the read lock when checking whether the compaction strategy benefits
> from defragmentation; avoid that.
[jira] [Created] (CASSANDRA-13942) Open Cassandra.yaml for developers to extend custom settings
zhaoyan created CASSANDRA-13942:
-----------------------------------

             Summary: Open Cassandra.yaml for developers to extend custom settings
                 Key: CASSANDRA-13942
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13942
             Project: Cassandra
          Issue Type: Wish
          Components: Configuration
            Reporter: zhaoyan

We are writing an index plugin for Cassandra, and we want to put some additional settings in cassandra.yaml and read them from our code.

We found that Cassandra uses DatabaseDescriptor.java and Config.java to hold the configuration loaded from cassandra.yaml, but these cannot be extended.

I therefore suggest that Cassandra provide interfaces for developers to extend cassandra.yaml with custom settings.

Thank you.
[jira] [Commented] (CASSANDRA-13910) Consider deprecating (then removing) read_repair_chance/dclocal_read_repair_chance
[ https://issues.apache.org/jira/browse/CASSANDRA-13910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16196782#comment-16196782 ]

Robert Stupp commented on CASSANDRA-13910:
------------------------------------------

(If nodes are down for longer than the hint window, or under similar operational issues, you should run an anti-entropy repair and not rely on RR.)

Considering the misunderstandings in the wild about how RR works, I'm also +1 on deprecating it in 3.11.x and removing it in 4.0. In fact, I haven't seen a single use case where RR was used intentionally; most use it only because "it's the default, so it must be good to have".

> Consider deprecating (then removing)
> read_repair_chance/dclocal_read_repair_chance
> ---------------------------------------------------------------------------
>
>                 Key: CASSANDRA-13910
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13910
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Sylvain Lebresne
>            Priority: Minor
>              Labels: CommunityFeedbackRequested
>
> First, let me clarify, so this is not misunderstood, that I'm not *at all*
> suggesting removing the read-repair mechanism of detecting and repairing
> inconsistencies between read responses: that mechanism is imo fine and
> useful. But {{read_repair_chance}} and {{dclocal_read_repair_chance}}
> have never been about _enabling_ that mechanism; they are about querying all
> replicas (even when this is not required by the consistency level) for the
> sole purpose of maybe read-repairing some of the replicas that wouldn't have
> been queried otherwise. Which, btw, brings me to reason 1 for considering their
> removal: their naming/behavior is super confusing. Over the years, I've seen
> countless users (and not only newbies) misunderstand what those options
> do, and as a consequence misunderstand when read-repair itself was happening.
> My 2nd reason for suggesting this is that I suspect
> {{read_repair_chance}}/{{dclocal_read_repair_chance}} are, especially
> nowadays, more harmful than anything else when enabled. When those options
> kick in, what you trade off is additional resource consumption (all nodes
> have to execute the read) for a _fairly remote chance_ of having some
> inconsistencies repaired on _some_ replica _a bit faster_ than they would
> otherwise be. To justify that last part, let's recall that:
> # most inconsistencies are actually fixed by hints in practice; and in the
> case where a node stays dead for so long that hints end up timing out,
> you really should repair the node when it comes back (if not simply
> re-bootstrap it). Read-repair probably doesn't fix _that_ much stuff in
> the first place.
> # again, read-repair does happen without those options kicking in. If you do
> reads at {{QUORUM}}, inconsistencies will eventually get read-repaired all
> the same, just a tiny bit less quickly.
> # I suspect almost everyone uses a low "chance" for those options at best
> (because the extra resource consumption is real), so at the end of the day,
> it's up to chance how much faster this fixes inconsistencies.
> Overall, I'm having a hard time imagining real cases where that trade-off
> really makes sense. Don't get me wrong, those options had their place a long
> time ago when hints weren't working all that well, but I think they bring
> more confusion than benefit now.
> And I think it's sane to reconsider things every once in a while, and to
> clean up anything that may not make all that much sense anymore, which I
> think is the case here.
> Tl;dr, I feel the benefits brought by those options are very slim at best,
> well overshadowed by the confusion they bring, and not worth maintaining the
> code that supports them (which, to be fair, isn't huge, but getting rid of
> {{ReadCallback.AsyncRepairRunner}} wouldn't hurt, for instance).
> Lastly, if the consensus here ends up being that they can have their use in
> weird cases, and that we feel supporting those cases is worth confusing
> everyone else and maintaining that code, I would still suggest disabling them
> entirely by default.
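The "chance" semantics discussed in this ticket amount to a per-read random gate. The sketch below is an illustrative model of that behavior only; the function name, return values, and the ordering of the two checks are assumptions, not Cassandra's actual read-path code.

```python
import random

def extra_read_repair_targets(read_repair_chance, dclocal_read_repair_chance):
    """Decide whether this read queries extra replicas purely so that
    stragglers can be read-repaired. Illustrative model of the deprecated
    options' semantics, not Cassandra's implementation."""
    r = random.random()
    if r < dclocal_read_repair_chance:
        return "local-dc replicas"   # extra reads within the local DC only
    if r < read_repair_chance:
        return "all replicas"        # global chance-based read repair
    return None                      # normal CL-driven read, no extra targets

# With the common low settings, the extra reads fire rarely, so how much
# faster inconsistencies get fixed is, as the ticket argues, up to chance.
random.seed(42)
hits = sum(extra_read_repair_targets(0.1, 0.0) is not None
           for _ in range(10_000))
print(hits)  # roughly 1,000 of 10,000 reads pay the extra read cost
```

At chance 0.1, roughly one read in ten fans out to every replica, which is the "additional resource consumption" the ticket describes; at the other extreme, chance 1.0 behaves like always querying all replicas, at which point speculative retry ALWAYS is the cleaner tool.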
[jira] [Commented] (CASSANDRA-13813) Don't let user drop (or generally break) tables in system_distributed
[ https://issues.apache.org/jira/browse/CASSANDRA-13813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16196789#comment-16196789 ]

Aleksey Yeschenko commented on CASSANDRA-13813:
-----------------------------------------------

bq. I'm not sure I understand the problem. If a user has manually and knowingly updated some table params, my guess is that they expect (even rely on) future changes to defaults not to override their changes. Isn't that the whole point of why we picked 0 for our hardcoded timestamp, in fact?

Right. But the way ALTER works, we serialise the whole table, including all params and all columns, with the new timestamp in the {{system_schema.*}} tables. That makes it impossible for us to change the defaults later, even those the user didn't modify on purpose. And this isn't something we can change very easily in a minor, I'm afraid.

This is why we don't allow altering anything beyond keyspace params, and why this issue, as it stands, is a serious bug and was never intended to be allowed.

> Don't let user drop (or generally break) tables in system_distributed
> ---------------------------------------------------------------------
>
>                 Key: CASSANDRA-13813
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13813
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Distributed Metadata
>            Reporter: Sylvain Lebresne
>            Assignee: Aleksey Yeschenko
>             Fix For: 3.0.x, 3.11.x
>
> There are currently no particular restrictions on schema modifications to
> tables of the {{system_distributed}} keyspace. This means you can drop
> those tables, or even alter them in wrong ways like dropping or renaming
> columns. All of which is guaranteed to break things (that is, repair if you
> mess with one of its tables, or MVs if you mess with
> {{view_build_status}}).
> I'm pretty sure this was never intended and is an oversight of the condition
> on {{ALTERABLE_SYSTEM_KEYSPACES}} in
> [ClientState|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/ClientState.java#L397].
> That condition is such that any keyspace not listed in
> {{ALTERABLE_SYSTEM_KEYSPACES}} (which happens to be the case for
> {{system_distributed}}) has no specific restrictions whatsoever, while given
> the naming it's fair to assume the intention was exactly the opposite.
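Aleksey's point about ALTER rewriting every param at a fresh timestamp can be illustrated with a minimal last-write-wins merge. This is an illustrative model, not Cassandra's schema code; the param names, the timestamp value, and the `merge` helper are assumptions for the sketch.

```python
# Last-write-wins cells: each table param carries the timestamp of the
# mutation that last wrote it. Illustrative model, not Cassandra's schema code.
def merge(schema, mutation):
    """Keep, per param, the (value, timestamp) pair with the higher timestamp."""
    out = dict(schema)
    for param, (value, ts) in mutation.items():
        if param not in out or ts > out[param][1]:
            out[param] = (value, ts)
    return out

# Defaults are written at the hardcoded timestamp 0, so any real user write
# (timestamp > 0) is supposed to win over them.
schema = {"gc_grace_seconds": (864000, 0), "read_repair_chance": (0.0, 0)}

# The user alters ONE param, but ALTER serialises the WHOLE table at the new
# timestamp: every param now carries alter_ts, not just the changed one.
alter_ts = 1507546800
schema = merge(schema, {"gc_grace_seconds": (432000, alter_ts),
                        "read_repair_chance": (0.0, alter_ts)})

# A later release tries to ship a new default at timestamp 0: it loses
# everywhere, even for the param the user never meant to pin.
schema = merge(schema, {"read_repair_chance": (0.1, 0)})
print(schema["read_repair_chance"])  # (0.0, 1507546800): new default ignored
```

This is why serialising unmodified params with the ALTER's timestamp forecloses later default changes, and why restricting alters on these keyspaces is the safer stopgap.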
[jira] [Commented] (CASSANDRA-13813) Don't let user drop (or generally break) tables in system_distributed
[ https://issues.apache.org/jira/browse/CASSANDRA-13813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16196803#comment-16196803 ]

Aleksey Yeschenko commented on CASSANDRA-13813:
-----------------------------------------------

FWIW, I'll be the first to admit that the current situation is not ideal. It wasn't me who came up with it, but I share part of the blame - replicated system keyspaces are a bit of a mess. This has already caused us issues and hassle with {{system_auth}}, and it won't be the last time. We can't even fix CASSANDRA-12701 properly in a minor without causing migration mismatch fun.

So, all things considered, my personal preference would be to shield existing users from causing further issues for themselves by accidentally or intentionally modifying those tables. At least until we have a good answer to these related issues, which I don't have :(

> Don't let user drop (or generally break) tables in system_distributed
> ---------------------------------------------------------------------
>
>                 Key: CASSANDRA-13813
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13813
[jira] [Commented] (CASSANDRA-13910) Consider deprecating (then removing) read_repair_chance/dclocal_read_repair_chance
[ https://issues.apache.org/jira/browse/CASSANDRA-13910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16196820#comment-16196820 ]

Aleksey Yeschenko commented on CASSANDRA-13910:
-----------------------------------------------

Right. But you can't *rely* on speculative RR unless your chance is set to an absurdly high value. I don't see why anyone would do that, and in that case you might as well set speculative retry to {{ALWAYS}} instead.

bq. There's also very minimal gain from removing this from the codebase.

Says who? I've got plans to rewrite our coordinator read path. To me, having a new clean implementation that doesn't need to concern itself with baggage like RR has more than minimal gain - I only need to worry about speculative retry being a complication.

Anyway, repeating my +1 here for removal in 4.0. [~slebresne] Do you want to write up a patch for this? If not, feel free to assign the JIRA to me.

> Consider deprecating (then removing)
> read_repair_chance/dclocal_read_repair_chance
> ---------------------------------------------------------------------------
>
>                 Key: CASSANDRA-13910
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13910
[jira] [Updated] (CASSANDRA-13910) Consider deprecating (then removing) read_repair_chance/dclocal_read_repair_chance
[ https://issues.apache.org/jira/browse/CASSANDRA-13910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aleksey Yeschenko updated CASSANDRA-13910:
------------------------------------------
    Fix Version/s: 3.11.x
                   4.0

> Consider deprecating (then removing)
> read_repair_chance/dclocal_read_repair_chance
> ---------------------------------------------------------------------------
>
>                 Key: CASSANDRA-13910
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13910
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Sylvain Lebresne
>            Priority: Minor
>              Labels: CommunityFeedbackRequested
>             Fix For: 4.0, 3.11.x
[jira] [Comment Edited] (CASSANDRA-13910) Consider deprecating (then removing) read_repair_chance/dclocal_read_repair_chance
[ https://issues.apache.org/jira/browse/CASSANDRA-13910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16196820#comment-16196820 ]

Aleksey Yeschenko edited comment on CASSANDRA-13910 at 10/9/17 11:22 AM:
-------------------------------------------------------------------------

Right. But you can't *rely* on chance RR unless your chance is set to an absurdly high value. I don't see why anyone would do that, and in that case you might as well set speculative retry to {{ALWAYS}} instead.

bq. There's also very minimal gain from removing this from the codebase.

Says who? I've got plans to rewrite our coordinator read path. To me, having a new clean implementation that doesn't need to concern itself with baggage like RR has more than minimal gain - I only need to worry about speculative retry being a complication.

Anyway, repeating my +1 here for removal in 4.0. [~slebresne] Do you want to write up a patch for this? If not, feel free to assign the JIRA to me.

was (Author: iamaleksey):
Right. But you can't *rely* on speculative RR unless your chance is set to an absurdly high value. I don't see why anyone would do that, and in that case you might as well set speculative retry to {{ALWAYS}} instead.

bq. There's also very minimal gain from removing this from the codebase.

Says who? I've got plans to rewrite our coordinator read path. To me, having a new clean implementation that doesn't need to concern itself with baggage like RR has more than minimal gain - I only need to worry about speculative retry being a complication.

Anyway, repeating my +1 here for removal in 4.0. [~slebresne] Do you want to write up a patch for this? If not, feel free to assign the JIRA to me.

> Consider deprecating (then removing)
> read_repair_chance/dclocal_read_repair_chance
> ---------------------------------------------------------------------------
>
>                 Key: CASSANDRA-13910
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13910
[jira] [Commented] (CASSANDRA-13910) Consider deprecating (then removing) read_repair_chance/dclocal_read_repair_chance
[ https://issues.apache.org/jira/browse/CASSANDRA-13910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16196835#comment-16196835 ]

Kurt Greaves commented on CASSANDRA-13910:
------------------------------------------

Says me. It's probably the least complex code in the whole read path. Even if you kept it through a rewrite, it would amount to almost nothing. But that's all beside the point. Just because you've got plans doesn't mean you should rush the removal of a feature that's existed for years. It's a database; change doesn't have to happen at light speed.

> Consider deprecating (then removing)
> read_repair_chance/dclocal_read_repair_chance
> ---------------------------------------------------------------------------
>
>                 Key: CASSANDRA-13910
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13910
[jira] [Commented] (CASSANDRA-10786) Include hash of result set metadata in prepared statement id
[ https://issues.apache.org/jira/browse/CASSANDRA-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16196856#comment-16196856 ] Robert Stupp commented on CASSANDRA-10786: -- I'm ok with the approach to commit this patch as it is (and resolve this ticket) and create a follow-up blocker for 4.0 to pull in a release version of the driver. > Include hash of result set metadata in prepared statement id > > > Key: CASSANDRA-10786 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10786 > Project: Cassandra > Issue Type: Sub-task > Components: CQL >Reporter: Olivier Michallat >Assignee: Alex Petrov >Priority: Minor > Labels: client-impacting, doc-impacting, protocolv5 > Fix For: 4.x > > > *_Initial description:_* > This is a follow-up to CASSANDRA-7910, which was about invalidating a > prepared statement when the table is altered, to force clients to update > their local copy of the metadata. > There's still an issue if multiple clients are connected to the same host. > The first client to execute the query after the cache was invalidated will > receive an UNPREPARED response, re-prepare, and update its local metadata. > But other clients might miss it entirely (the MD5 hasn't changed), and they > will keep using their old metadata. For example: > # {{SELECT * ...}} statement is prepared in Cassandra with md5 abc123, > clientA and clientB both have a cache of the metadata (columns b and c) > locally > # column a gets added to the table, C* invalidates its cache entry > # clientA sends an EXECUTE request for md5 abc123, gets UNPREPARED response, > re-prepares on the fly and updates its local metadata to (a, b, c) > # prepared statement is now in C*’s cache again, with the same md5 abc123 > # clientB sends an EXECUTE request for id abc123. Because the cache has been > populated again, the query succeeds. 
But clientB still has not updated its > metadata, it’s still (b,c) > One solution that was suggested is to include a hash of the result set > metadata in the md5. This way the md5 would change at step 3, and any client > using the old md5 would get an UNPREPARED, regardless of whether another > client already reprepared. > - > *_Resolution (2017/02/13):_* > The following changes were made to native protocol v5: > - the PREPARED response includes {{result_metadata_id}}, a hash of the result > set metadata. > - every EXECUTE message must provide {{result_metadata_id}} in addition to > the prepared statement id. If it doesn't match the current one on the server, > it means the client is operating on a stale schema. > - to notify the client, the server returns a ROWS response with a new > {{Metadata_changed}} flag, the new {{result_metadata_id}} and the updated > result metadata (this overrides the {{No_metadata}} flag, even if the client > had requested it) > - the client updates its copy of the result metadata before it decodes the > results. > So the scenario above would now look like: > # {{SELECT * ...}} statement is prepared in Cassandra with md5 abc123, and > result set (b, c) that hashes to cde456 > # column a gets added to the table, C* does not invalidate its cache entry, > but only updates the result set to (a, b, c) which hashes to fff789 > # client sends an EXECUTE request for (statementId=abc123, resultId=cde456) > and skip_metadata flag > # cde456!=fff789, so C* responds with ROWS(..., no_metadata=false, > metadata_changed=true, new_metadata_id=fff789,col specs for (a,b,c)) > # client updates its column specifications, and will send the next execute > queries with (statementId=abc123, resultId=fff789) > This works the same with multiple clients. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
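The id-mismatch detection described above can be sketched as follows. This is an illustrative reconstruction, not Cassandra's actual code (the real logic lives in the server's prepared-statement handling and MD5Digest classes); the class and method names here are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.List;

// Hypothetical sketch: derive a result_metadata_id by hashing the result
// set's column specifications, so any schema change that alters the result
// set produces a different id.
public class ResultMetadataId
{
    public static String hash(List<String> columnSpecs)
    {
        try
        {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            for (String spec : columnSpecs)
            {
                md5.update(spec.getBytes(StandardCharsets.UTF_8));
                md5.update((byte) 0); // separator, so ("ab","c") != ("a","bc")
            }
            StringBuilder sb = new StringBuilder();
            for (byte b : md5.digest())
                sb.append(String.format("%02x", b));
            return sb.toString();
        }
        catch (Exception e)
        {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args)
    {
        String oldId = hash(List.of("b text", "c text"));           // step 1: result set (b, c)
        String newId = hash(List.of("a text", "b text", "c text")); // step 2: column a added
        // Step 4 of the scenario: the id carried by the client's EXECUTE no
        // longer matches, so the server returns the updated metadata with the
        // Metadata_changed flag set instead of silently succeeding.
        System.out.println(!oldId.equals(newId)); // prints "true"
    }
}
```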
[jira] [Commented] (CASSANDRA-13910) Consider deprecating (then removing) read_repair_chance/dclocal_read_repair_chance
[ https://issues.apache.org/jira/browse/CASSANDRA-13910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16196909#comment-16196909 ] Aleksey Yeschenko commented on CASSANDRA-13910: --- bq. Just because you've got plans doesn't mean you should rush removal of a feature that's existed for years. It's not 'just because'. There are two pages of comments here justifying the change, including one in the comment you are replying to. If something has a negative value staying in (which I and others are arguing is the case), then it should absolutely be removed as soon as possible - which is the next major. > Consider deprecating (then removing) > read_repair_chance/dclocal_read_repair_chance > -- > > Key: CASSANDRA-13910 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13910 > Project: Cassandra > Issue Type: Improvement >Reporter: Sylvain Lebresne >Priority: Minor > Labels: CommunityFeedbackRequested > Fix For: 4.0, 3.11.x > > > First, let me clarify, so this is not misunderstood, that I'm not *at all* > suggesting removing the read-repair mechanism of detecting and repairing > inconsistencies between read responses: that mechanism is imo fine and > useful. But {{read_repair_chance}} and {{dclocal_read_repair_chance}} > have never been about _enabling_ that mechanism; they are about querying all > replicas (even when this is not required by the consistency level) for the > sole purpose of maybe read-repairing some of the replicas that wouldn't have > been queried otherwise. Which, btw, brings me to reason 1 for considering their > removal: their naming/behavior is super confusing. Over the years, I've seen > countless users (and not only newbies) misunderstand what those options > do, and as a consequence misunderstand when read-repair itself happens.
> But my 2nd reason for suggesting this is that I suspect > {{read_repair_chance}}/{{dclocal_read_repair_chance}} are, especially > nowadays, more harmful than anything else when enabled. When those options > kick in, what you trade off is additional resource consumption (all nodes > have to execute the read) for a _fairly remote chance_ of having some > inconsistencies repaired on _some_ replicas _a bit faster_ than they would > otherwise be. To justify that last part, let's recall that: > # most inconsistencies are actually fixed by hints in practice; and in the > case where a node stays dead for so long that hints end up timing out, > you really should repair the node when it comes back (if not simply > re-bootstrap it). Read-repair probably doesn't fix _that_ much stuff in > the first place. > # again, read-repair does happen without those options kicking in. If you do > reads at {{QUORUM}}, inconsistencies will eventually get read-repaired all > the same, just a tiny bit less quickly. > # I suspect almost everyone uses a low "chance" for those options at best > (because the extra resource consumption is real), so at the end of the day, > it's up to chance how much faster this fixes inconsistencies. > Overall, I'm having a hard time imagining real cases where that trade-off > really makes sense. Don't get me wrong, those options had their place a long > time ago when hints weren't working all that well, but I think they bring > more confusion than benefit now. > And I think it's sane to reconsider things every once in a while, and to > clean up anything that may not make all that much sense anymore, which I > think is the case here. > Tl;dr, I feel the benefits brought by those options are very slim at best, > well overshadowed by the confusion they bring, and not worth maintaining the > code that supports them (which, to be fair, isn't huge, but getting rid of > {{ReadCallback.AsyncRepairRunner}} wouldn't hurt, for instance).
> Lastly, if the consensus here ends up being that they can have their use in > weird cases and that we feel supporting those cases is worth confusing > everyone else and maintaining that code, I would still suggest disabling them > totally by default. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
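For readers unfamiliar with the options under discussion, their effect can be reduced to a per-query dice roll that widens the set of replicas contacted beyond what the consistency level needs. This is an illustrative sketch, not Cassandra's actual read-path code; all names are made up.

```java
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

// Illustrative sketch of what read_repair_chance does: with some probability,
// query every replica (so stragglers can be read-repaired sooner), instead of
// only the replicas the consistency level requires.
public class GlobalReadRepairDecision
{
    public static List<String> replicasToQuery(List<String> all, int requiredByCL, double readRepairChance)
    {
        if (ThreadLocalRandom.current().nextDouble() < readRepairChance)
            return all;                          // widen: query every replica
        return all.subList(0, requiredByCL);     // just enough to satisfy the CL
    }

    public static void main(String[] args)
    {
        List<String> replicas = List.of("n1", "n2", "n3");
        // chance = 0.0 never widens; chance = 1.0 always queries all replicas
        // (nextDouble() is in [0, 1), so both extremes are deterministic).
        System.out.println(replicasToQuery(replicas, 2, 0.0)); // prints "[n1, n2]"
        System.out.println(replicasToQuery(replicas, 2, 1.0)); // prints "[n1, n2, n3]"
    }
}
```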
[jira] [Commented] (CASSANDRA-13813) Don't let user drop (or generally break) tables in system_distributed
[ https://issues.apache.org/jira/browse/CASSANDRA-13813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16196920#comment-16196920 ] Sylvain Lebresne commented on CASSANDRA-13813: -- bq. But the way ALTER works, we serialise the whole table, including all params and all columns Good point, I hadn't thought about that part. Sad! So I guess I would agree in principle about shielding users against clearly dysfunctional behaviors. The problem is that in practice I know for a fact that CASSANDRA-12701 has been an issue for some users, where the tables had been growing way too much, to the point that being able to work around that by setting a TTL manually probably overrides concerns about hypothetical future changes to defaults not being picked up. Or to put it another way, none of this is ideal, but I wonder if "repair history tables regularly grow out of control" isn't a bigger problem in practice than "future default changes to system tables may not be picked up". Anyway, again, not opposed to the current patch personally, but uneasy about it, so wouldn't mind a few additional opinions to see if it's just me being difficult (which is possible). > Don't let user drop (or generally break) tables in system_distributed > - > > Key: CASSANDRA-13813 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13813 > Project: Cassandra > Issue Type: Bug > Components: Distributed Metadata >Reporter: Sylvain Lebresne >Assignee: Aleksey Yeschenko > Fix For: 3.0.x, 3.11.x > > > There are currently no particular restrictions on schema modifications to > tables of the {{system_distributed}} keyspace. This means you can drop > those tables, or even alter them in wrong ways like dropping or renaming > columns. All of which is guaranteed to break things (that is, repair if you > mess with one of its tables, or MVs if you mess with > {{view_build_status}}).
> I'm pretty sure this was never intended and is an oversight of the condition > on {{ALTERABLE_SYSTEM_KEYSPACES}} in > [ClientState|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/ClientState.java#L397]. > That condition is such that any keyspace not listed in > {{ALTERABLE_SYSTEM_KEYSPACES}} (which happens to be the case for > {{system_distributed}}) has no specific restrictions whatsoever, while given > the naming it's fair to assume the intention was exactly the opposite. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
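A minimal sketch of the intended guard follows. The keyspace names and structure here are illustrative assumptions; the real condition lives in ClientState, as linked above. The oversight was that only keyspaces *listed* in the whitelist were checked at all, so an unlisted system keyspace like {{system_distributed}} escaped every restriction.

```java
import java.util.Set;

// Simplified sketch of the intended semantics: a system keyspace may be
// altered only if it is explicitly whitelisted. Keyspace names are
// illustrative, not a faithful copy of Cassandra's constants.
public class SchemaAlterGuard
{
    static final Set<String> SYSTEM_KEYSPACES =
            Set.of("system", "system_distributed", "system_traces", "system_auth");
    static final Set<String> ALTERABLE_SYSTEM_KEYSPACES = Set.of("system_auth");

    public static boolean mayAlter(String keyspace)
    {
        if (SYSTEM_KEYSPACES.contains(keyspace))
            return ALTERABLE_SYSTEM_KEYSPACES.contains(keyspace); // whitelist only
        return true; // user keyspaces are unrestricted
    }
}
```

With this shape of check, {{system_distributed}} is rejected rather than falling through unrestricted.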
[jira] [Commented] (CASSANDRA-13813) Don't let user drop (or generally break) tables in system_distributed
[ https://issues.apache.org/jira/browse/CASSANDRA-13813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16196923#comment-16196923 ] Aleksey Yeschenko commented on CASSANDRA-13813: --- bq. so wouldn't mind a few additional opinions to see if it's just me being difficult (which is possible). Oh, you've never been difficult. Neither have I. FWIW I don't feel very strongly about this going to 3.0.x vs. this going to 4.0 only. Worst case I'll just fix this for us internally. Seeing that neither of us feels really strongly about this, I don't mind getting some opinions from others, either. I'll throw a signal on IRC and hopefully someone will reply. Either way it's not urgent. > Don't let user drop (or generally break) tables in system_distributed > - > > Key: CASSANDRA-13813 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13813 > Project: Cassandra > Issue Type: Bug > Components: Distributed Metadata >Reporter: Sylvain Lebresne >Assignee: Aleksey Yeschenko > Fix For: 3.0.x, 3.11.x
-- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-12701) Repair history tables should have TTL and TWCS
[ https://issues.apache.org/jira/browse/CASSANDRA-12701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16196954#comment-16196954 ] Aleksey Yeschenko commented on CASSANDRA-12701: --- CASSANDRA-13813 has implications for the workaround suggested here. I favour tackling CASSANDRA-13813 independently, and finding a way to correct CASSANDRA-12701 in this JIRA, without blocking the former on the latter. If you feel the same, or otherwise, or have a third option, please make your opinion known in the CASSANDRA-13813 comments. Cheers. > Repair history tables should have TTL and TWCS > -- > > Key: CASSANDRA-12701 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12701 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Chris Lohfink > Labels: lhf > Attachments: CASSANDRA-12701.txt > > > Some tools schedule a lot of small subrange repairs, which can lead to a lot > of repairs constantly being run. These partitions can grow pretty big in > theory. I don't think much reads from them, which might help, but it's still > kinda wasted disk space. I think a month TTL (longer than gc grace) and maybe > a 1 day twcs window makes sense to me. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
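For reference, the manual workaround discussed (a default TTL longer than gc_grace plus a one-day TWCS window on the repair history tables) would amount to an ALTER TABLE along the following lines. The concrete values here are assumptions drawn from the ticket's suggestion (30-day TTL vs. the 10-day default gc_grace), built as a string for illustration:

```java
// Illustrative builder for the suggested workaround statement. The property
// values (30-day TTL, 1-day TWCS window) follow the ticket's suggestion and
// are not an official recommendation.
public class RepairHistoryTtl
{
    public static String alterStatement(String table)
    {
        return "ALTER TABLE system_distributed." + table + " WITH"
             + " default_time_to_live = 2592000" // 30 days, longer than default gc_grace (10 days)
             + " AND compaction = {'class': 'TimeWindowCompactionStrategy',"
             + " 'compaction_window_unit': 'DAYS', 'compaction_window_size': 1}";
    }

    public static void main(String[] args)
    {
        System.out.println(alterStatement("repair_history"));
    }
}
```

Note this is exactly the kind of manual ALTER that the restriction discussed in CASSANDRA-13813 would disallow, which is the tension raised above.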
[jira] [Commented] (CASSANDRA-10404) Node to Node encryption transitional mode
[ https://issues.apache.org/jira/browse/CASSANDRA-10404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16196999#comment-16196999 ] Stefan Podkowinski commented on CASSANDRA-10404: bq. As I explained in the previous comment, this is the trickiest part of this patch. The upgraded node, after it bounces, must have at least one 3.0 node connect to it 1) Would it make sense to fallback to {{SystemKeyspace.getReleaseVersion(ep)}} in case we don't have the version available through gossip? The method seems to be dead code by now, but the "peers" table is still being updated. bq. Maybe we can add another property under the server_encryption_options, something like enable_legacy_ssl_storage_port. That would also clean up MessagingService#listen a little bit. wdyt? 2) Having that flag next to the new {{enabled}} flag should work. The yaml file needs attention during upgrade anyways. So if you upgrade from 3.0 with ssl enabled, you'd have to set both "enabled: true" and "enable_legacy_ssl_storage_port: true" in your config. 3) Hostname verification: I've pushed a commit [here|https://github.com/spodkowinski/cassandra/commit/fb2ca6ee87ccc5a8dcb92739237f21a49585ec7a] that will honor the {{require_endpoint_verification}} flag for incoming connections. 4) If we want to avoid potential attacks with invalid or stolen certificates, we should also enable {{require_client_auth}} by default. This should not cause any issues, as the truststores need to be managed for outgoing connections anyways. So why not validate incoming connections as well? > Node to Node encryption transitional mode > - > > Key: CASSANDRA-10404 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10404 > Project: Cassandra > Issue Type: New Feature >Reporter: Tom Lewis >Assignee: Jason Brown > Fix For: 4.x > > > Create a transitional mode for encryption that allows encrypted and > unencrypted traffic node-to-node during a change over to encryption from > unencrypted. 
This alleviates downtime during the switch. > This is similar to CASSANDRA-10559 which is intended for client-to-node -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
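The transitional listening behaviour sketched in point 2 above could look roughly like this. This is an assumption-laden sketch, not the actual MessagingService code: the flag names follow the comment, and the port-selection logic is my reading of the proposal (a 4.0 node keeps the legacy dedicated SSL port open so not-yet-upgraded 3.0 peers can still connect).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the rolling-upgrade listening decision discussed
// above: always listen on the regular storage port, and additionally keep the
// legacy SSL-only port open when both flags are set.
public class StoragePortListener
{
    public static List<Integer> portsToOpen(boolean encryptionEnabled,
                                            boolean enableLegacySslStoragePort,
                                            int storagePort, int sslStoragePort)
    {
        List<Integer> ports = new ArrayList<>();
        ports.add(storagePort); // always listen on the regular port
        if (encryptionEnabled && enableLegacySslStoragePort)
            ports.add(sslStoragePort); // keep the old SSL-only port for 3.0 peers
        return ports;
    }
}
```

Under this reading, upgrading from 3.0 with SSL means setting both "enabled: true" and "enable_legacy_ssl_storage_port: true", as the comment says, and dropping the legacy flag once the whole cluster is on 4.0.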
[jira] [Commented] (CASSANDRA-13813) Don't let user drop (or generally break) tables in system_distributed
[ https://issues.apache.org/jira/browse/CASSANDRA-13813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197002#comment-16197002 ] Jeremiah Jordan commented on CASSANDRA-13813: - I am a little concerned about this change not letting anything be updated, but I do understand the reasons, and I can't really see a way around them. Given that an experienced person can still get around this restriction by doing inserts into the schema tables, that is probably enough if there are any future bugs to be worked around. Inexperienced users should not be changing these values by themselves. > Don't let user drop (or generally break) tables in system_distributed > - > > Key: CASSANDRA-13813 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13813 > Project: Cassandra > Issue Type: Bug > Components: Distributed Metadata >Reporter: Sylvain Lebresne >Assignee: Aleksey Yeschenko > Fix For: 3.0.x, 3.11.x
-- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13265) Expiration in OutboundTcpConnection can block the reader Thread
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197194#comment-16197194 ] ASF GitHub Bot commented on CASSANDRA-13265: Github user christian-esken closed the pull request at: https://github.com/apache/cassandra/pull/95 > Expiration in OutboundTcpConnection can block the reader Thread > --- > > Key: CASSANDRA-13265 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13265 > Project: Cassandra > Issue Type: Bug > Environment: Cassandra 3.0.9 > Java HotSpot(TM) 64-Bit Server VM version 25.112-b15 (Java version > 1.8.0_112-b15) > Linux 3.16 >Reporter: Christian Esken >Assignee: Christian Esken > Fix For: 3.0.14, 3.11.0, 4.0 > > Attachments: cassandra-13265-2.2-dtest_stdout.txt, > cassandra-13265-trun-dtest_stdout.txt, > cassandra.pb-cache4-dus.2017-02-17-19-36-26.chist.xz, > cassandra.pb-cache4-dus.2017-02-17-19-36-26.td.xz > > > I observed that sometimes a single node in a Cassandra cluster fails to > communicate with the other nodes. This can happen at any time, during peak load > or low load. Restarting that single node fixes the issue. > Before going into details, I want to state that I have analyzed the > situation and am already developing a possible fix. Here is the analysis so > far: > - A thread dump in this situation showed 324 threads in the > OutboundTcpConnection class that want to lock the backlog queue for doing > expiration. > - A class histogram shows 262508 instances of > OutboundTcpConnection$QueuedMessage. > What is the effect of it? As soon as the Cassandra node has reached a certain > amount of queued messages, it starts thrashing itself to death. Each of the > threads fully locks the queue for reading and writing by calling > iterator.next(), making the situation worse and worse. > - Writing: Only after 262508 locking operations can it progress with actually > writing to the queue.
> - Reading: Is also blocked, as 324 threads try to do iterator.next() and > fully lock the queue. > This means: writing blocks the queue for reading, and readers might even be > starved, which makes the situation even worse. > - > The setup is: > - 3-node cluster > - replication factor 2 > - Consistency LOCAL_ONE > - No remote DCs > - high write throughput (10 INSERT statements per second and more during > peak times). > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
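The pathology described above, and one possible mitigation, can be sketched as follows. This is an illustrative sketch only, not the patch that was actually merged: it shows the general shape of the fix (a lock-free backlog, and at most one thread scanning for expired messages at a time so producers and the reader are never serialised behind hundreds of expiration passes).

```java
import java.util.Iterator;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch of a backlog where expiration cannot pile up: the queue itself is
// lock-free, and a compare-and-set gate lets only one caller scan for
// expired messages while everyone else returns immediately.
public class Backlog
{
    static class QueuedMessage
    {
        final long expiresAtNanos;
        QueuedMessage(long expiresAtNanos) { this.expiresAtNanos = expiresAtNanos; }
    }

    private final ConcurrentLinkedQueue<QueuedMessage> backlog = new ConcurrentLinkedQueue<>();
    private final AtomicBoolean expiring = new AtomicBoolean();

    public void add(QueuedMessage m) { backlog.add(m); }
    public int size() { return backlog.size(); }

    /** Drop timed-out messages; only one caller at a time actually scans. */
    public void expireMessages(long nowNanos)
    {
        if (!expiring.compareAndSet(false, true))
            return; // another thread is already expiring; don't pile up behind it
        try
        {
            for (Iterator<QueuedMessage> it = backlog.iterator(); it.hasNext(); )
                if (it.next().expiresAtNanos <= nowNanos)
                    it.remove();
        }
        finally
        {
            expiring.set(false);
        }
    }
}
```

Contrast this with the reported behaviour, where 324 threads each iterated a fully-locked queue of 262508 messages before making progress.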
[jira] [Commented] (CASSANDRA-13265) Expiration in OutboundTcpConnection can block the reader Thread
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197195#comment-16197195 ] Christian Esken commented on CASSANDRA-13265: - PR closed: https://github.com/apache/cassandra/pull/95 > Expiration in OutboundTcpConnection can block the reader Thread > --- > > Key: CASSANDRA-13265 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13265 > Project: Cassandra > Issue Type: Bug >Reporter: Christian Esken >Assignee: Christian Esken > Fix For: 3.0.14, 3.11.0, 4.0 > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13265) Expiration in OutboundTcpConnection can block the reader Thread
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197193#comment-16197193 ] ASF GitHub Bot commented on CASSANDRA-13265: Github user christian-esken commented on the issue: https://github.com/apache/cassandra/pull/95 Closing PR, as it has been merged in all relevant branches. See https://issues.apache.org/jira/browse/CASSANDRA-13265 > Expiration in OutboundTcpConnection can block the reader Thread > --- > > Key: CASSANDRA-13265 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13265 > Project: Cassandra > Issue Type: Bug >Reporter: Christian Esken >Assignee: Christian Esken > Fix For: 3.0.14, 3.11.0, 4.0 > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13442) Support a means of strongly consistent highly available replication with tunable storage requirements
[ https://issues.apache.org/jira/browse/CASSANDRA-13442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197294#comment-16197294 ] Ariel Weisberg commented on CASSANDRA-13442: bq. Considering this seems to be mostly about reducing storage costs so write bound workloads can run "dense" nodes, It's not either/or, as these two things compose, further reducing costs. Dense nodes don't reduce costs 10-20x. With dense nodes you still need to pay for the additional RAM and disk, and datacenter power and cooling go up slightly as you pack more power utilization into each box. Dense nodes also still need to process each request, so you also need to scale up read and write throughput, which is not always a dimension we are claiming to improve with dense nodes. Dense nodes won't let you increase replication on hardware where you can't fit an entire replica of your data set, in most cases. Such as racks or DCs in a region that have limited capacity, and by limited I mean many times less capacity. What do we expect from dense nodes? 2x? 4x? Are all use cases going to behave well with the various strategies we use to get to dense nodes? bq. While this idea does seem interesting, it seems very complex and you are still trading off replicas for additional storage. The target is 10x to 20x less storage. So additional storage, yes, but not the same order of magnitude. In other words, we pay something (complexity) and we get something (some replicas require 10-20x less hardware). I also think 10-20x storage savings is a conservative estimate, assuming the worst-case utilization during an outage where transient data must be stored at transient replicas. With vnodes, data would be spread out over several nodes, so the additional utilization at each node could be substantially less. bq. Seems that the primary use case would be multiple datacenters with transient replicas, which granted would be nice, Multiple data centers aren't required to benefit.
Many people will be able to go from RF=3 in a DC today to RF=5 and lose two nodes without losing availability or data, instead of just one node. There are other permutations where being able to inexpensively add a transient replica can increase availability, like RF=3 with one replica at each DC. Write at CL.ALL, read at LOCAL_ONE, fall back to reading from a remote DC if LOCAL_ONE fails. You get strong consistency, but not write availability. Add a transient replica at each DC and write at EACH_QUORUM and you get write availability after a single node fails. bq. you're probably able to just store less replicas in each datacenter anyway, at least if we had more flexible consistency levels. I'm not sure what you mean by flexibility. Not without losing either availability or consistency under failure scenarios. If you run RF=3 today with strong consistency, you can't drop to RF=2 without losing availability if there is a node failure. > Support a means of strongly consistent highly available replication with > tunable storage requirements > - > > Key: CASSANDRA-13442 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13442 > Project: Cassandra > Issue Type: Improvement > Components: Compaction, Coordination, Distributed Metadata, Local > Write-Read Paths >Reporter: Ariel Weisberg > > Replication factors like RF=2 can't provide strong consistency and > availability because if a single node is lost it's impossible to reach a > quorum of replicas. Stepping up to RF=3 will allow you to lose a node and > still achieve quorum for reads and writes, but requires committing additional > storage.
What is actually required is a covering data set > for the range and we should be able to achieve a covering data set and high > availability without having three full copies. > After a repair we know that some subset of the data set is fully replicated. > At that point we don't have to read from a quorum of nodes for the repaired > data. It is sufficient to read from a single node for the repaired data and a > quorum of nodes for the unrepaired data. > One way to exploit this would be to have N replicas, say the last N replicas > (where N varies with RF) in the preference list, delete all repaired data > after a repair completes. Subsequent quorum reads will be able to retrieve > the repaired data from any of the two full replicas and the unrepaired data > from a quorum read of any replica including the "transient" replicas. > Configuration
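The quorum arithmetic behind availability claims like the one above (RF=3 tolerates one lost replica, RF=5 tolerates two) is simply:

```java
// Standard quorum math: quorum(rf) = floor(rf/2) + 1, and the number of
// replica failures tolerable while still reaching quorum is rf - quorum(rf).
public class QuorumMath
{
    public static int quorum(int rf) { return rf / 2 + 1; }
    public static int tolerableFailures(int rf) { return rf - quorum(rf); }

    public static void main(String[] args)
    {
        System.out.println(tolerableFailures(3)); // prints "1"
        System.out.println(tolerableFailures(5)); // prints "2"
    }
}
```

This is also why RF=2 in the description above cannot give both strong consistency and availability: quorum(2) = 2, so losing a single replica makes quorum unreachable.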
[jira] [Commented] (CASSANDRA-13442) Support a means of strongly consistent highly available replication with tunable storage requirements
[ https://issues.apache.org/jira/browse/CASSANDRA-13442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197326#comment-16197326 ] Jeff Jirsa commented on CASSANDRA-13442: {quote} Considering this seems to be mostly about reducing storage costs so write bound workloads can run "dense" nodes, and storage is meant to be cheap, it seems to me a less complex alternative would just be to remove the barriers to having large amounts of physical storage per node. {quote} Everything is meant to be cheap, but that doesn't mean it is. In a reasonably sized cluster (for example, 250 nodes * 2 datacenters * 4tb/node = 2 million GB of disk). This ticket would reduce that to something closer to 1,340,000 GB of disk for a cluster of that nature. Enterprise SSDs still retail for $0.50/GB. Let's pretend you get a great deal and you're paying $0.25/GB. The cost differential is $335k vs $500k, for a single cluster. If you're on AWS and using GP2 EBS, that's $0.10/GB/month. The cost differential is $134k/month vs $200k/month, or about $1.6M/year. Per cluster. That's JUST DISK savings, even if we pretend like everything else is free (and it's not). If you feel like there's more ROI to win by having denser storage, I'm sure nobody would mind seeing patches. > Support a means of strongly consistent highly available replication with > tunable storage requirements > - > > Key: CASSANDRA-13442 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13442 > Project: Cassandra > Issue Type: Improvement > Components: Compaction, Coordination, Distributed Metadata, Local > Write-Read Paths >Reporter: Ariel Weisberg > > Replication factors like RF=2 can't provide strong consistency and > availability because if a single node is lost it's impossible to reach a > quorum of replicas. Stepping up to RF=3 will allow you to lose a node and > still achieve quorum for reads and writes, but requires committing additional > storage. 
> The requirement of a quorum for writes/reads doesn't seem to be something > that can be relaxed without additional constraints on queries, but it seems > like it should be possible to relax the requirement that 3 full copies of the > entire data set are kept. What is actually required is a covering data set > for the range and we should be able to achieve a covering data set and high > availability without having three full copies. > After a repair we know that some subset of the data set is fully replicated. > At that point we don't have to read from a quorum of nodes for the repaired > data. It is sufficient to read from a single node for the repaired data and a > quorum of nodes for the unrepaired data. > One way to exploit this would be to have N replicas, say the last N replicas > (where N varies with RF) in the preference list, delete all repaired data > after a repair completes. Subsequent quorum reads will be able to retrieve > the repaired data from any of the two full replicas and the unrepaired data > from a quorum read of any replica including the "transient" replicas. > Configuration for something like this in NTS might be something similar to { > DC1="3-1", DC2="3-2" } where the first value is the replication factor used > for consistency and the second values is the number of transient replicas. If > you specify { DC1=3, DC2=3 } then the number of transient replicas defaults > to 0 and you get the same behavior you have today. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
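The {{ DC1="3-1" }} notation proposed above can be read as "replication factor 3, of which 1 replica is transient". Below is an illustrative Python sketch of a parser for that format, written from the description in this ticket; it is not code from Cassandra itself:

```python
def parse_replication(option: str) -> tuple[int, int]:
    """Parse the proposed 'RF-T' notation, e.g. '3-1' means replication
    factor 3 with 1 transient replica; a plain '3' means no transient
    replicas, matching today's behavior."""
    if "-" in option:
        rf, transient = (int(x) for x in option.split("-"))
    else:
        rf, transient = int(option), 0
    if transient >= rf:
        # At least one full replica must hold the repaired data.
        raise ValueError("transient replicas must be fewer than total replicas")
    return rf, transient

print(parse_replication("3-1"))  # (3, 1)
print(parse_replication("3"))    # (3, 0)
```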
[jira] [Comment Edited] (CASSANDRA-13442) Support a means of strongly consistent highly available replication with tunable storage requirements
[ https://issues.apache.org/jira/browse/CASSANDRA-13442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197326#comment-16197326 ] Jeff Jirsa edited comment on CASSANDRA-13442 at 10/9/17 5:17 PM: - {quote} Considering this seems to be mostly about reducing storage costs so write bound workloads can run "dense" nodes, and storage is meant to be cheap, it seems to me a less complex alternative would just be to remove the barriers to having large amounts of physical storage per node. {quote} Everything is meant to be cheap, but that doesn't mean it is. In a reasonably sized cluster (for example, 250 nodes * 2 datacenters * 4tb/node = 2 million GB of disk). This ticket would reduce that to something closer to 1,340,000 GB of disk for a cluster of that nature. Enterprise SSDs still retail for $0.50/GB. Let's pretend you get a great deal and you're paying $0.25/GB. The cost differential is $335k vs $500k, for a single cluster. If you're on AWS and using GP2 EBS, that's $0.10/GB/month. The cost differential is $134k/month vs $200k/month, or about $800k/year. Per cluster. That's JUST DISK savings, even if we pretend like everything else is free (and it's not). If you feel like there's more ROI to win by having denser storage, I'm sure nobody would mind seeing patches.
was (Author: jjirsa): {quote} Considering this seems to be mostly about reducing storage costs so write bound workloads can run "dense" nodes, and storage is meant to be cheap, it seems to me a less complex alternative would just be to remove the barriers to having large amounts of physical storage per node. {quote} Everything is meant to be cheap, but that doesn't mean it is. In a reasonably sized cluster (for example, 250 nodes * 2 datacenters * 4tb/node = 2 million GB of disk). This ticket would reduce that to something closer to 1,340,000 GB of disk for a cluster of that nature. Enterprise SSDs still retail for $0.50/GB. Let's pretend you get a great deal and you're paying $0.25/GB. The cost differential is $335k vs $500k, for a single cluster. If you're on AWS and using GP2 EBS, that's $0.10/GB/month. The cost differential is $134k/month vs $200k/month, or about $1.6M/year. Per cluster. That's JUST DISK savings, even if we pretend like everything else is free (and it's not). If you feel like there's more ROI to win by having denser storage, I'm sure nobody would mind seeing patches.
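The per-year EBS figure in Jeff's comment (which his edit corrected from $1.6M to roughly $800k) can be re-derived from the numbers he quotes. A small Python sketch of the arithmetic, using his cluster sizing and the $0.10/GB/month GP2 price:

```python
# Re-deriving the quoted EBS cost figures (GP2 at $0.10/GB/month).
full_gb = 250 * 2 * 4000           # 250 nodes x 2 DCs x 4 TB/node = 2,000,000 GB
transient_gb = 1_340_000           # estimate with transient replication
price_per_gb_month = 0.10

monthly_full = full_gb * price_per_gb_month            # ~$200k/month
monthly_transient = transient_gb * price_per_gb_month  # ~$134k/month
yearly_savings = (monthly_full - monthly_transient) * 12

print(f"${yearly_savings:,.0f}/year")  # $792,000/year, i.e. roughly $800k
```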
[jira] [Updated] (CASSANDRA-13943) Infinite compaction of L0 SSTables in JBOD
[ https://issues.apache.org/jira/browse/CASSANDRA-13943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dan Kinder updated CASSANDRA-13943: --- Attachment: debug.log
[jira] [Created] (CASSANDRA-13943) Infinite compaction of L0 SSTables in JBOD
Dan Kinder created CASSANDRA-13943: -- Summary: Infinite compaction of L0 SSTables in JBOD Key: CASSANDRA-13943 URL: https://issues.apache.org/jira/browse/CASSANDRA-13943 Project: Cassandra Issue Type: Bug Components: Compaction Environment: Cassandra 3.11.0 / Centos 6 Reporter: Dan Kinder I recently upgraded from 2.2.6 to 3.11.0. I am seeing Cassandra loop infinitely compacting the same data over and over. Attaching logs. It is compacting two tables, one on /srv/disk10, the other on /srv/disk1. It does create new SSTables but immediately recompacts again. Note that I am not inserting anything at the moment, there is no flushing happening on this table (Memtable switch count has not changed). My theory is that it somehow thinks those should be compaction candidates. But they shouldn't be, they are on different disks and I ran nodetool relocatesstables as well as nodetool compact. So, it tries to compact them together, but the compaction results in the exact same 2 SSTables on the 2 disks, because the keys are split by data disk. This is pretty serious, because all our nodes right now are consuming CPU doing this for multiple tables, it seems.
[jira] [Assigned] (CASSANDRA-13943) Infinite compaction of L0 SSTables in JBOD
[ https://issues.apache.org/jira/browse/CASSANDRA-13943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcus Eriksson reassigned CASSANDRA-13943: --- Assignee: Marcus Eriksson
[jira] [Commented] (CASSANDRA-13943) Infinite compaction of L0 SSTables in JBOD
[ https://issues.apache.org/jira/browse/CASSANDRA-13943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197350#comment-16197350 ] Marcus Eriksson commented on CASSANDRA-13943: - {{/srv/disk10/..., /srv/disk1/...}} - I guess there is a prefix matching problem somewhere - I'll get a patch out tomorrow
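The suspected prefix-matching problem is easy to reproduce: a bare startsWith check cannot tell {{/srv/disk1}} from {{/srv/disk10}}. The Python sketch below (an illustration, not the actual Java fix) shows the bug and a component-aware check that anchors on the path separator:

```python
import os

# Naive prefix check: a path on /srv/disk10 "starts with" /srv/disk1,
# so its SSTables are wrongly attributed to the other disk.
print("/srv/disk10/data/ks/tbl".startswith("/srv/disk1"))  # True (wrong)

def on_disk(path: str, disk: str) -> bool:
    """Match only whole path components by requiring a separator
    immediately after the disk root."""
    return path == disk or path.startswith(disk.rstrip(os.sep) + os.sep)

print(on_disk("/srv/disk10/data/ks/tbl", "/srv/disk1"))  # False (correct)
print(on_disk("/srv/disk1/data/ks/tbl", "/srv/disk1"))   # True
```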
[jira] [Commented] (CASSANDRA-13930) Avoid grabbing the read lock when checking if compaction strategy should do defragmentation
[ https://issues.apache.org/jira/browse/CASSANDRA-13930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197374#comment-16197374 ] Jeff Jirsa commented on CASSANDRA-13930: lgtm if dtests are happy (I expect it should be fine). > Avoid grabbing the read lock when checking if compaction strategy should do > defragmentation > --- > > Key: CASSANDRA-13930 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13930 > Project: Cassandra > Issue Type: Bug > Components: Compaction >Reporter: Marcus Eriksson >Assignee: Marcus Eriksson > Fix For: 3.11.x, 4.x > > > We grab the read lock when checking whether the compaction strategy benefits > from defragmentation, avoid that.
[jira] [Commented] (CASSANDRA-13942) Open Cassandra.yaml for developers to extend custom settings
[ https://issues.apache.org/jira/browse/CASSANDRA-13942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197383#comment-16197383 ] Blake Eggleston commented on CASSANDRA-13942: - you should be able to achieve this by providing your own {{ConfigurationLoader}} implementation. You can't extend DatabaseDescriptor, but you would be able to configure a class referenced by your index classes. See {{DatabaseDescriptor#loadConfig}} > Open Cassandra.yaml for developers to extend custom settings > > > Key: CASSANDRA-13942 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13942 > Project: Cassandra > Issue Type: Wish > Components: Configuration >Reporter: zhaoyan > > We are writing an index plugin for Cassandra. > We want to put some additional settings in cassandra.yaml and read them in our code. > Cassandra uses DatabaseDescriptor.java and Config.java to hold the configuration from cassandra.yaml, but we can't extend them. > I suggest Cassandra provide interfaces for developers to extend custom settings. > Thank you
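A custom loader along the lines Blake describes would keep unknown settings available to plugin code instead of extending the descriptor itself. The Python sketch below is only an analogue of that pattern; the names (PluginConfig, load_config, KNOWN_KEYS) are illustrative, not Cassandra APIs:

```python
from dataclasses import dataclass, field

# Settings the core config object understands (illustrative subset).
KNOWN_KEYS = {"cluster_name", "num_tokens"}

@dataclass
class PluginConfig:
    core: dict = field(default_factory=dict)   # settings the server knows
    extra: dict = field(default_factory=dict)  # custom settings for plugins

def load_config(raw: dict) -> PluginConfig:
    """Split a parsed yaml mapping into core settings and plugin extras,
    so plugin code can read its own keys without touching the core types."""
    cfg = PluginConfig()
    for key, value in raw.items():
        (cfg.core if key in KNOWN_KEYS else cfg.extra)[key] = value
    return cfg

cfg = load_config({"cluster_name": "test", "my_index_mode": "fast"})
print(cfg.extra)  # {'my_index_mode': 'fast'}
```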
[jira] [Commented] (CASSANDRA-13442) Support a means of strongly consistent highly available replication with tunable storage requirements
[ https://issues.apache.org/jira/browse/CASSANDRA-13442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197385#comment-16197385 ] DOAN DuyHai commented on CASSANDRA-13442: - {quote}The target is 10x to 20x less storage{quote} As far as I understand, with RF=3, if you remove repaired data on transient replicas, you'll reduce storage by 1/3. Where do you get this 10x-20x then?
[jira] [Commented] (CASSANDRA-13943) Infinite compaction of L0 SSTables in JBOD
[ https://issues.apache.org/jira/browse/CASSANDRA-13943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197459#comment-16197459 ] Dan Kinder commented on CASSANDRA-13943: I do see a questionable {{startsWith}} here: https://github.com/apache/cassandra/blob/7d4d1a32581ff40ed1049833631832054bcf2316/src/java/org/apache/cassandra/db/compaction/CompactionStrategyManager.java#L309 Also here: https://github.com/apache/cassandra/blob/3cec208c40b85e1be0ff8c68fc9d9017945a1ed8/src/java/org/apache/cassandra/db/compaction/CompactionManager.java#L570
[jira] [Commented] (CASSANDRA-13930) Avoid grabbing the read lock when checking if compaction strategy should do defragmentation
[ https://issues.apache.org/jira/browse/CASSANDRA-13930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197502#comment-16197502 ] Eduard Tudenhoefner commented on CASSANDRA-13930: - changes LGTM. Looks like most dtests failed because they couldn't clone the repo.
[jira] [Comment Edited] (CASSANDRA-13930) Avoid grabbing the read lock when checking if compaction strategy should do defragmentation
[ https://issues.apache.org/jira/browse/CASSANDRA-13930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197502#comment-16197502 ] Eduard Tudenhoefner edited comment on CASSANDRA-13930 at 10/9/17 6:58 PM: -- changes LGTM. Looks like majority of the failed dtests are because they couldn't clone the repo. was (Author: eduard.tudenhoefner): changes LGTM. Looks like most dtests failed because they couldn't clone the repo.
[jira] [Commented] (CASSANDRA-13834) nodetool resetlocalschema failed because of JMX registration
[ https://issues.apache.org/jira/browse/CASSANDRA-13834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197515#comment-16197515 ] Eduard Tudenhoefner commented on CASSANDRA-13834: - code changes LGTM but needs an official contributor for a review/commit. [~jjirsa] can you review maybe? I wonder if we should have a more tolerant wrapper around {{MBeanServer}} that just allows you to register an mbean no matter what and without all the hassle of needing to deal with a potential {{InstanceAlreadyExistsException}} or needing to check for {{mbs.isRegistered(objectName)}}. > nodetool resetlocalschema failed because of JMX registration > > > Key: CASSANDRA-13834 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13834 > Project: Cassandra > Issue Type: Bug > Components: Metrics >Reporter: vincent royer >Priority: Minor > Labels: easyfix > Fix For: 3.11.0 > > Attachments: 0001-CASSANDRA-13834.patch > > > nodetool resetlocalschema failed because of the following exception. > This is because the table MBean was already registered in the MBeanServer with > the same name. 
> 2017-08-31 14:00:57,989 ERROR [InternalResponseStage:18] > CassandraDaemon.java:231 uncaughtException Exception in thread > Thread[InternalResponseStage:18,5,main] > java.lang.RuntimeException: javax.management.InstanceAlreadyExistsException: > org.apache.cassandra.db:type=Tables,key > space=elastic_admin,table=metadata > at > org.apache.cassandra.db.ColumnFamilyStore.(ColumnFamilyStore.java:468) > at > org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:618) > at > org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:592) > at > org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:583) > at org.apache.cassandra.db.Keyspace.initCf(Keyspace.java:414) > at org.apache.cassandra.config.Schema.addTable(Schema.java:609) > at > java.util.LinkedHashMap$LinkedValues.forEach(LinkedHashMap.java:608) > at > java.util.Collections$UnmodifiableCollection.forEach(Collections.java:1080) > at > org.apache.cassandra.schema.SchemaKeyspace.updateKeyspace(SchemaKeyspace.java:1421) > at > org.apache.cassandra.schema.SchemaKeyspace.mergeSchema(SchemaKeyspace.java:1386) > at > org.apache.cassandra.schema.SchemaKeyspace.mergeSchemaAndAnnounceVersion(SchemaKeyspace.java:1336) > at > org.apache.cassandra.service.MigrationTask$1.response(MigrationTask.java:91) > at > org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:53) > at > org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at > org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81)
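The "more tolerant wrapper" floated in the comments would make registration idempotent instead of throwing. A Python stand-in for the JMX types sketches the idea; the class and function names here are illustrative, not javax.management APIs:

```python
class InstanceAlreadyExistsError(Exception):
    """Stand-in for javax.management.InstanceAlreadyExistsException."""

class MBeanServer:
    """Minimal stand-in: strict registration, like the real MBeanServer."""
    def __init__(self):
        self._beans = {}

    def register(self, name, bean):
        if name in self._beans:
            raise InstanceAlreadyExistsError(name)
        self._beans[name] = bean

    def unregister(self, name):
        self._beans.pop(name, None)

def register_or_replace(server, name, bean):
    """Tolerant wrapper: drop any stale MBean before registering, so
    callers never have to catch the already-exists error themselves."""
    server.unregister(name)
    server.register(name, bean)

mbs = MBeanServer()
register_or_replace(mbs, "db:type=Tables,keyspace=ks,table=t", object())
register_or_replace(mbs, "db:type=Tables,keyspace=ks,table=t", object())  # no throw
```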
[jira] [Comment Edited] (CASSANDRA-13834) nodetool resetlocalschema failed because of JMX registration
[ https://issues.apache.org/jira/browse/CASSANDRA-13834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197515#comment-16197515 ] Eduard Tudenhoefner edited comment on CASSANDRA-13834 at 10/9/17 7:08 PM: -- code changes LGTM but needs an official contributor for a review/commit. [~jjirsa] can you review maybe? I wonder if we should have a more tolerant wrapper around {{MBeanServer}} that just allows you to register an mbean no matter what and without all the hassle of needing to deal with a potential {{InstanceAlreadyExistsException}} or needing to check for {{mbs.isRegistered(objectName)}}. There are a bunch of other places where something like that could happen. was (Author: eduard.tudenhoefner): code changes LGTM but needs an official contributor for a review/commit. [~jjirsa] can you review maybe? I wonder if we should have a more tolerant wrapper around {{MBeanServer}} that just allows you to register an mbean no matter what and without all the hassle of needing to deal with a potential {{InstanceAlreadyExistsException}} or needing to check for {{mbs.isRegistered(objectName)}}.
[jira] [Commented] (CASSANDRA-13834) nodetool resetlocalschema failed because of JMX registration
[ https://issues.apache.org/jira/browse/CASSANDRA-13834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197545#comment-16197545 ] Jeff Jirsa commented on CASSANDRA-13834: I don't expect to have availability to properly review, even though it looks like a trivial fix. > nodetool resetlocalschema failed because of JMX registration > > > Key: CASSANDRA-13834 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13834 > Project: Cassandra > Issue Type: Bug > Components: Metrics >Reporter: vincent royer >Priority: Minor > Labels: easyfix > Fix For: 3.11.0 > > Attachments: 0001-CASSANDRA-13834.patch > > > nodetool resetlocalschema failed because of the following exception. > This is because the table MBean was already registred in the MBeanServer with > the same name. > 2017-08-31 14:00:57,989 ERROR [InternalResponseStage:18] > CassandraDaemon.java:231 uncaughtException Exception in thread > Thread[InternalResponseStage:18,5,main] > java.lang.RuntimeException: javax.management.InstanceAlreadyExistsException: > org.apache.cassandra.db:type=Tables,key > space=elastic_admin,table=metadata > at > org.apache.cassandra.db.ColumnFamilyStore.(ColumnFamilyStore.java:468) > at > org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:618) > at > org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:592) > at > org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:583) > at org.apache.cassandra.db.Keyspace.initCf(Keyspace.java:414) > at org.apache.cassandra.config.Schema.addTable(Schema.java:609) > at > java.util.LinkedHashMap$LinkedValues.forEach(LinkedHashMap.java:608) > at > java.util.Collections$UnmodifiableCollection.forEach(Collections.java:1080) > at > org.apache.cassandra.schema.SchemaKeyspace.updateKeyspace(SchemaKeyspace.java:1421) > at > org.apache.cassandra.schema.SchemaKeyspace.mergeSchema(SchemaKeyspace.java:1386) > at > 
org.apache.cassandra.schema.SchemaKeyspace.mergeSchemaAndAnnounceVersion(SchemaKeyspace.java:1336) > at > org.apache.cassandra.service.MigrationTask$1.response(MigrationTask.java:91) > at > org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:53) > at > org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at > org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81) -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
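The InstanceAlreadyExistsException above comes from registering a table MBean under a name that is still held from the previous schema. The failure mode and an unregister-before-register guard can be sketched as follows (a hypothetical helper for illustration, not the patch attached to the ticket):

```java
import java.lang.management.ManagementFactory;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class MBeanReRegister {
    // Standard-MBean naming convention: class Dummy + interface DummyMBean.
    public interface DummyMBean { int getValue(); }
    public static class Dummy implements DummyMBean {
        public int getValue() { return 42; }
    }

    // Unregister any stale MBean before registering a new one under the same
    // name, so re-running schema setup cannot hit InstanceAlreadyExistsException.
    public static void registerFresh(MBeanServer mbs, Object bean, ObjectName name) throws JMException {
        if (mbs.isRegistered(name))
            mbs.unregisterMBean(name); // drop the leftover registration first
        mbs.registerMBean(bean, name);
    }

    public static void main(String[] args) throws Exception {
        MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("org.example:type=Tables,keyspace=ks,table=t");
        registerFresh(mbs, new Dummy(), name);
        registerFresh(mbs, new Dummy(), name); // second call would throw without the guard
        System.out.println(mbs.isRegistered(name)); // prints true
    }
}
```

Whether Cassandra should instead unregister the old bean during resetlocalschema teardown is exactly the review question the attached patch raises.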
[jira] [Commented] (CASSANDRA-10786) Include hash of result set metadata in prepared statement id
[ https://issues.apache.org/jira/browse/CASSANDRA-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197546#comment-16197546 ] Alex Petrov commented on CASSANDRA-10786: - All right, going to commit it tomorrow morning as there're no objections. > Include hash of result set metadata in prepared statement id > > > Key: CASSANDRA-10786 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10786 > Project: Cassandra > Issue Type: Sub-task > Components: CQL >Reporter: Olivier Michallat >Assignee: Alex Petrov >Priority: Minor > Labels: client-impacting, doc-impacting, protocolv5 > Fix For: 4.x > > > *_Initial description:_* > This is a follow-up to CASSANDRA-7910, which was about invalidating a > prepared statement when the table is altered, to force clients to update > their local copy of the metadata. > There's still an issue if multiple clients are connected to the same host. > The first client to execute the query after the cache was invalidated will > receive an UNPREPARED response, re-prepare, and update its local metadata. > But other clients might miss it entirely (the MD5 hasn't changed), and they > will keep using their old metadata. For example: > # {{SELECT * ...}} statement is prepared in Cassandra with md5 abc123, > clientA and clientB both have a cache of the metadata (columns b and c) > locally > # column a gets added to the table, C* invalidates its cache entry > # clientA sends an EXECUTE request for md5 abc123, gets UNPREPARED response, > re-prepares on the fly and updates its local metadata to (a, b, c) > # prepared statement is now in C*’s cache again, with the same md5 abc123 > # clientB sends an EXECUTE request for id abc123. Because the cache has been > populated again, the query succeeds. But clientB still has not updated its > metadata, it’s still (b,c) > One solution that was suggested is to include a hash of the result set > metadata in the md5. 
This way the md5 would change at step 3, and any client > using the old md5 would get an UNPREPARED, regardless of whether another > client already reprepared. > - > *_Resolution (2017/02/13):_* > The following changes were made to native protocol v5: > - the PREPARED response includes {{result_metadata_id}}, a hash of the result > set metadata. > - every EXECUTE message must provide {{result_metadata_id}} in addition to > the prepared statement id. If it doesn't match the current one on the server, > it means the client is operating on a stale schema. > - to notify the client, the server returns a ROWS response with a new > {{Metadata_changed}} flag, the new {{result_metadata_id}} and the updated > result metadata (this overrides the {{No_metadata}} flag, even if the client > had requested it) > - the client updates its copy of the result metadata before it decodes the > results. > So the scenario above would now look like: > # {{SELECT * ...}} statement is prepared in Cassandra with md5 abc123, and > result set (b, c) that hashes to cde456 > # column a gets added to the table, C* does not invalidate its cache entry, > but only updates the result set to (a, b, c) which hashes to fff789 > # client sends an EXECUTE request for (statementId=abc123, resultId=cde456) > and skip_metadata flag > # cde456!=fff789, so C* responds with ROWS(..., no_metadata=false, > metadata_changed=true, new_metadata_id=fff789,col specs for (a,b,c)) > # client updates its column specifications, and will send the next execute > queries with (statementId=abc123, resultId=fff789) > This works the same with multiple clients. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
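The server-side check in the resolved scenario can be sketched as a small simulation; the hashing here (MD5 over column names) is an illustrative stand-in, not Cassandra's actual metadata-hash implementation:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Arrays;
import java.util.List;

public class MetadataIdCheck {
    // Hash the result-set column metadata into a result_metadata_id.
    public static byte[] metadataId(List<String> columns) throws Exception {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        for (String column : columns)
            md5.update(column.getBytes(StandardCharsets.UTF_8));
        return md5.digest();
    }

    public static void main(String[] args) throws Exception {
        byte[] clientId = metadataId(List.of("b", "c"));      // client prepared against (b, c)
        byte[] serverId = metadataId(List.of("a", "b", "c")); // column a was added since then
        // On mismatch the ROWS response carries Metadata_changed, the new id,
        // and the (a, b, c) column specs, overriding the client's skip_metadata.
        boolean metadataChanged = !Arrays.equals(clientId, serverId);
        System.out.println(metadataChanged); // prints true
    }
}
```

The key property is that the id changes whenever the column set changes, so a stale client is detected even though the prepared statement id itself is unchanged.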
[jira] [Assigned] (CASSANDRA-13834) nodetool resetlocalschema failed because of JMX registration
[ https://issues.apache.org/jira/browse/CASSANDRA-13834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Jirsa reassigned CASSANDRA-13834: -- Assignee: vincent royer
[jira] [Comment Edited] (CASSANDRA-13834) nodetool resetlocalschema failed because of JMX registration
[ https://issues.apache.org/jira/browse/CASSANDRA-13834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197545#comment-16197545 ] Jeff Jirsa edited comment on CASSANDRA-13834 at 10/9/17 7:16 PM: - I don't expect to have availability to properly review, even though it looks like a trivial fix. Most reviewers will probably want to see a small regression test, though, which will likely be more effort than the actual patch. was (Author: jjirsa): I don't expect to have availability to properly review, even though it looks like a trivial fix.
[jira] [Commented] (CASSANDRA-10786) Include hash of result set metadata in prepared statement id
[ https://issues.apache.org/jira/browse/CASSANDRA-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197550#comment-16197550 ] Jeff Jirsa commented on CASSANDRA-10786: {quote} I'll create an additional ticket that would be a 4.0 blocker to pull in the latest release of both drivers and restore build.xml entries with the corresponding versions to make sure this is not getting missed. {quote} sounds good to me.
[jira] [Commented] (CASSANDRA-13442) Support a means of strongly consistent highly available replication with tunable storage requirements
[ https://issues.apache.org/jira/browse/CASSANDRA-13442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197574#comment-16197574 ] Ariel Weisberg commented on CASSANDRA-13442: bq. As far as I understand, with RF=3, if you remove repaired data on transient replicas, you'll reduce storage by 1/3. Where do you get this 10x - 20x then ? 10-20x on transient replicas. Not at full replicas or overall. The new capability is adding replicas without having to commit the full amount of additional hardware. > Support a means of strongly consistent highly available replication with > tunable storage requirements > - > > Key: CASSANDRA-13442 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13442 > Project: Cassandra > Issue Type: Improvement > Components: Compaction, Coordination, Distributed Metadata, Local > Write-Read Paths >Reporter: Ariel Weisberg > > Replication factors like RF=2 can't provide strong consistency and > availability because if a single node is lost it's impossible to reach a > quorum of replicas. Stepping up to RF=3 will allow you to lose a node and > still achieve quorum for reads and writes, but requires committing additional > storage. > The requirement of a quorum for writes/reads doesn't seem to be something > that can be relaxed without additional constraints on queries, but it seems > like it should be possible to relax the requirement that 3 full copies of the > entire data set are kept. What is actually required is a covering data set > for the range and we should be able to achieve a covering data set and high > availability without having three full copies. > After a repair we know that some subset of the data set is fully replicated. > At that point we don't have to read from a quorum of nodes for the repaired > data. It is sufficient to read from a single node for the repaired data and a > quorum of nodes for the unrepaired data. 
> One way to exploit this would be to have N replicas, say the last N replicas > (where N varies with RF) in the preference list, delete all repaired data > after a repair completes. Subsequent quorum reads will be able to retrieve > the repaired data from either of the two full replicas and the unrepaired data > from a quorum read of any replica including the "transient" replicas. > Configuration for something like this in NTS might be something similar to { > DC1="3-1", DC2="3-2" } where the first value is the replication factor used > for consistency and the second value is the number of transient replicas. If > you specify { DC1=3, DC2=3 } then the number of transient replicas defaults > to 0 and you get the same behavior you have today.
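The proposed { DC1="3-1" } syntax above can be sketched as a small parser; this illustrates the suggested "<rf>-<transient>" format only and is not code from Cassandra:

```java
public class ReplicationOption {
    final int replicationFactor;
    final int transientReplicas;

    // Parse "3-1" as RF=3 with one transient replica; a plain "3" keeps
    // today's behavior (zero transient replicas).
    ReplicationOption(String spec) {
        int dash = spec.indexOf('-');
        if (dash < 0) {
            replicationFactor = Integer.parseInt(spec);
            transientReplicas = 0;
        } else {
            replicationFactor = Integer.parseInt(spec.substring(0, dash));
            transientReplicas = Integer.parseInt(spec.substring(dash + 1));
        }
        if (transientReplicas >= replicationFactor)
            throw new IllegalArgumentException("transient replicas must be fewer than RF: " + spec);
    }

    int fullReplicas() { return replicationFactor - transientReplicas; }

    public static void main(String[] args) {
        ReplicationOption dc1 = new ReplicationOption("3-1");
        System.out.println(dc1.fullReplicas());     // prints 2 (full replicas)
        ReplicationOption dc2 = new ReplicationOption("3");
        System.out.println(dc2.transientReplicas);  // prints 0
    }
}
```

The validation step reflects the constraint implicit in the proposal: at least one full replica must remain to hold the repaired data.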
[jira] [Comment Edited] (CASSANDRA-13442) Support a means of strongly consistent highly available replication with tunable storage requirements
[ https://issues.apache.org/jira/browse/CASSANDRA-13442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197574#comment-16197574 ] Ariel Weisberg edited comment on CASSANDRA-13442 at 10/9/17 7:33 PM: - bq. As far as I understand, with RF=3, if you remove repaired data on transient replicas, you'll reduce storage by 1/3. Where do you get this 10x - 20x then ? 10-20x on transient replicas. Not at full replicas or overall. The new capability is adding replicas without having to commit the full amount of additional hardware. If you are running RF=3 today you might be able to switch to RF=5 with two transient replicas. You would be able to tolerate more failures and you might be able to do it without adding additional capacity to your deployment. was (Author: aweisberg): bq. As far as I understand, with RF=3, if you remove repaired data on transient replicas, you'll reduce storage by 1/3. Where do you get this 10x - 20x then ? 10-20x on transient replicas. Not at full replicas or overall. The new capability is adding replicas without having to commit the full amount of additional hardware.
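The "10-20x on transient replicas" claim in the comment above is easy to make concrete with back-of-the-envelope numbers (illustrative values, not from the ticket): a transient replica keeps only the unrepaired slice of the data set, so the saving on that node is the inverse of the unrepaired fraction:

```java
public class TransientStorage {
    public static void main(String[] args) {
        // Assume a 1000 GB data set where 95% is repaired at any given time.
        // A full replica stores all of it; a transient replica only keeps the
        // unrepaired remainder.
        long dataSetGb = 1000;
        long unrepairedGb = dataSetGb * 5 / 100;      // 50 GB unrepaired
        System.out.println(dataSetGb / unrepairedGb); // prints 20 -> a "20x" saving
    }
}
```

With frequent incremental repair the unrepaired fraction stays small, which is where the 10-20x figure comes from; the full replicas, as Ariel notes, see no saving at all.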
[jira] [Updated] (CASSANDRA-4763) SSTableLoader shouldn't get keyspace from path
[ https://issues.apache.org/jira/browse/CASSANDRA-4763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eduard Tudenhoefner updated CASSANDRA-4763: --- Reviewer: Alex Petrov > SSTableLoader shouldn't get keyspace from path > -- > > Key: CASSANDRA-4763 > URL: https://issues.apache.org/jira/browse/CASSANDRA-4763 > Project: Cassandra > Issue Type: Improvement > Components: Tools >Affects Versions: 1.2.0 beta 1 >Reporter: Nick Bailey >Assignee: Eduard Tudenhoefner >Priority: Minor > Fix For: 4.0 > > > SSTableLoader currently gets the keyspace it is going to load to from the > path of the directory of sstables it is loading. This isn't really documented > (or I didn't see it), but also isn't really a good way of doing it in general. > {noformat} > this.keyspace = directory.getParentFile().getName(); > {noformat} > We should probably just let users pass the name in. If you are loading a > snapshot the file names will have the keyspace which is slightly better but > people manually creating their own sstables might not format them the same.
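The line quoted above derives the keyspace from the directory layout, which silently targets the wrong keyspace whenever sstables sit anywhere else. The suggested fix can be sketched as an explicit override that falls back to the path-based guess (hypothetical helper for illustration, not the actual SSTableLoader change):

```java
import java.io.File;

public class KeyspaceFromPath {
    // Prefer a user-supplied keyspace; otherwise fall back to the legacy
    // behavior of reading it from the sstable directory's parent name.
    public static String keyspaceFor(File directory, String explicitKeyspace) {
        if (explicitKeyspace != null)
            return explicitKeyspace;                // user-supplied name wins
        return directory.getParentFile().getName(); // legacy path-based guess
    }

    public static void main(String[] args) {
        File dir = new File("/var/lib/cassandra/data/my_keyspace/my_table");
        System.out.println(keyspaceFor(dir, null));       // prints my_keyspace
        System.out.println(keyspaceFor(dir, "other_ks")); // prints other_ks
    }
}
```

The fallback keeps snapshot-restore workflows working while letting hand-built sstable directories name their target keyspace directly.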
[jira] [Commented] (CASSANDRA-13442) Support a means of strongly consistent highly available replication with tunable storage requirements
[ https://issues.apache.org/jira/browse/CASSANDRA-13442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197593#comment-16197593 ] DOAN DuyHai commented on CASSANDRA-13442: - Ok I get it, so you can provision transient replicas with much less disk space than normal replicas --> cost saving. That being said, I think the cost saving only becomes a real argument for very large clusters. For average C* users in the range of 10 - 20 nodes, I'm not sure the added complexity in reasoning is worth the disk-space saving.
[jira] [Commented] (CASSANDRA-13639) SSTableLoader always uses hostname to stream files from
[ https://issues.apache.org/jira/browse/CASSANDRA-13639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197610#comment-16197610 ] Eduard Tudenhoefner commented on CASSANDRA-13639: - I agree with [~spo...@gmail.com] here; I think having a command-line parameter seems better. Something like {{--localOutboundAddressSSL}} or {{--sslLocalOutboundAddress}}, which defaults to {{FBUtilities.getLocalAddress()}}. If *outboundBindAny* were set to *true*, then the SSL socket would be bound to *any* local address, which is most likely not what we want, so I'm not sure why we would ever want to set *outboundBindAny* to *true* anyway. > SSTableLoader always uses hostname to stream files from > --- > > Key: CASSANDRA-13639 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13639 > Project: Cassandra > Issue Type: Bug > Components: Tools >Reporter: Jan Karlsson >Assignee: Jan Karlsson > Fix For: 4.x > > Attachments: 13639-trunk > > > I stumbled upon an issue where SSTableLoader was ignoring our routing by > using the wrong interface to send the SSTables to the other nodes. Looking at > the code, it seems that we are using FBUtilities.getLocalAddress() to fetch > the hostname, even if the yaml file specifies a different host. I am not > sure why we call this function instead of using the routing by leaving it > blank, perhaps someone could enlighten me. > This behaviour comes from the fact that we use a default-created > DatabaseDescriptor which does not set the values for listenAddress and > listenInterface. This causes the aforementioned function to retrieve the > hostname at all times, even if it is not the interface used in the yaml file. > I propose we break out the function that handles listenAddress and > listenInterface and call it so that listenAddress or listenInterface is > getting populated in the DatabaseDescriptor.
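The behavior under discussion boils down to whether the outbound socket is bound to an explicit local address before connecting, or left unbound so the kernel's routing table picks the interface. A sketch (the option name mirrors the hypothetical {{--sslLocalOutboundAddress}} flag suggested above; this is not the actual sstableloader code):

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;

public class OutboundBind {
    // Build an unconnected outbound socket, pinned to localAddress when one
    // is supplied (as a --sslLocalOutboundAddress-style option would do).
    public static Socket outboundSocket(InetAddress localAddress) throws Exception {
        Socket s = new Socket();
        if (localAddress != null)
            s.bind(new InetSocketAddress(localAddress, 0)); // pin the source interface
        // else: skipping bind() lets connect() use the kernel's routing decision
        return s;
    }

    public static void main(String[] args) throws Exception {
        Socket pinned = outboundSocket(InetAddress.getLoopbackAddress());
        System.out.println(pinned.getLocalAddress().isLoopbackAddress()); // prints true
        pinned.close();
    }
}
```

Defaulting the option to {{FBUtilities.getLocalAddress()}} preserves today's behavior, while passing nothing (the "bind any" case) is the one the comment argues is rarely what a user wants for SSL streaming.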
[jira] [Comment Edited] (CASSANDRA-13639) SSTableLoader always uses hostname to stream files from
[ https://issues.apache.org/jira/browse/CASSANDRA-13639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197610#comment-16197610 ] Eduard Tudenhoefner edited comment on CASSANDRA-13639 at 10/9/17 8:01 PM: -- I agree with [~spo...@gmail.com] here; I think having a command-line parameter seems better. Something like {{\-\-localOutboundAddressSSL}} or {{--sslLocalOutboundAddress}}, which defaults to {{FBUtilities.getLocalAddress()}}. If *outboundBindAny* were set to *true*, then the SSL socket would be bound to *any* local address, which is most likely not what we want, so I'm not sure why we would ever want to set *outboundBindAny* to *true* anyway. was (Author: eduard.tudenhoefner): I agree with [~spo...@gmail.com] here; I think having a command-line parameter seems better. Something like {{--localOutboundAddressSSL}} or {{--sslLocalOutboundAddress}}, which defaults to {{FBUtilities.getLocalAddress()}}. If *outboundBindAny* were set to *true*, then the SSL socket would be bound to *any* local address, which is most likely not what we want, so I'm not sure why we would ever want to set *outboundBindAny* to *true* anyway.
[jira] [Commented] (CASSANDRA-13442) Support a means of strongly consistent highly available replication with tunable storage requirements
[ https://issues.apache.org/jira/browse/CASSANDRA-13442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197621#comment-16197621 ] Ariel Weisberg commented on CASSANDRA-13442: bq. For average C* users in the range of 10 - 20 nodes, not sure the added complexity in reasoning worths the disk space saving. They don't have to reason about it if they don't enable it. It's pay for what you use. > Support a means of strongly consistent highly available replication with > tunable storage requirements > - > > Key: CASSANDRA-13442 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13442 > Project: Cassandra > Issue Type: Improvement > Components: Compaction, Coordination, Distributed Metadata, Local > Write-Read Paths >Reporter: Ariel Weisberg > > Replication factors like RF=2 can't provide strong consistency and > availability because if a single node is lost it's impossible to reach a > quorum of replicas. Stepping up to RF=3 will allow you to lose a node and > still achieve quorum for reads and writes, but requires committing additional > storage. > The requirement of a quorum for writes/reads doesn't seem to be something > that can be relaxed without additional constraints on queries, but it seems > like it should be possible to relax the requirement that 3 full copies of the > entire data set are kept. What is actually required is a covering data set > for the range and we should be able to achieve a covering data set and high > availability without having three full copies. > After a repair we know that some subset of the data set is fully replicated. > At that point we don't have to read from a quorum of nodes for the repaired > data. It is sufficient to read from a single node for the repaired data and a > quorum of nodes for the unrepaired data. 
> One way to exploit this would be to have N replicas, say the last N replicas > (where N varies with RF) in the preference list, delete all repaired data > after a repair completes. Subsequent quorum reads will be able to retrieve > the repaired data from any of the two full replicas and the unrepaired data > from a quorum read of any replica including the "transient" replicas. > Configuration for something like this in NTS might be something similar to { > DC1="3-1", DC2="3-2" } where the first value is the replication factor used > for consistency and the second values is the number of transient replicas. If > you specify { DC1=3, DC2=3 } then the number of transient replicas defaults > to 0 and you get the same behavior you have today.
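The proposed "3-1" notation can be made concrete with a small parser. This is only a sketch of the semantics described in the ticket (the class is hypothetical; no such NTS option exists): the first number is the replication factor used for consistency, the optional suffix is the transient replica count, defaulting to 0.

```java
// Hypothetical parser for the "<rf>-<transient>" notation, e.g. { DC1="3-1" }.
public class ReplicationOption
{
    public final int replicationFactor;  // replicas counted toward quorum
    public final int transientReplicas;  // replicas that drop repaired data

    public ReplicationOption(String value)
    {
        int dash = value.indexOf('-');
        if (dash < 0)
        {
            // { DC1=3 }: no transient replicas, identical to today's behavior
            replicationFactor = Integer.parseInt(value.trim());
            transientReplicas = 0;
        }
        else
        {
            replicationFactor = Integer.parseInt(value.substring(0, dash).trim());
            transientReplicas = Integer.parseInt(value.substring(dash + 1).trim());
        }
        if (transientReplicas >= replicationFactor)
            throw new IllegalArgumentException("need at least one full replica: " + value);
    }

    public int fullReplicas()
    {
        return replicationFactor - transientReplicas;
    }
}
```

Under this reading, "3-1" keeps two full replicas (matching "any of the two full replicas" above) plus one transient replica.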
[jira] [Commented] (CASSANDRA-13442) Support a means of strongly consistent highly available replication with tunable storage requirements
[ https://issues.apache.org/jira/browse/CASSANDRA-13442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197624#comment-16197624 ] DOAN DuyHai commented on CASSANDRA-13442: - I did not mean about end-users, I meant about core C* developers. We need to introduce some code change in order to accomodate the asymmetry between replicas in the code base. > Support a means of strongly consistent highly available replication with > tunable storage requirements > - > > Key: CASSANDRA-13442 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13442 > Project: Cassandra > Issue Type: Improvement > Components: Compaction, Coordination, Distributed Metadata, Local > Write-Read Paths >Reporter: Ariel Weisberg > > Replication factors like RF=2 can't provide strong consistency and > availability because if a single node is lost it's impossible to reach a > quorum of replicas. Stepping up to RF=3 will allow you to lose a node and > still achieve quorum for reads and writes, but requires committing additional > storage. > The requirement of a quorum for writes/reads doesn't seem to be something > that can be relaxed without additional constraints on queries, but it seems > like it should be possible to relax the requirement that 3 full copies of the > entire data set are kept. What is actually required is a covering data set > for the range and we should be able to achieve a covering data set and high > availability without having three full copies. > After a repair we know that some subset of the data set is fully replicated. > At that point we don't have to read from a quorum of nodes for the repaired > data. It is sufficient to read from a single node for the repaired data and a > quorum of nodes for the unrepaired data. > One way to exploit this would be to have N replicas, say the last N replicas > (where N varies with RF) in the preference list, delete all repaired data > after a repair completes. 
Subsequent quorum reads will be able to retrieve > the repaired data from any of the two full replicas and the unrepaired data > from a quorum read of any replica including the "transient" replicas. > Configuration for something like this in NTS might be something similar to { > DC1="3-1", DC2="3-2" } where the first value is the replication factor used > for consistency and the second values is the number of transient replicas. If > you specify { DC1=3, DC2=3 } then the number of transient replicas defaults > to 0 and you get the same behavior you have today.
[jira] [Commented] (CASSANDRA-13442) Support a means of strongly consistent highly available replication with tunable storage requirements
[ https://issues.apache.org/jira/browse/CASSANDRA-13442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197649#comment-16197649 ] Ariel Weisberg commented on CASSANDRA-13442: bq. I did not mean about end-users, I meant about core C* developers. We need to introduce some code change in order to accomodate the asymmetry between replicas in the code base. I agree. I don't think that's something we can quantify until someone submits a patch with unit and integration tests so we can weight the cost against the measured gains and tradeoffs of a real implementation. Some of it might end up being part of overlapping functionality. I can hope. > Support a means of strongly consistent highly available replication with > tunable storage requirements > - > > Key: CASSANDRA-13442 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13442 > Project: Cassandra > Issue Type: Improvement > Components: Compaction, Coordination, Distributed Metadata, Local > Write-Read Paths >Reporter: Ariel Weisberg > > Replication factors like RF=2 can't provide strong consistency and > availability because if a single node is lost it's impossible to reach a > quorum of replicas. Stepping up to RF=3 will allow you to lose a node and > still achieve quorum for reads and writes, but requires committing additional > storage. > The requirement of a quorum for writes/reads doesn't seem to be something > that can be relaxed without additional constraints on queries, but it seems > like it should be possible to relax the requirement that 3 full copies of the > entire data set are kept. What is actually required is a covering data set > for the range and we should be able to achieve a covering data set and high > availability without having three full copies. > After a repair we know that some subset of the data set is fully replicated. > At that point we don't have to read from a quorum of nodes for the repaired > data. 
It is sufficient to read from a single node for the repaired data and a > quorum of nodes for the unrepaired data. > One way to exploit this would be to have N replicas, say the last N replicas > (where N varies with RF) in the preference list, delete all repaired data > after a repair completes. Subsequent quorum reads will be able to retrieve > the repaired data from any of the two full replicas and the unrepaired data > from a quorum read of any replica including the "transient" replicas. > Configuration for something like this in NTS might be something similar to { > DC1="3-1", DC2="3-2" } where the first value is the replication factor used > for consistency and the second values is the number of transient replicas. If > you specify { DC1=3, DC2=3 } then the number of transient replicas defaults > to 0 and you get the same behavior you have today.
[jira] [Comment Edited] (CASSANDRA-13442) Support a means of strongly consistent highly available replication with tunable storage requirements
[ https://issues.apache.org/jira/browse/CASSANDRA-13442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197649#comment-16197649 ] Ariel Weisberg edited comment on CASSANDRA-13442 at 10/9/17 8:28 PM: - bq. I did not mean about end-users, I meant about core C* developers. We need to introduce some code change in order to accomodate the asymmetry between replicas in the code base. I agree. I don't think that's something we can quantify until someone submits a patch with unit and integration tests so we can weigh the cost against the measured gains and tradeoffs of a real implementation. Some of it might end up being part of overlapping functionality. I can hope. was (Author: aweisberg): bq. I did not mean about end-users, I meant about core C* developers. We need to introduce some code change in order to accomodate the asymmetry between replicas in the code base. I agree. I don't think that's something we can quantify until someone submits a patch with unit and integration tests so we can weight the cost against the measured gains and tradeoffs of a real implementation. Some of it might end up being part of overlapping functionality. I can hope. > Support a means of strongly consistent highly available replication with > tunable storage requirements > - > > Key: CASSANDRA-13442 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13442 > Project: Cassandra > Issue Type: Improvement > Components: Compaction, Coordination, Distributed Metadata, Local > Write-Read Paths >Reporter: Ariel Weisberg > > Replication factors like RF=2 can't provide strong consistency and > availability because if a single node is lost it's impossible to reach a > quorum of replicas. Stepping up to RF=3 will allow you to lose a node and > still achieve quorum for reads and writes, but requires committing additional > storage. 
> The requirement of a quorum for writes/reads doesn't seem to be something > that can be relaxed without additional constraints on queries, but it seems > like it should be possible to relax the requirement that 3 full copies of the > entire data set are kept. What is actually required is a covering data set > for the range and we should be able to achieve a covering data set and high > availability without having three full copies. > After a repair we know that some subset of the data set is fully replicated. > At that point we don't have to read from a quorum of nodes for the repaired > data. It is sufficient to read from a single node for the repaired data and a > quorum of nodes for the unrepaired data. > One way to exploit this would be to have N replicas, say the last N replicas > (where N varies with RF) in the preference list, delete all repaired data > after a repair completes. Subsequent quorum reads will be able to retrieve > the repaired data from any of the two full replicas and the unrepaired data > from a quorum read of any replica including the "transient" replicas. > Configuration for something like this in NTS might be something similar to { > DC1="3-1", DC2="3-2" } where the first value is the replication factor used > for consistency and the second values is the number of transient replicas. If > you specify { DC1=3, DC2=3 } then the number of transient replicas defaults > to 0 and you get the same behavior you have today. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13943) Infinite compaction of L0 SSTables in JBOD
[ https://issues.apache.org/jira/browse/CASSANDRA-13943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197682#comment-16197682 ] Dan Kinder commented on CASSANDRA-13943: FYI: {noformat} data_file_directories: - /srv/disk1/cassandra-data - /srv/disk2/cassandra-data - /srv/disk3/cassandra-data - /srv/disk4/cassandra-data - /srv/disk5/cassandra-data - /srv/disk6/cassandra-data - /srv/disk7/cassandra-data - /srv/disk8/cassandra-data - /srv/disk9/cassandra-data - /srv/disk10/cassandra-data - /srv/disk11/cassandra-data - /srv/disk12/cassandra-data {noformat} > Infinite compaction of L0 SSTables in JBOD > -- > > Key: CASSANDRA-13943 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13943 > Project: Cassandra > Issue Type: Bug > Components: Compaction > Environment: Cassandra 3.11.0 / Centos 6 >Reporter: Dan Kinder >Assignee: Marcus Eriksson > Attachments: debug.log > > > I recently upgraded from 2.2.6 to 3.11.0. > I am seeing Cassandra loop infinitely compacting the same data over and over. > Attaching logs. > It is compacting two tables, one on /srv/disk10, the other on /srv/disk1. It > does create new SSTables but immediately recompacts again. Note that I am not > inserting anything at the moment, there is no flushing happening on this > table (Memtable switch count has not changed). > My theory is that it somehow thinks those should be compaction candidates. > But they shouldn't be, they are on different disks and I ran nodetool > relocatesstables as well as nodetool compact. So, it tries to compact them > together, but the compaction results in the exact same 2 SSTables on the 2 > disks, because the keys are split by data disk. > This is pretty serious, because all our nodes right now are consuming CPU > doing this for multiple tables, it seems. 
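The report above says "the keys are split by data disk". With twelve data directories like these, JBOD token splitting gives each directory a contiguous slice of the token range, so recompacting sstables from different disks just re-emits the same per-disk sstables. A simplified illustration of that kind of split (not the actual Cassandra disk-boundary code):

```java
// Hypothetical even split of the signed 64-bit Murmur3 token space across
// a fixed number of data directories.
public class DiskBoundaries
{
    public static int diskFor(long token, int diskCount)
    {
        // normalize the token into [0, 1) and map it onto a disk index
        double normalized = (token - (double) Long.MIN_VALUE) / Math.pow(2, 64);
        return Math.min(diskCount - 1, (int) (normalized * diskCount));
    }
}
```

Any compaction whose inputs straddle a disk boundary will write its outputs back split on exactly the same boundary, which is consistent with the looping behavior described.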
[jira] [Created] (CASSANDRA-13944) Throw descriptive errors for mixed mode repair attempts
Blake Eggleston created CASSANDRA-13944: --- Summary: Throw descriptive errors for mixed mode repair attempts Key: CASSANDRA-13944 URL: https://issues.apache.org/jira/browse/CASSANDRA-13944 Project: Cassandra Issue Type: Bug Components: Repair Reporter: Blake Eggleston Assignee: Blake Eggleston Priority: Minor Fix For: 4.0 We often make breaking changes to streaming and repair between major versions, and don't usually support either in mixed mode clusters. Streaming connections check protocol versions, but repair message handling doesn't, which means cryptic exceptions show up in the logs when operators forget to turn off whatever's scheduling repairs on their cluster. Refusing to send or receive repair messages to/from incompatible messaging service versions, and throwing a descriptive exception would make it clearer why repair is not working, as well as prevent any potentially unexpected behavior.
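The proposed pre-flight check could look like the sketch below. The class and method names are illustrative, not existing Cassandra APIs: compare messaging service versions before exchanging repair messages, and fail with a descriptive error instead of a cryptic deserialization exception.

```java
// Hypothetical guard refusing repair across mismatched messaging versions.
public class RepairVersionCheck
{
    public static void assertCompatible(int localMessagingVersion, int peerMessagingVersion, String peer)
    {
        if (localMessagingVersion != peerMessagingVersion)
            throw new IllegalStateException(String.format(
                "Refusing repair with %s: peer messaging version %d does not match local version %d; " +
                "repair is not supported in mixed-version clusters",
                peer, peerMessagingVersion, localMessagingVersion));
    }
}
```

An operator then sees one clear message naming the incompatible peer, rather than stack traces from failed message handling.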
[jira] [Comment Edited] (CASSANDRA-13943) Infinite compaction of L0 SSTables in JBOD
[ https://issues.apache.org/jira/browse/CASSANDRA-13943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197459#comment-16197459 ] Dan Kinder edited comment on CASSANDRA-13943 at 10/9/17 11:45 PM: -- I do see a questionable {{startsWith}} in a few places: https://github.com/apache/cassandra/blob/ba87ab4e954ad2e537f6690953bd7ebaa069f5cd/src/java/org/apache/cassandra/db/Directories.java#L281 https://github.com/apache/cassandra/blob/7d4d1a32581ff40ed1049833631832054bcf2316/src/java/org/apache/cassandra/db/compaction/CompactionStrategyManager.java#L309 https://github.com/apache/cassandra/blob/3cec208c40b85e1be0ff8c68fc9d9017945a1ed8/src/java/org/apache/cassandra/db/compaction/CompactionManager.java#L570 was (Author: dkinder): I do see a questionable {{startsWith}} here: https://github.com/apache/cassandra/blob/7d4d1a32581ff40ed1049833631832054bcf2316/src/java/org/apache/cassandra/db/compaction/CompactionStrategyManager.java#L309 Also here: https://github.com/apache/cassandra/blob/3cec208c40b85e1be0ff8c68fc9d9017945a1ed8/src/java/org/apache/cassandra/db/compaction/CompactionManager.java#L570 > Infinite compaction of L0 SSTables in JBOD > -- > > Key: CASSANDRA-13943 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13943 > Project: Cassandra > Issue Type: Bug > Components: Compaction > Environment: Cassandra 3.11.0 / Centos 6 >Reporter: Dan Kinder >Assignee: Marcus Eriksson > Attachments: debug.log > > > I recently upgraded from 2.2.6 to 3.11.0. > I am seeing Cassandra loop infinitely compacting the same data over and over. > Attaching logs. > It is compacting two tables, one on /srv/disk10, the other on /srv/disk1. It > does create new SSTables but immediately recompacts again. Note that I am not > inserting anything at the moment, there is no flushing happening on this > table (Memtable switch count has not changed). > My theory is that it somehow thinks those should be compaction candidates. 
> But they shouldn't be, they are on different disks and I ran nodetool > relocatesstables as well as nodetool compact. So, it tries to compact them > together, but the compaction results in the exact same 2 SSTables on the 2 > disks, because the keys are split by data disk. > This is pretty serious, because all our nodes right now are consuming CPU > doing this for multiple tables, it seems.
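The questionable {{startsWith}} calls matter precisely because of this JBOD layout: "/srv/disk1" is a *string* prefix of "/srv/disk10", so a plain String.startsWith would wrongly attribute a disk10 file to disk1. Comparing whole path components avoids the false match. (Illustrative only; not the actual code at the links above.)

```java
import java.nio.file.Paths;

public class PrefixBugDemo
{
    // naive check, in the spirit of the questionable startsWith calls
    public static boolean naiveContains(String dataDir, String sstablePath)
    {
        return sstablePath.startsWith(dataDir);
    }

    // component-aware check: Path.startsWith matches whole name elements only
    public static boolean componentAwareContains(String dataDir, String sstablePath)
    {
        return Paths.get(sstablePath).startsWith(Paths.get(dataDir));
    }
}
```

With a hypothetical file "/srv/disk10/cassandra-data/ks/tbl/mc-1-big-Data.db", the naive check against "/srv/disk1" returns true while the component-aware check correctly returns false.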
[jira] [Updated] (CASSANDRA-13475) First version of pluggable storage engine API.
[ https://issues.apache.org/jira/browse/CASSANDRA-13475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dikang Gu updated CASSANDRA-13475: -- Summary: First version of pluggable storage engine API. (was: Define pluggable storage engine API.) > First version of pluggable storage engine API. > -- > > Key: CASSANDRA-13475 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13475 > Project: Cassandra > Issue Type: Sub-task >Reporter: Dikang Gu > > In order to support pluggable storage engine, we need to define a unified > interface/API, which can allow us to plug in different storage engines for > different requirements. > In very high level, the storage engine interface should include APIs to: > 1. Apply update into the engine. > 2. Query data from the engine. > 3. Stream data in/out to/from the engine. > 4. Table operations, like create/drop/truncate a table, etc. > 5. Various stats about the engine. > I create this ticket to start the discussions about the interface. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
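One possible shape for the API areas listed in the ticket is sketched below. Every name here is hypothetical, since defining the real interface is exactly what the ticket is for; streaming and table operations are elided for brevity.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical high-level contract: apply updates, query data, report stats.
interface StorageEngine<K, V>
{
    void apply(K key, V update);   // 1. apply updates into the engine
    Optional<V> query(K key);      // 2. query data from the engine
    Map<String, Long> stats();     // 5. stats about the engine
}

// Trivial map-backed stand-in showing how the contract would be exercised.
class InMemoryEngine implements StorageEngine<String, String>
{
    private final Map<String, String> data = new HashMap<>();
    private long applied = 0;

    public void apply(String key, String update)
    {
        data.put(key, update);
        applied++;
    }

    public Optional<String> query(String key)
    {
        return Optional.ofNullable(data.get(key));
    }

    public Map<String, Long> stats()
    {
        return Map.of("applied", applied);
    }
}
```

A unified boundary like this is what would let different engines be plugged in behind the same write/read paths.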
[jira] [Assigned] (CASSANDRA-13475) First version of pluggable storage engine API.
[ https://issues.apache.org/jira/browse/CASSANDRA-13475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dikang Gu reassigned CASSANDRA-13475: - Assignee: Dikang Gu > First version of pluggable storage engine API. > -- > > Key: CASSANDRA-13475 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13475 > Project: Cassandra > Issue Type: Sub-task >Reporter: Dikang Gu >Assignee: Dikang Gu > > In order to support pluggable storage engine, we need to define a unified > interface/API, which can allow us to plug in different storage engines for > different requirements. > In very high level, the storage engine interface should include APIs to: > 1. Apply update into the engine. > 2. Query data from the engine. > 3. Stream data in/out to/from the engine. > 4. Table operations, like create/drop/truncate a table, etc. > 5. Various stats about the engine. > I create this ticket to start the discussions about the interface. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-13848) Allow sstabledump to do a json object per partition to better handle large sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Wern updated CASSANDRA-13848: --- Reviewer: Jeff Jirsa Status: Patch Available (was: Open) >From 834cab8a0a67dbbefa608ddd47109bb9883025a2 Mon Sep 17 00:00:00 2001 From: Kevin Wern Date: Mon, 9 Oct 2017 04:26:25 -0400 Subject: [PATCH] sstabledump: add -l option for jsonl --- .../apache/cassandra/tools/JsonTransformer.java| 35 +- .../org/apache/cassandra/tools/SSTableExport.java | 8 + 2 files changed, 36 insertions(+), 7 deletions(-) diff --git a/src/java/org/apache/cassandra/tools/JsonTransformer.java b/src/java/org/apache/cassandra/tools/JsonTransformer.java index e6aaf07..0c7ed7e 100644 --- a/src/java/org/apache/cassandra/tools/JsonTransformer.java +++ b/src/java/org/apache/cassandra/tools/JsonTransformer.java @@ -56,6 +56,7 @@ import org.codehaus.jackson.JsonGenerator; import org.codehaus.jackson.impl.Indenter; import org.codehaus.jackson.util.DefaultPrettyPrinter.NopIndenter; import org.codehaus.jackson.util.DefaultPrettyPrinter; +import org.codehaus.jackson.util.MinimalPrettyPrinter; public final class JsonTransformer { @@ -78,17 +79,26 @@ public final class JsonTransformer private long currentPosition = 0; -private JsonTransformer(JsonGenerator json, ISSTableScanner currentScanner, boolean rawTime, TableMetadata metadata) +private JsonTransformer(JsonGenerator json, ISSTableScanner currentScanner, boolean rawTime, TableMetadata metadata, boolean isJsonLines) { this.json = json; this.metadata = metadata; this.currentScanner = currentScanner; this.rawTime = rawTime; -DefaultPrettyPrinter prettyPrinter = new DefaultPrettyPrinter(); -prettyPrinter.indentObjectsWith(objectIndenter); -prettyPrinter.indentArraysWith(arrayIndenter); -json.setPrettyPrinter(prettyPrinter); +if (isJsonLines) +{ +MinimalPrettyPrinter minimalPrettyPrinter = new MinimalPrettyPrinter(); +minimalPrettyPrinter.setRootValueSeparator("\n"); 
+json.setPrettyPrinter(minimalPrettyPrinter); +} +else +{ +DefaultPrettyPrinter prettyPrinter = new DefaultPrettyPrinter(); +prettyPrinter.indentObjectsWith(objectIndenter); +prettyPrinter.indentArraysWith(arrayIndenter); +json.setPrettyPrinter(prettyPrinter); +} } public static void toJson(ISSTableScanner currentScanner, Stream partitions, boolean rawTime, TableMetadata metadata, OutputStream out) @@ -96,18 +106,28 @@ public final class JsonTransformer { try (JsonGenerator json = jsonFactory.createJsonGenerator(new OutputStreamWriter(out, StandardCharsets.UTF_8))) { -JsonTransformer transformer = new JsonTransformer(json, currentScanner, rawTime, metadata); +JsonTransformer transformer = new JsonTransformer(json, currentScanner, rawTime, metadata, false); json.writeStartArray(); partitions.forEach(transformer::serializePartition); json.writeEndArray(); } } +public static void toJsonLines(ISSTableScanner currentScanner, Stream partitions, boolean rawTime, TableMetadata metadata, OutputStream out) +throws IOException +{ +try (JsonGenerator json = jsonFactory.createJsonGenerator(new OutputStreamWriter(out, StandardCharsets.UTF_8))) +{ +JsonTransformer transformer = new JsonTransformer(json, currentScanner, rawTime, metadata, true); +partitions.forEach(transformer::serializePartition); +} +} + public static void keysToJson(ISSTableScanner currentScanner, Stream keys, boolean rawTime, TableMetadata metadata, OutputStream out) throws IOException { try (JsonGenerator json = jsonFactory.createJsonGenerator(new OutputStreamWriter(out, StandardCharsets.UTF_8))) { -JsonTransformer transformer = new JsonTransformer(json, currentScanner, rawTime, metadata); +JsonTransformer transformer = new JsonTransformer(json, currentScanner, rawTime, metadata, false); json.writeStartArray(); keys.forEach(transformer::serializePartitionKey); json.writeEndArray(); @@ -221,6 +241,7 @@ public final class JsonTransformer json.writeEndObject(); } } + catch (IOException e) { String key = 
metadata.partitionKeyType.getString(partition.partitionKey().getKey()); diff --git a/src/java/org/apache/cassandra/tools/SSTableExport.java b/src/java/org/apache/cassandra/tools/SSTableExport.java index 95e3ed6..4079ee7 100644 --- a/src/java/org/apache/cassandra/tools/SSTableExport.java +++ b/src/java/org/apache/cassandra/tools/SSTableExport.java
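The difference the patch's -l option makes, reduced to plain strings: the default output is one JSON array spanning the whole sstable, while json-lines (which the MinimalPrettyPrinter with a "\n" root-value separator produces) emits each partition as its own root-level object, so a consumer can stream line by line in constant memory. A minimal sketch of the two shapes:

```java
import java.util.List;

public class JsonShapes
{
    public static String asArray(List<String> partitionJson)
    {
        return "[" + String.join(",", partitionJson) + "]"; // one giant document
    }

    public static String asJsonLines(List<String> partitionJson)
    {
        return String.join("\n", partitionJson); // one object per line
    }
}
```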
[jira] [Issue Comment Deleted] (CASSANDRA-13848) Allow sstabledump to do a json object per partition to better handle large sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Wern updated CASSANDRA-13848: --- Comment: was deleted (was: From 834cab8a0a67dbbefa608ddd47109bb9883025a2 Mon Sep 17 00:00:00 2001 From: Kevin Wern Date: Mon, 9 Oct 2017 04:26:25 -0400 Subject: [PATCH] sstabledump: add -l option for jsonl --- .../apache/cassandra/tools/JsonTransformer.java| 35 +- .../org/apache/cassandra/tools/SSTableExport.java | 8 + 2 files changed, 36 insertions(+), 7 deletions(-) diff --git a/src/java/org/apache/cassandra/tools/JsonTransformer.java b/src/java/org/apache/cassandra/tools/JsonTransformer.java index e6aaf07..0c7ed7e 100644 --- a/src/java/org/apache/cassandra/tools/JsonTransformer.java +++ b/src/java/org/apache/cassandra/tools/JsonTransformer.java @@ -56,6 +56,7 @@ import org.codehaus.jackson.JsonGenerator; import org.codehaus.jackson.impl.Indenter; import org.codehaus.jackson.util.DefaultPrettyPrinter.NopIndenter; import org.codehaus.jackson.util.DefaultPrettyPrinter; +import org.codehaus.jackson.util.MinimalPrettyPrinter; public final class JsonTransformer { @@ -78,17 +79,26 @@ public final class JsonTransformer private long currentPosition = 0; -private JsonTransformer(JsonGenerator json, ISSTableScanner currentScanner, boolean rawTime, TableMetadata metadata) +private JsonTransformer(JsonGenerator json, ISSTableScanner currentScanner, boolean rawTime, TableMetadata metadata, boolean isJsonLines) { this.json = json; this.metadata = metadata; this.currentScanner = currentScanner; this.rawTime = rawTime; -DefaultPrettyPrinter prettyPrinter = new DefaultPrettyPrinter(); -prettyPrinter.indentObjectsWith(objectIndenter); -prettyPrinter.indentArraysWith(arrayIndenter); -json.setPrettyPrinter(prettyPrinter); +if (isJsonLines) +{ +MinimalPrettyPrinter minimalPrettyPrinter = new MinimalPrettyPrinter(); +minimalPrettyPrinter.setRootValueSeparator("\n"); +json.setPrettyPrinter(minimalPrettyPrinter); +} +else +{ 
+DefaultPrettyPrinter prettyPrinter = new DefaultPrettyPrinter(); +prettyPrinter.indentObjectsWith(objectIndenter); +prettyPrinter.indentArraysWith(arrayIndenter); +json.setPrettyPrinter(prettyPrinter); +} } public static void toJson(ISSTableScanner currentScanner, Stream partitions, boolean rawTime, TableMetadata metadata, OutputStream out) @@ -96,18 +106,28 @@ public final class JsonTransformer { try (JsonGenerator json = jsonFactory.createJsonGenerator(new OutputStreamWriter(out, StandardCharsets.UTF_8))) { -JsonTransformer transformer = new JsonTransformer(json, currentScanner, rawTime, metadata); +JsonTransformer transformer = new JsonTransformer(json, currentScanner, rawTime, metadata, false); json.writeStartArray(); partitions.forEach(transformer::serializePartition); json.writeEndArray(); } } +public static void toJsonLines(ISSTableScanner currentScanner, Stream partitions, boolean rawTime, TableMetadata metadata, OutputStream out) +throws IOException +{ +try (JsonGenerator json = jsonFactory.createJsonGenerator(new OutputStreamWriter(out, StandardCharsets.UTF_8))) +{ +JsonTransformer transformer = new JsonTransformer(json, currentScanner, rawTime, metadata, true); +partitions.forEach(transformer::serializePartition); +} +} + public static void keysToJson(ISSTableScanner currentScanner, Stream keys, boolean rawTime, TableMetadata metadata, OutputStream out) throws IOException { try (JsonGenerator json = jsonFactory.createJsonGenerator(new OutputStreamWriter(out, StandardCharsets.UTF_8))) { -JsonTransformer transformer = new JsonTransformer(json, currentScanner, rawTime, metadata); +JsonTransformer transformer = new JsonTransformer(json, currentScanner, rawTime, metadata, false); json.writeStartArray(); keys.forEach(transformer::serializePartitionKey); json.writeEndArray(); @@ -221,6 +241,7 @@ public final class JsonTransformer json.writeEndObject(); } } + catch (IOException e) { String key = 
metadata.partitionKeyType.getString(partition.partitionKey().getKey()); diff --git a/src/java/org/apache/cassandra/tools/SSTableExport.java b/src/java/org/apache/cassandra/tools/SSTableExport.java index 95e3ed6..4079ee7 100644 --- a/src/java/org/apache/cassandra/tools/SSTableExport.java +++ b/src/java/org/apache/cassandra/tools/SSTableExport.java @@ -62,6 +62,7 @@ public class SSTable
[jira] [Updated] (CASSANDRA-13848) Allow sstabledump to do a json object per partition to better handle large sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Wern updated CASSANDRA-13848: --- Attachment: 0001-sstabledump-add-l-option-for-jsonl.patch > Allow sstabledump to do a json object per partition to better handle large > sstables > --- > > Key: CASSANDRA-13848 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13848 > Project: Cassandra > Issue Type: New Feature > Components: Tools >Reporter: Jeff Jirsa >Assignee: Kevin Wern >Priority: Trivial > Labels: lhf > Attachments: 0001-sstabledump-add-l-option-for-jsonl.patch > > > sstable2json / sstabledump make a huge json document of the whole file. For > very large sstables this makes it impossible to load in memory to do anything > with it. Allowing users to Break it into small json objects per partition > would be useful. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13848) Allow sstabledump to do a json object per partition to better handle large sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197921#comment-16197921 ] Kevin Wern commented on CASSANDRA-13848: Took longer than I expected to revisit this, but above is my attempt. > Allow sstabledump to do a json object per partition to better handle large > sstables > --- > > Key: CASSANDRA-13848 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13848 > Project: Cassandra > Issue Type: New Feature > Components: Tools >Reporter: Jeff Jirsa >Assignee: Kevin Wern >Priority: Trivial > Labels: lhf > Attachments: 0001-sstabledump-add-l-option-for-jsonl.patch > > > sstable2json / sstabledump make a huge json document of the whole file. For > very large sstables this makes it impossible to load in memory to do anything > with it. Allowing users to Break it into small json objects per partition > would be useful. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13942) Open Cassandra.yaml for developers to extend custom settings
[ https://issues.apache.org/jira/browse/CASSANDRA-13942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197938#comment-16197938 ] zhaoyan commented on CASSANDRA-13942: - Hi [~bdeggleston] Thank you for your advice. I can achieve this by creating a new ConfigurationLoader, but I don't think creating a new ConfigurationLoader is a friendly way to extend. A new ConfigurationLoader may be designed to load configurations from a DB, properties, the network, or other sources besides yaml. I only want to add more settings to cassandra.yaml. > Open Cassandra.yaml for developers to extend custom settings > > > Key: CASSANDRA-13942 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13942 > Project: Cassandra > Issue Type: Wish > Components: Configuration >Reporter: zhaoyan > > we now try to write one index plugin for cassandra. > we want to put some more settings in cassandra.yaml. and read it in our code. > we find the cassandra use DatabaseDescriptor.java and Config.java to save the > configurations in cassandra.yaml. but we cant extend it > so I advice cassandra provide some interfaces for deleopers to extend custom > settings > Thank you
[jira] [Comment Edited] (CASSANDRA-13942) Open Cassandra.yaml for developers to extend custom settings
[ https://issues.apache.org/jira/browse/CASSANDRA-13942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197938#comment-16197938 ] zhaoyan edited comment on CASSANDRA-13942 at 10/10/17 12:29 AM: Hi [~bdeggleston] Thank you for your advice. I can achieve this by creating a new ConfigurationLoader But I dont think it is a friendly way to extend by create a new ConfigurationLoader。 Another new ConfigurationLoader may be designed to load configurations from DB, properties, network etc source other than yaml. I only want to add more settings to cassandra.yaml。 was (Author: zhaoyan): Hi [~bdeggleston] Thank you advice. I can achieve this by creating a new ConfigurationLoader But I dont think it is a friendly way to extend by create a new ConfigurationLoader。 Another new ConfigurationLoader may be designed to load configurations from DB, properties, network etc source other than yaml. I only want to add more settings to cassandra.yaml。 > Open Cassandra.yaml for developers to extend custom settings > > > Key: CASSANDRA-13942 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13942 > Project: Cassandra > Issue Type: Wish > Components: Configuration >Reporter: zhaoyan > > we now try to write one index plugin for cassandra. > we want to put some more settings in cassandra.yaml. and read it in our code. > we find the cassandra use DatabaseDescriptor.java and Config.java to save the > configurations in cassandra.yaml. but we cant extend it > so I advice cassandra provide some interfaces for deleopers to extend custom > settings > Thank you -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-13942) Open Cassandra.yaml for developers to extend custom settings
[ https://issues.apache.org/jira/browse/CASSANDRA-13942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197938#comment-16197938 ] zhaoyan edited comment on CASSANDRA-13942 at 10/10/17 12:38 AM: Hi [~bdeggleston] Thank you for your advice. I could achieve this by creating a new ConfigurationLoader, but I don't think a new ConfigurationLoader is a friendly way to extend the configuration. A new ConfigurationLoader is really meant for loading configuration from sources other than YAML, such as a database, properties files, or the network. I only want to add more settings to cassandra.yaml, and I don't want to copy YamlConfigurationLoader.java again ~:D
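The extension pattern being debated, a loader that keeps unknown cassandra.yaml keys aside for plugins instead of rejecting them, can be sketched in Python. This is a minimal illustration only: the config is modeled as a plain dict rather than a parsed YAML file, and `PluginConfigLoader`, `Config.KNOWN_KEYS`, and the plugin key name are hypothetical (the real loader, YamlConfigurationLoader.java, is Java).

```python
# Sketch: tolerate plugin-specific settings alongside known ones.
# Assumption: config modeled as a dict instead of parsing cassandra.yaml.

class Config:
    KNOWN_KEYS = {"cluster_name", "num_tokens"}  # a couple of real yaml keys

    def __init__(self, settings):
        self.cluster_name = settings.get("cluster_name", "Test Cluster")
        self.num_tokens = settings.get("num_tokens", 256)
        # Keys Cassandra itself doesn't know are kept aside for plugins
        # instead of being rejected, which is what the ticket asks for.
        self.custom = {k: v for k, v in settings.items()
                       if k not in self.KNOWN_KEYS}


class PluginConfigLoader:
    """Stand-in for a custom ConfigurationLoader implementation."""

    def load(self, raw_settings):
        return Config(raw_settings)


cfg = PluginConfigLoader().load({
    "cluster_name": "prod",
    "num_tokens": 16,
    "my_index_plugin_mode": "fast",  # hypothetical plugin-specific setting
})
print(cfg.custom["my_index_plugin_mode"])  # -> fast
```

The design choice mirrors the comment: the plugin reads its settings from the same cassandra.yaml, while the core `Config` fields stay untouched.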
[jira] [Commented] (CASSANDRA-13848) Allow sstabledump to do a json object per partition to better handle large sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197956#comment-16197956 ] Jeff Jirsa commented on CASSANDRA-13848: Thanks [~kwern] - took a quick peek and it looks reasonable, but I'll try to review properly soon. > Allow sstabledump to do a json object per partition to better handle large > sstables > --- > > Key: CASSANDRA-13848 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13848 > Project: Cassandra > Issue Type: New Feature > Components: Tools >Reporter: Jeff Jirsa >Assignee: Kevin Wern >Priority: Trivial > Labels: lhf > Attachments: 0001-sstabledump-add-l-option-for-jsonl.patch > > > sstable2json / sstabledump produce one huge JSON document for the whole file. For > very large sstables this makes the output impossible to load into memory to do anything > with it. Allowing users to break it into small JSON objects, one per partition, > would be useful.
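The trade-off the ticket describes can be sketched as follows. The partition dicts are hand-written stand-ins for what sstabledump would decode from an SSTable, and `dump_jsonl` illustrates the JSON Lines idea behind the attached `-l` patch rather than the patch's actual code.

```python
import json

# Stand-in partitions; sstabledump would decode these from the SSTable.
partitions = [
    {"partition": {"key": ["k1"]}, "rows": [{"clustering": [1]}]},
    {"partition": {"key": ["k2"]}, "rows": [{"clustering": [2]}]},
]

def dump_single_document(parts):
    # Current behavior: one JSON array spanning the whole file, so a reader
    # must hold the entire document in memory just to parse it.
    return json.dumps(parts)

def dump_jsonl(parts):
    # Proposed behavior: one JSON object per line, one partition each, so a
    # reader can process the file one line (one partition) at a time.
    return "\n".join(json.dumps(p) for p in parts)

for line in dump_jsonl(partitions).splitlines():
    print(json.loads(line)["partition"]["key"])  # each line parses on its own
```

A consumer can then stream arbitrarily large dumps with constant memory, e.g. `for line in open("dump.jsonl"): handle(json.loads(line))`.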
[jira] [Commented] (CASSANDRA-13813) Don't let user drop (or generally break) tables in system_distributed
[ https://issues.apache.org/jira/browse/CASSANDRA-13813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197996#comment-16197996 ] Kurt Greaves commented on CASSANDRA-13813: -- I think if we can't provide a data model for our tables that works for all scenarios, then we need to allow operators to make changes. I've had quite a few occasions where modifying "system" tables was necessary, and I'm sure more tables that don't work in all scenarios will be introduced in the future. While there is the workaround of inserting directly into the system_schema tables, that is fraught with peril, and it's far more likely that operators break something that way. I can't see someone saying "whoops, I accidentally DROPped/ALTERed a random column in system_distributed.view_build_status", but I can definitely see someone trying to insert into system_schema.tables and making mistakes. As soon as we make these keyspaces replicated we hand over some responsibility to the operator to manage them (not that the non-replicated keyspaces have a perfect history either), and I'd expect to be able to change table properties that potentially affect the cluster. Cassandra already requires you to know what you're doing as an operator; this really doesn't increase that expectation. There are a million other bad choices you could make when managing a cluster that would be far more catastrophic (and far more likely). I would like to move away from that, but a lot of that sort of thing requires major changes to fix. As in this case, it seems we'll need the capability limitation framework or other major changes to make a reasonable compromise.
> Don't let user drop (or generally break) tables in system_distributed > - > > Key: CASSANDRA-13813 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13813 > Project: Cassandra > Issue Type: Bug > Components: Distributed Metadata >Reporter: Sylvain Lebresne >Assignee: Aleksey Yeschenko > Fix For: 3.0.x, 3.11.x > > > There are currently no particular restrictions on schema modifications to > tables of the {{system_distributed}} keyspace. This means you can drop > those tables, or even alter them in wrong ways like dropping or renaming > columns. All of which is guaranteed to break things (that is, repair if you > mess with one of its tables, or MVs if you mess with > {{view_build_status}}). > I'm pretty sure this was never intended and is an oversight of the condition > on {{ALTERABLE_SYSTEM_KEYSPACES}} in > [ClientState|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/ClientState.java#L397]. > That condition is such that any keyspace not listed in > {{ALTERABLE_SYSTEM_KEYSPACES}} (which happens to be the case for > {{system_distributed}}) has no specific restrictions whatsoever, while given > the naming it's fair to assume the intention was exactly the opposite.
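The whitelist condition the ticket points at can be sketched like this. It is a Python stand-in for the Java check in ClientState, and the set contents below are assumptions for illustration, not the real constants:

```python
# Sketch of the intended guard. Assumption: set contents are illustrative;
# the real check lives in ClientState.java around ALTERABLE_SYSTEM_KEYSPACES.
SYSTEM_KEYSPACES = {"system", "system_schema", "system_distributed",
                    "system_auth", "system_traces"}
ALTERABLE_SYSTEM_KEYSPACES = {"system_auth", "system_traces"}  # hypothetical

def validate_schema_change(keyspace):
    # Intended rule: a system keyspace may be altered only if explicitly
    # whitelisted. The reported bug is that keyspaces missing from the
    # whitelist (like system_distributed) fell through with NO restrictions.
    if keyspace in SYSTEM_KEYSPACES and keyspace not in ALTERABLE_SYSTEM_KEYSPACES:
        raise PermissionError(f"cannot ALTER or DROP tables in {keyspace}")

validate_schema_change("my_app_keyspace")  # user keyspace: allowed
try:
    validate_schema_change("system_distributed")
except PermissionError as e:
    print(e)  # -> cannot ALTER or DROP tables in system_distributed
```

Note the guard must treat "not whitelisted" as "restricted", which is the opposite of what the buggy condition did.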
[jira] [Commented] (CASSANDRA-13442) Support a means of strongly consistent highly available replication with tunable storage requirements
[ https://issues.apache.org/jira/browse/CASSANDRA-13442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16198063#comment-16198063 ] Kurt Greaves commented on CASSANDRA-13442: -- Yeah, OK, I'm convinced (if it can be proven, obviously); however, let's not make the claim incredibly misleading. bq. 10-20x on transient replicas. Not at full replicas or overall. Saying 10-20x is really misleading. No one is actually going to see a 10-20x improvement in disk usage; even a reduction of 1/3 would be optimistic, I'm sure. bq. With vnodes data would be spread out over several nodes so the additional utilization at each node could be substantially less. Let's not pretend people running vnodes can actually run repairs. bq. Some of it might end up being part of overlapping functionality. I can hope. Not sure if there is a ticket for it, but I've been meaning to create one which would probably benefit from this change: we need a way to change RF without downtime and without costing a fortune (a DC migration). I can see ways in which transient replicas would provide this, as we'll need some way to change RF on the fly without causing nodes to be responsible for data they don't yet have. If you could add a replica as transient at any time, this would almost solve the RF-change problem, assuming you had some way to transition between transient and full replicas. > Support a means of strongly consistent highly available replication with > tunable storage requirements > - > > Key: CASSANDRA-13442 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13442 > Project: Cassandra > Issue Type: Improvement > Components: Compaction, Coordination, Distributed Metadata, Local > Write-Read Paths >Reporter: Ariel Weisberg > > Replication factors like RF=2 can't provide strong consistency and > availability because if a single node is lost it's impossible to reach a > quorum of replicas. 
Stepping up to RF=3 will allow you to lose a node and > still achieve quorum for reads and writes, but requires committing additional > storage. > The requirement of a quorum for writes/reads doesn't seem to be something > that can be relaxed without additional constraints on queries, but it seems > like it should be possible to relax the requirement that 3 full copies of the > entire data set are kept. What is actually required is a covering data set > for the range, and we should be able to achieve a covering data set and high > availability without having three full copies. > After a repair we know that some subset of the data set is fully replicated. > At that point we don't have to read from a quorum of nodes for the repaired > data. It is sufficient to read from a single node for the repaired data and a > quorum of nodes for the unrepaired data. > One way to exploit this would be to have N replicas, say the last N replicas > (where N varies with RF) in the preference list, delete all repaired data > after a repair completes. Subsequent quorum reads will be able to retrieve > the repaired data from either of the two full replicas and the unrepaired data > from a quorum read of any replica, including the "transient" replicas. > Configuration for something like this in NTS might be something similar to { > DC1="3-1", DC2="3-2" }, where the first value is the replication factor used > for consistency and the second value is the number of transient replicas. If > you specify { DC1=3, DC2=3 } then the number of transient replicas defaults > to 0 and you get the same behavior you have today.
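The storage argument in the description can be put into a rough formula. This is a back-of-envelope sketch under the assumption that transient replicas keep only data not yet covered by a completed incremental repair; the function and numbers are illustrative, not measurements.

```python
# Back-of-envelope model of what a "3-1" style configuration stores.
# Assumption: transient replicas drop repaired data, keeping only the
# fraction of the data set that is still unrepaired.
def storage_multiple(full_replicas, transient_replicas, unrepaired_fraction):
    # Full replicas store everything; transient replicas store only the
    # unrepaired fraction, which shrinks as incremental repair keeps up.
    return full_replicas + transient_replicas * unrepaired_fraction

# DC1="3-1": 3 replicas for consistency, 1 of them transient,
# i.e. 2 full copies + 1 transient copy.
print(storage_multiple(2, 1, 0.1))  # ~2.1 copies if 10% is unrepaired
print(storage_multiple(3, 0, 0.0))  # plain RF=3 for comparison: 3 copies
```

This also makes Kurt's point concrete: the overall saving is bounded by the number of transient replicas (here at most 1 copy out of 3, about a third), however large the per-transient-replica reduction is.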
[jira] [Updated] (CASSANDRA-13475) First version of pluggable storage engine API.
[ https://issues.apache.org/jira/browse/CASSANDRA-13475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dikang Gu updated CASSANDRA-13475: -- Status: Patch Available (was: Open) Here is the first version of the pluggable storage engine API, based on trunk. https://github.com/DikangGu/cassandra/commit/f1c69f688d05504f7409dd735e1473982c59fa52 It contains the API and a little bit of refactoring of the streaming part. You can check https://github.com/Instagram/cassandra/tree/rocks_3.0 for the RocksDB-based implementation. > First version of pluggable storage engine API. > -- > > Key: CASSANDRA-13475 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13475 > Project: Cassandra > Issue Type: Sub-task >Reporter: Dikang Gu >Assignee: Dikang Gu > > In order to support pluggable storage engines, we need to define a unified > interface/API that allows us to plug in different storage engines for > different requirements. > At a very high level, the storage engine interface should include APIs to: > 1. Apply updates to the engine. > 2. Query data from the engine. > 3. Stream data in/out to/from the engine. > 4. Perform table operations, like create/drop/truncate a table, etc. > 5. Report various stats about the engine. > I created this ticket to start the discussion about the interface.
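The five API areas listed above can be sketched as an abstract interface plus a trivial engine behind it. Method names here are assumptions for illustration only; the actual API is Java, in the linked commit.

```python
from abc import ABC, abstractmethod

# Illustrative sketch of the kind of unified interface the ticket describes.
class StorageEngine(ABC):
    @abstractmethod
    def apply(self, mutation): ...         # 1. apply an update to the engine

    @abstractmethod
    def query(self, key): ...              # 2. query data from the engine

    @abstractmethod
    def stream(self, key_range): ...       # 3. stream data in/out of the engine

    @abstractmethod
    def create_table(self, metadata): ...  # 4. table ops (create/drop/truncate)

    @abstractmethod
    def stats(self): ...                   # 5. stats about the engine


class InMemoryEngine(StorageEngine):
    """Trivial dict-backed engine, just to show the plug-in point."""

    def __init__(self):
        self.data = {}

    def apply(self, mutation):
        self.data.update(mutation)

    def query(self, key):
        return self.data.get(key)

    def stream(self, key_range):
        return [(k, self.data[k]) for k in sorted(self.data) if k in key_range]

    def create_table(self, metadata):
        pass  # no-op for a schemaless toy engine

    def stats(self):
        return {"rows": len(self.data)}


engine = InMemoryEngine()
engine.apply({"k1": "v1"})
print(engine.query("k1"))  # -> v1
```

Swapping `InMemoryEngine` for, say, a RocksDB-backed class is then a matter of providing another subclass, which is the point of the unified interface.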
[jira] [Commented] (CASSANDRA-13442) Support a means of strongly consistent highly available replication with tunable storage requirements
[ https://issues.apache.org/jira/browse/CASSANDRA-13442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16198253#comment-16198253 ] DOAN DuyHai commented on CASSANDRA-13442: - So I've wrapped my head around the design of transient replicas. So far I can spot 2 concerns: 1) It does not work with ONE or LOCAL_ONE. Of course transient replication is an opt-in feature, but it means users must be super-careful about issuing queries at ONE/LOCAL_ONE against keyspaces that have transient replication enabled. Considering that ONE/LOCAL_ONE is the *default consistency level* for the drivers and the Spark connector, maybe we should throw an exception whenever a query at those consistency levels is issued against a transiently replicated keyspace? 2) *Consistency level* and *repair* have been 2 distinct and orthogonal notions so far. With transient replication they are strongly tied, since transient replication relies heavily on incremental repair. Of course this is an implementation detail; [~aweisberg] has mentioned replicated hints as an alternative implementation, but in that case we're making transient replication dependent on the hints implementation. Same story. The consequence of point 2) is that any bug in incremental repair/replicated hints will severely impact the correctness/assumptions of transient replication. This point worries me much more than point 1). > Support a means of strongly consistent highly available replication with > tunable storage requirements > - > > Key: CASSANDRA-13442 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13442 > Project: Cassandra > Issue Type: Improvement > Components: Compaction, Coordination, Distributed Metadata, Local > Write-Read Paths >Reporter: Ariel Weisberg > > Replication factors like RF=2 can't provide strong consistency and > availability because if a single node is lost it's impossible to reach a > quorum of replicas. 
Stepping up to RF=3 will allow you to lose a node and > still achieve quorum for reads and writes, but requires committing additional > storage. > The requirement of a quorum for writes/reads doesn't seem to be something > that can be relaxed without additional constraints on queries, but it seems > like it should be possible to relax the requirement that 3 full copies of the > entire data set are kept. What is actually required is a covering data set > for the range, and we should be able to achieve a covering data set and high > availability without having three full copies. > After a repair we know that some subset of the data set is fully replicated. > At that point we don't have to read from a quorum of nodes for the repaired > data. It is sufficient to read from a single node for the repaired data and a > quorum of nodes for the unrepaired data. > One way to exploit this would be to have N replicas, say the last N replicas > (where N varies with RF) in the preference list, delete all repaired data > after a repair completes. Subsequent quorum reads will be able to retrieve > the repaired data from either of the two full replicas and the unrepaired data > from a quorum read of any replica, including the "transient" replicas. > Configuration for something like this in NTS might be something similar to { > DC1="3-1", DC2="3-2" }, where the first value is the replication factor used > for consistency and the second value is the number of transient replicas. If > you specify { DC1=3, DC2=3 } then the number of transient replicas defaults > to 0 and you get the same behavior you have today.
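The guard suggested in point 1) of the comment, rejecting ONE/LOCAL_ONE reads against transiently replicated keyspaces, can be sketched as follows. The keyspace names and `check_read` function are hypothetical illustrations, not an actual Cassandra API.

```python
# Sketch of the suggested guard. Assumption: keyspace names and the check
# below are illustrative; Cassandra's real validation would live server-side.
TRANSIENTLY_REPLICATED = {"ks_transient"}   # hypothetical keyspace set
UNSAFE_LEVELS = {"ONE", "LOCAL_ONE"}

def check_read(keyspace, consistency_level):
    # A ONE/LOCAL_ONE read may land on a transient replica that has dropped
    # its repaired data, so a single-replica read can silently miss rows.
    if keyspace in TRANSIENTLY_REPLICATED and consistency_level in UNSAFE_LEVELS:
        raise ValueError(
            f"{consistency_level} is unsafe for transiently replicated "
            f"keyspace {keyspace!r}; use QUORUM/LOCAL_QUORUM"
        )

check_read("ks_normal", "ONE")        # fine: not transiently replicated
check_read("ks_transient", "QUORUM")  # fine: quorum covers a full replica
try:
    check_read("ks_transient", "LOCAL_ONE")
except ValueError as e:
    print("rejected:", e)
```

Since ONE/LOCAL_ONE is the drivers' default, failing fast like this is arguably safer than silently returning incomplete data.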