[jira] [Commented] (CASSANDRA-12497) COPY ... TO STDOUT regression in 2.2.7

2017-10-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16196594#comment-16196594
 ] 

ASF GitHub Bot commented on CASSANDRA-12497:


Github user salomvary closed the pull request at:

https://github.com/apache/cassandra/pull/92


> COPY ... TO STDOUT regression in 2.2.7
> --
>
> Key: CASSANDRA-12497
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12497
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tools
>Reporter: Max Bowsher
>Assignee: Márton Salomváry
> Fix For: 2.2.10, 3.0.12, 3.11.0, 4.0
>
>
> Cassandra 2.2.7 introduces a regression over 2.2.6 that breaks COPY ... TO 
> STDOUT.
> In pylib/cqlshlib/copyutil.py, in CopyTask.__init__, self.printmsg is 
> conditionally bound to EITHER a module-level function accepting arguments 
> (msg, eol=, encoding=), OR a lambda accepting only (_, eol=).
> Consequently, when the lambda is in use (which is the case for COPY ... TO 
> STDOUT without --debug), any attempt to call CopyTask.printmsg with an 
> encoding parameter raises an exception.
> This happens in ExportTask.run, so every COPY ... TO STDOUT without --debug 
> is broken.
> The fix is to update the lambda's arguments to include encoding, or better, 
> to replace it with a module-level function defined next to printmsg, so that 
> people realize the two argument lists must be kept in sync.
> The regression was introduced in this commit:
> commit 5de9de1f5832f2a0e92783e2f4412874423e6e15
> Author: Tyler Hobbs 
> Date:   Thu May 5 11:33:35 2016 -0500
> cqlsh: Handle non-ascii chars in error messages
> 
> Patch by Tyler Hobbs; reviewed by Paulo Motta for CASSANDRA-11626
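To make the failure mode concrete, here is a minimal runnable Python sketch of 
the mismatch and the suggested fix. The names mirror 
pylib/cqlshlib/copyutil.py, but the bodies are simplified stand-ins, and 
swallowmsg is a hypothetical name for the replacement no-op.

{code:python}
# Simplified stand-in for pylib/cqlshlib/copyutil.py -- not the actual source.

def printmsg(msg, eol='\n', encoding='utf8'):
    """Module-level debug printer: accepts msg, eol and encoding."""
    print(msg, end=eol)

class CopyTask(object):
    def __init__(self, debug):
        if debug:
            self.printmsg = printmsg
        else:
            # Buggy no-op: only accepts (msg, eol), so any caller passing
            # encoding= raises a TypeError.
            self.printmsg = lambda _, eol='\n': None

task = CopyTask(debug=False)
try:
    task.printmsg("exported 10 rows", encoding='utf8')  # as ExportTask.run does
except TypeError as e:
    print(e)  # <lambda>() got an unexpected keyword argument 'encoding'

# The suggested fix: a module-level no-op whose signature is kept in sync
# with printmsg (swallowmsg is a hypothetical name).
def swallowmsg(msg, eol='\n', encoding='utf8'):
    """No-op printer with the same argument list as printmsg."""

task.printmsg = swallowmsg
task.printmsg("exported 10 rows", encoding='utf8')  # no exception
{code}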






[jira] [Commented] (CASSANDRA-13930) Avoid grabbing the read lock when checking if compaction strategy should do defragmentation

2017-10-09 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16196660#comment-16196660
 ] 

Marcus Eriksson commented on CASSANDRA-13930:
-

bq. What do you think about a similar fix for fanout?
Makes sense; at the very least it doesn't hurt (until we want LCS to change 
fanout dynamically or something).

Pushed a new commit to the same branch and am rerunning the tests:
https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/363/
https://circleci.com/gh/krummas/cassandra/148

> Avoid grabbing the read lock when checking if compaction strategy should do 
> defragmentation
> ---
>
> Key: CASSANDRA-13930
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13930
> Project: Cassandra
>  Issue Type: Bug
>  Components: Compaction
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
> Fix For: 3.11.x, 4.x
>
>
> We grab the read lock when checking whether the compaction strategy benefits 
> from defragmentation; we should avoid that.
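A rough sketch of the idea, in Python rather than the actual Java (all names 
invented): recompute the answer when the strategies are (re)loaded, so the hot 
check is a plain field read instead of a lock acquisition.

{code:python}
# Sketch only: cache "does any strategy benefit from defragmentation?"
# at (re)load time instead of taking the read lock on every check.
import threading

class StrategyManager(object):
    def __init__(self):
        self._lock = threading.RLock()   # guards strategy reloads
        self._strategies = []
        self._supports_defrag = False    # cached answer, updated under the lock

    def reload(self, strategies):
        with self._lock:                 # the write path still locks
            self._strategies = list(strategies)
            self._supports_defrag = any(s.get("defrag") for s in strategies)

    def should_defragment(self):
        # Hot read path: no lock, just a field read (the flag is only
        # ever replaced atomically under reload).
        return self._supports_defrag

mgr = StrategyManager()
mgr.reload([{"defrag": False}, {"defrag": True}])
print(mgr.should_defragment())  # True, without grabbing the lock
{code}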






[jira] [Created] (CASSANDRA-13942) Open Cassandra.yaml for developers to extend custom settings

2017-10-09 Thread zhaoyan (JIRA)
zhaoyan created CASSANDRA-13942:
---

 Summary: Open Cassandra.yaml for developers to extend custom 
settings
 Key: CASSANDRA-13942
 URL: https://issues.apache.org/jira/browse/CASSANDRA-13942
 Project: Cassandra
  Issue Type: Wish
  Components: Configuration
Reporter: zhaoyan


We are writing an index plugin for Cassandra.

We want to put some additional settings in cassandra.yaml and read them from 
our plugin code.

We found that Cassandra uses DatabaseDescriptor.java and Config.java to hold 
the configuration parsed from cassandra.yaml, but we cannot extend them.

So I suggest that Cassandra provide interfaces for developers to extend 
cassandra.yaml with custom settings.

Thank you
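For what it's worth, here is a sketch (Python, all names hypothetical) of the 
kind of escape hatch being requested, assuming the plugin reads its extra 
settings from its own YAML file next to cassandra.yaml rather than from 
cassandra.yaml itself:

{code:python}
# Hypothetical plugin-side settings loader; file name and keys are invented.
import yaml  # PyYAML

DEFAULTS = {"index_refresh_seconds": 60, "index_dir": "/var/lib/myindex"}

def load_plugin_settings(path="myindex.yaml"):
    """Merge the plugin's YAML file over built-in defaults."""
    settings = dict(DEFAULTS)
    try:
        with open(path) as f:
            settings.update(yaml.safe_load(f) or {})
    except FileNotFoundError:
        pass  # no file present: run with the defaults
    return settings

print(load_plugin_settings()["index_refresh_seconds"])
{code}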






[jira] [Commented] (CASSANDRA-13910) Consider deprecating (then removing) read_repair_chance/dclocal_read_repair_chance

2017-10-09 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16196782#comment-16196782
 ] 

Robert Stupp commented on CASSANDRA-13910:
--

(If nodes are down for longer than the hint window (or there are similar 
operational issues), you should run an AE repair and not rely on RR.)

Considering the misunderstandings in the wild about how RR works, I'm also +1 
on deprecating it in 3.11.x and removing it in 4.0.

In fact, I haven't seen a single use case where RR was used intentionally; 
many use it simply because "it's the default, so it must be good to have it".

> Consider deprecating (then removing) 
> read_repair_chance/dclocal_read_repair_chance
> --
>
> Key: CASSANDRA-13910
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13910
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Sylvain Lebresne
>Priority: Minor
>  Labels: CommunityFeedbackRequested
>
> First, let me clarify, so this is not misunderstood, that I'm not *at all* 
> suggesting we remove the read-repair mechanism of detecting and repairing 
> inconsistencies between read responses: that mechanism is imo fine and 
> useful. But {{read_repair_chance}} and {{dclocal_read_repair_chance}} have 
> never been about _enabling_ that mechanism; they are about querying all 
> replicas (even when this is not required by the consistency level) for the 
> sole purpose of maybe read-repairing some of the replicas that wouldn't have 
> been queried otherwise. Which, btw, brings me to reason 1 for considering 
> their removal: their naming/behavior is super confusing. Over the years, 
> I've seen countless users (and not only newbies) misunderstand what those 
> options do, and as a consequence misunderstand when read-repair itself was 
> happening.
> But my 2nd reason for suggesting this is that I suspect 
> {{read_repair_chance}}/{{dclocal_read_repair_chance}} are, especially 
> nowadays, more harmful than anything else when enabled. When those options 
> kick in, you trade additional resource consumption (all nodes have to 
> execute the read) for a _fairly remote chance_ of having some 
> inconsistencies repaired on _some_ replicas _a bit faster_ than they would 
> otherwise be. To justify that last part, let's recall that:
> # most inconsistencies are actually fixed by hints in practice; and in the 
> case where a node stays dead for so long that hints end up timing out, you 
> really should repair the node when it comes back (if not simply 
> re-bootstrap it). Read-repair probably doesn't fix _that_ much stuff in the 
> first place.
> # again, read-repair does happen without those options kicking in. If you 
> do reads at {{QUORUM}}, inconsistencies will eventually get read-repaired 
> all the same, just a tiny bit less quickly.
> # I suspect almost everyone uses a low "chance" for those options at best 
> (because the extra resource consumption is real), so at the end of the day 
> it's up to chance how much faster this fixes inconsistencies.
> Overall, I'm having a hard time imagining real cases where that trade-off 
> really makes sense. Don't get me wrong, those options had their place a 
> long time ago when hints weren't working all that well, but I think they 
> bring more confusion than benefit now.
> And I think it's sane to reconsider things every once in a while, and to 
> clean up anything that may not make all that much sense anymore, which I 
> think is the case here.
> Tl;dr, I feel the benefits brought by those options are very slim at best 
> and well overshadowed by the confusion they bring, and not worth 
> maintaining the code that supports them (which, to be fair, isn't huge, but 
> getting rid of {{ReadCallback.AsyncRepairRunner}} wouldn't hurt, for 
> instance).
> Lastly, if the consensus here ends up being that they can have their uses 
> in weird cases and that we feel supporting those cases is worth confusing 
> everyone else and maintaining that code, I would still suggest disabling 
> them entirely by default.
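To make the mechanics concrete, here is a toy Python model of what the two 
options do at the coordinator. It is a simplification under the usual 
single-dice-roll description, not Cassandra's actual read path:

{code:python}
# Toy model of chance-based read repair at the coordinator (illustrative
# names; the real decision lives in the server's read path).
import random
from collections import namedtuple

Replica = namedtuple("Replica", "name local")

def replicas_to_query(replicas, needed_for_cl, rr_chance, dclocal_rr_chance):
    """One read: roll once, then query everyone (global), the local DC
    (dclocal), or just the consistency-level minimum (the normal case)."""
    roll = random.random()
    if roll < rr_chance:
        return replicas                             # query all replicas
    if roll < dclocal_rr_chance:
        return [r for r in replicas if r.local]     # all local-DC replicas
    return replicas[:needed_for_cl]                 # just what CL requires

replicas = [Replica("a", True), Replica("b", True), Replica("c", False)]
hits = sum(len(replicas_to_query(replicas, 2, 0.1, 0.0)) == 3
           for _ in range(10000))
print("reads that queried all replicas:", hits)     # roughly 1000 of 10000
{code}

With a typical low chance, the extra replicas are touched on only a small 
random fraction of reads, which is the "up to chance" point above.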






[jira] [Commented] (CASSANDRA-13813) Don't let user drop (or generally break) tables in system_distributed

2017-10-09 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16196789#comment-16196789
 ] 

Aleksey Yeschenko commented on CASSANDRA-13813:
---

bq. I'm not sure I understand the problem. If a user has manually and knowingly 
updated some table params, my guess is that they expect (even rely on) future 
changes to defaults not overriding their changes. Isn't that the whole point of 
picking 0 for our hardcoded timestamp, in fact?

Right. But the way ALTER works, we serialise the whole table, including all 
params and all columns, with the new timestamp in the {{system_schema.*}} 
tables. That makes it impossible for us to change the defaults later, even 
those the user didn't modify on purpose. And this isn't something we can 
change very easily in a minor, I'm afraid.

This is why we don't allow altering anything beyond keyspace params, and why 
this issue is, as it stands, a serious bug; this was never intended to be 
allowed.
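A toy last-write-wins model (Python, with illustrative values and timestamps) 
of the problem described above:

{code:python}
# Toy last-write-wins model of schema params as (value, timestamp) cells.
def merge(a, b):
    """Last write wins: the cell with the higher timestamp survives."""
    return max(a, b, key=lambda cell: cell[1])

# Defaults ship at the hardcoded timestamp 0, so a param the user changed
# on purpose correctly survives a later change of the default:
print(merge(("default_ttl=0", 0), ("default_ttl=2592000", 1507540000)))

# But ALTER re-serialises *every* param at a fresh timestamp, so a param
# the user never touched also beats any future default (still written at 0):
print(merge(("compaction=STCS", 1507540000), ("compaction=TWCS", 0)))
{code}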

> Don't let user drop (or generally break) tables in system_distributed
> -
>
> Key: CASSANDRA-13813
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13813
> Project: Cassandra
>  Issue Type: Bug
>  Components: Distributed Metadata
>Reporter: Sylvain Lebresne
>Assignee: Aleksey Yeschenko
> Fix For: 3.0.x, 3.11.x
>
>
> There are currently no particular restrictions on schema modifications to 
> tables of the {{system_distributed}} keyspace. This means you can drop 
> those tables, or even alter them in wrong ways like dropping or renaming 
> columns. All of which is guaranteed to break things (namely repair, if you 
> mess with one of its tables, or MVs if you mess with 
> {{view_build_status}}).
> I'm pretty sure this was never intended and is an oversight in the condition 
> on {{ALTERABLE_SYSTEM_KEYSPACES}} in 
> [ClientState|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/ClientState.java#L397].
>  That condition is such that any keyspace not listed in 
> {{ALTERABLE_SYSTEM_KEYSPACES}} (which happens to be the case for 
> {{system_distributed}}) has no specific restrictions whatsoever, while given 
> the naming it's fair to assume the intention was exactly the opposite.






[jira] [Commented] (CASSANDRA-13813) Don't let user drop (or generally break) tables in system_distributed

2017-10-09 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16196803#comment-16196803
 ] 

Aleksey Yeschenko commented on CASSANDRA-13813:
---

FWIW, I'll be the first to admit that the current situation is not ideal. It 
wasn't me who came up with it, but I share part of the blame: replicated 
system keyspaces are a bit of a mess. They have already caused us issues and 
hassle with {{system_auth}}, and this won't be the last time.

We can't even fix CASSANDRA-12701 properly in a minor without causing migration 
mismatch fun. So, all things considered, my personal preference would be to 
shield existing users from causing further issues for themselves by 
accidentally or intentionally modifying those tables, at least until we have a 
good answer to these related issues, which I don't have :(

> Don't let user drop (or generally break) tables in system_distributed
> -
>
> Key: CASSANDRA-13813
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13813
> Project: Cassandra
>  Issue Type: Bug
>  Components: Distributed Metadata
>Reporter: Sylvain Lebresne
>Assignee: Aleksey Yeschenko
> Fix For: 3.0.x, 3.11.x
>
>






[jira] [Commented] (CASSANDRA-13910) Consider deprecating (then removing) read_repair_chance/dclocal_read_repair_chance

2017-10-09 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16196820#comment-16196820
 ] 

Aleksey Yeschenko commented on CASSANDRA-13910:
---

Right. But you can't *rely* on speculative RR unless your chance is set to an 
absurdly high value, and I don't see why anyone would do that; in that case 
you might as well set speculative retry to {{ALWAYS}} instead.

bq. There's also very minimal gain from removing this from the codebase.

Says who? I've got plans to rewrite our coordinator read path. To me, having 
a new, clean implementation that doesn't need to concern itself with baggage 
like RR has more than minimal gain; then speculative retry is the only 
complication I need to worry about.

Anyway, repeating my +1 here for removal in 4.0.

[~slebresne] Do you want to write up a patch for this? If not, feel free to 
assign the JIRA to me.

> Consider deprecating (then removing) 
> read_repair_chance/dclocal_read_repair_chance
> --
>
> Key: CASSANDRA-13910
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13910
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Sylvain Lebresne
>Priority: Minor
>  Labels: CommunityFeedbackRequested
>






[jira] [Updated] (CASSANDRA-13910) Consider deprecating (then removing) read_repair_chance/dclocal_read_repair_chance

2017-10-09 Thread Aleksey Yeschenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Yeschenko updated CASSANDRA-13910:
--
Fix Version/s: 3.11.x
   4.0

> Consider deprecating (then removing) 
> read_repair_chance/dclocal_read_repair_chance
> --
>
> Key: CASSANDRA-13910
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13910
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Sylvain Lebresne
>Priority: Minor
>  Labels: CommunityFeedbackRequested
> Fix For: 4.0, 3.11.x
>
>






[jira] [Comment Edited] (CASSANDRA-13910) Consider deprecating (then removing) read_repair_chance/dclocal_read_repair_chance

2017-10-09 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16196820#comment-16196820
 ] 

Aleksey Yeschenko edited comment on CASSANDRA-13910 at 10/9/17 11:22 AM:
-

Right. But you can't *rely* on chance RR unless your chance is set to an 
absurdly high value, and I don't see why anyone would do that; in that case 
you might as well set speculative retry to {{ALWAYS}} instead.

bq. There's also very minimal gain from removing this from the codebase.

Says who? I've got plans to rewrite our coordinator read path. To me, having 
a new, clean implementation that doesn't need to concern itself with baggage 
like RR has more than minimal gain; then speculative retry is the only 
complication I need to worry about.

Anyway, repeating my +1 here for removal in 4.0.

[~slebresne] Do you want to write up a patch for this? If not, feel free to 
assign the JIRA to me.


was (Author: iamaleksey):
Right. But you can't *rely* on speculative RR unless your chance is set to an 
absurdly high value, and I don't see why anyone would do that; in that case 
you might as well set speculative retry to {{ALWAYS}} instead.

bq. There's also very minimal gain from removing this from the codebase.

Says who? I've got plans to rewrite our coordinator read path. To me, having 
a new, clean implementation that doesn't need to concern itself with baggage 
like RR has more than minimal gain; then speculative retry is the only 
complication I need to worry about.

Anyway, repeating my +1 here for removal in 4.0.

[~slebresne] Do you want to write up a patch for this? If not, feel free to 
assign the JIRA to me.

> Consider deprecating (then removing) 
> read_repair_chance/dclocal_read_repair_chance
> --
>
> Key: CASSANDRA-13910
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13910
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Sylvain Lebresne
>Priority: Minor
>  Labels: CommunityFeedbackRequested
> Fix For: 4.0, 3.11.x
>
>

[jira] [Commented] (CASSANDRA-13910) Consider deprecating (then removing) read_repair_chance/dclocal_read_repair_chance

2017-10-09 Thread Kurt Greaves (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16196835#comment-16196835
 ] 

Kurt Greaves commented on CASSANDRA-13910:
--

Says me. It's probably the least complex code in the whole read path. Even if 
you kept it through a rewrite it would amount to almost nothing. But that's 
all beside the point. Just because you've got plans doesn't mean you should 
rush the removal of a feature that's existed for years. It's a database; 
change doesn't have to happen at light speed.

> Consider deprecating (then removing) 
> read_repair_chance/dclocal_read_repair_chance
> --
>
> Key: CASSANDRA-13910
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13910
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Sylvain Lebresne
>Priority: Minor
>  Labels: CommunityFeedbackRequested
> Fix For: 4.0, 3.11.x
>
>






[jira] [Commented] (CASSANDRA-10786) Include hash of result set metadata in prepared statement id

2017-10-09 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16196856#comment-16196856
 ] 

Robert Stupp commented on CASSANDRA-10786:
--

I'm ok with the approach of committing this patch as it is (and resolving this 
ticket) and creating a follow-up blocker for 4.0 to pull in a release version 
of the driver.

> Include hash of result set metadata in prepared statement id
> 
>
> Key: CASSANDRA-10786
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10786
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: CQL
>Reporter: Olivier Michallat
>Assignee: Alex Petrov
>Priority: Minor
>  Labels: client-impacting, doc-impacting, protocolv5
> Fix For: 4.x
>
>
> *_Initial description:_*
> This is a follow-up to CASSANDRA-7910, which was about invalidating a 
> prepared statement when the table is altered, to force clients to update 
> their local copy of the metadata.
> There's still an issue if multiple clients are connected to the same host. 
> The first client to execute the query after the cache was invalidated will 
> receive an UNPREPARED response, re-prepare, and update its local metadata. 
> But other clients might miss it entirely (the MD5 hasn't changed), and they 
> will keep using their old metadata. For example:
> # {{SELECT * ...}} statement is prepared in Cassandra with md5 abc123, 
> clientA and clientB both have a cache of the metadata (columns b and c) 
> locally
> # column a gets added to the table, C* invalidates its cache entry
> # clientA sends an EXECUTE request for md5 abc123, gets UNPREPARED response, 
> re-prepares on the fly and updates its local metadata to (a, b, c)
> # prepared statement is now in C*’s cache again, with the same md5 abc123
> # clientB sends an EXECUTE request for id abc123. Because the cache has been 
> populated again, the query succeeds. But clientB still has not updated its 
> metadata, it’s still (b,c)
> One solution that was suggested is to include a hash of the result set 
> metadata in the md5. This way the md5 would change at step 3, and any client 
> using the old md5 would get an UNPREPARED, regardless of whether another 
> client already reprepared.
> -
> *_Resolution (2017/02/13):_*
> The following changes were made to native protocol v5:
> - the PREPARED response includes {{result_metadata_id}}, a hash of the result 
> set metadata.
> - every EXECUTE message must provide {{result_metadata_id}} in addition to 
> the prepared statement id. If it doesn't match the current one on the server, 
> it means the client is operating on a stale schema.
> - to notify the client, the server returns a ROWS response with a new 
> {{Metadata_changed}} flag, the new {{result_metadata_id}} and the updated 
> result metadata (this overrides the {{No_metadata}} flag, even if the client 
> had requested it)
> - the client updates its copy of the result metadata before it decodes the 
> results.
> So the scenario above would now look like:
> # {{SELECT * ...}} statement is prepared in Cassandra with md5 abc123, and 
> result set (b, c) that hashes to cde456
> # column a gets added to the table, C* does not invalidate its cache entry, 
> but only updates the result set to (a, b, c) which hashes to fff789
> # client sends an EXECUTE request for (statementId=abc123, resultId=cde456) 
> and skip_metadata flag
> # cde456!=fff789, so C* responds with ROWS(..., no_metadata=false, 
> metadata_changed=true, new_metadata_id=fff789,col specs for (a,b,c))
> # client updates its column specifications, and will send the next execute 
> queries with (statementId=abc123, resultId=fff789)
> This works the same with multiple clients.
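A compressed sketch of that handshake (Python; the real v5 wire format and 
drivers obviously differ in the details):

{code:python}
# Toy model of the result_metadata_id handshake described above.
import hashlib

def metadata_id(columns):
    """Stand-in for hashing the result-set metadata."""
    return hashlib.md5(",".join(columns).encode()).hexdigest()

class Server:
    def __init__(self, columns):
        self.columns = list(columns)

    def execute(self, stmt_id, client_metadata_id):
        current = metadata_id(self.columns)
        if client_metadata_id != current:
            # ROWS response with Metadata_changed set: carries the new id
            # and the updated column specs, even if skip_metadata was asked.
            return {"metadata_changed": True,
                    "new_metadata_id": current,
                    "columns": list(self.columns)}
        return {"metadata_changed": False}

server = Server(["b", "c"])
client_id = metadata_id(["b", "c"])   # client's cached id (cde456 above)
server.columns = ["a", "b", "c"]      # column a gets added to the table
resp = server.execute("abc123", client_id)
assert resp["metadata_changed"]       # client updates its column specs
client_id = resp["new_metadata_id"]   # next EXECUTEs carry the new id
{code}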






[jira] [Commented] (CASSANDRA-13910) Consider deprecating (then removing) read_repair_chance/dclocal_read_repair_chance

2017-10-09 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16196909#comment-16196909
 ] 

Aleksey Yeschenko commented on CASSANDRA-13910:
---

bq. Just because you've got plans doesn't mean you should rush removal of a 
feature that's existed for years.

It's not 'just because'. There are two pages of comments here justifying the 
change, including one in the comment you are replying to.

If something has negative value staying in (which I and others are arguing is 
the case), then it should absolutely be removed as soon as possible, which 
means the next major.

> Consider deprecating (then removing) 
> read_repair_chance/dclocal_read_repair_chance
> --
>
> Key: CASSANDRA-13910
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13910
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Sylvain Lebresne
>Priority: Minor
>  Labels: CommunityFeedbackRequested
> Fix For: 4.0, 3.11.x
>
>






[jira] [Commented] (CASSANDRA-13813) Don't let user drop (or generally break) tables in system_distributed

2017-10-09 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16196920#comment-16196920
 ] 

Sylvain Lebresne commented on CASSANDRA-13813:
--

bq. But the way ALTER works, we serialise the whole table, including all 
params and all columns

Good point, I hadn't thought about that part. Sad!

So I guess I agree in principle about shielding users against clearly 
dysfunctional behaviors. The problem is that in practice I know for a fact 
that CASSANDRA-12701 has been an issue for some users, where the tables had 
been growing way too much, to the point that being able to work around that 
by setting a TTL manually probably overrides concerns about hypothetical 
future changes to defaults not being picked up.

Or to put it another way: none of this is ideal, but I wonder whether "repair 
history tables regularly grow out of control" isn't a bigger problem in 
practice than "future changes to system table defaults may not be picked up". 
Anyway, again, I'm not opposed to the current patch personally, but I'm uneasy 
about it, so I wouldn't mind a few additional opinions to see if it's just me 
being difficult (which is possible).

> Don't let user drop (or generally break) tables in system_distributed
> -
>
> Key: CASSANDRA-13813
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13813
> Project: Cassandra
>  Issue Type: Bug
>  Components: Distributed Metadata
>Reporter: Sylvain Lebresne
>Assignee: Aleksey Yeschenko
> Fix For: 3.0.x, 3.11.x
>
>






[jira] [Commented] (CASSANDRA-13813) Don't let user drop (or generally break) tables in system_distributed

2017-10-09 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16196923#comment-16196923
 ] 

Aleksey Yeschenko commented on CASSANDRA-13813:
---

bq. so wouldn't mind a few additional opinions to see if it's just me being 
difficult (which is possible).

Oh, you've never been difficult. Neither have I. FWIW, I don't feel very 
strongly about this going into 3.0.x vs. going into 4.0 only. Worst case, 
I'll just fix this for us internally.

Seeing that neither of us feels really strongly about this, I don't mind 
getting some opinions from others, either. I'll throw out a signal on IRC and 
hopefully someone will reply. Either way, it's not urgent.

> Don't let user drop (or generally break) tables in system_distributed
> -
>
> Key: CASSANDRA-13813
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13813
> Project: Cassandra
>  Issue Type: Bug
>  Components: Distributed Metadata
>Reporter: Sylvain Lebresne
>Assignee: Aleksey Yeschenko
> Fix For: 3.0.x, 3.11.x
>
>






[jira] [Commented] (CASSANDRA-12701) Repair history tables should have TTL and TWCS

2017-10-09 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16196954#comment-16196954
 ] 

Aleksey Yeschenko commented on CASSANDRA-12701:
---

CASSANDRA-13813 has implications for the workaround suggested here. I favour 
tackling CASSANDRA-13813 independently and finding a way to correct 
CASSANDRA-12701 in this JIRA, without blocking the former on the latter. If 
you feel the same, or otherwise, or have a third option, please make your 
opinion known in the CASSANDRA-13813 comments. Cheers.

> Repair history tables should have TTL and TWCS
> --
>
> Key: CASSANDRA-12701
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12701
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Chris Lohfink
>  Labels: lhf
> Attachments: CASSANDRA-12701.txt
>
>
> Some tools schedule a lot of small subrange repairs, which can lead to a lot 
> of repairs constantly being run. These partitions can grow pretty big in 
> theory. I don't think much reads from them, which might help, but it's still 
> kind of wasted disk space. A one-month TTL (longer than gc grace) and maybe 
> a one-day TWCS window makes sense to me.






[jira] [Commented] (CASSANDRA-10404) Node to Node encryption transitional mode

2017-10-09 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16196999#comment-16196999
 ] 

Stefan Podkowinski commented on CASSANDRA-10404:


bq. As I explained in the previous comment, this is the trickiest part of this 
patch. The upgraded node, after it bounces, must have at least one 3.0 node 
connect to it

1) Would it make sense to fall back to {{SystemKeyspace.getReleaseVersion(ep)}} 
in case we don't have the version available through gossip? The method seems to 
be dead code by now, but the "peers" table is still being updated.

bq. Maybe we can add another property under the server_encryption_options, 
something like enable_legacy_ssl_storage_port. That would also clean up 
MessagingService#listen a little bit. wdyt?

2) Having that flag next to the new {{enabled}} flag should work. The yaml 
file needs attention during upgrades anyway. So if you upgrade from 3.0 with 
ssl enabled, you'd have to set both "enabled: true" and 
"enable_legacy_ssl_storage_port: true" in your config.

3) Hostname verification: I've pushed a commit 
[here|https://github.com/spodkowinski/cassandra/commit/fb2ca6ee87ccc5a8dcb92739237f21a49585ec7a]
 that will honor the {{require_endpoint_verification}} flag for incoming 
connections.

4) If we want to avoid potential attacks with invalid or stolen certificates, 
we should also enable {{require_client_auth}} by default. This should not 
cause any issues, as the truststores need to be managed for outgoing 
connections anyway. So why not validate incoming connections as well?
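For illustration of point 4, this is what requiring a verifiable client 
certificate on inbound TLS connections looks like in generic terms (a Python 
ssl sketch, not Cassandra code; the file paths are placeholders):

{code:python}
# Generic mutual-TLS server context: inbound peers must present a
# certificate that validates against our truststore (placeholder paths).
import ssl

ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
ctx.load_cert_chain(certfile="node.crt", keyfile="node.key")
ctx.load_verify_locations(cafile="truststore.pem")
ctx.verify_mode = ssl.CERT_REQUIRED  # the require_client_auth analogue
{code}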


> Node to Node encryption transitional mode
> -
>
> Key: CASSANDRA-10404
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10404
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Tom Lewis
>Assignee: Jason Brown
> Fix For: 4.x
>
>
> Create a transitional mode for encryption that allows encrypted and 
> unencrypted traffic node-to-node during a changeover from unencrypted to 
> encrypted. This alleviates downtime during the switch.
>  This is similar to CASSANDRA-10559, which is intended for client-to-node.






[jira] [Commented] (CASSANDRA-13813) Don't let user drop (or generally break) tables in system_distributed

2017-10-09 Thread Jeremiah Jordan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197002#comment-16197002
 ] 

Jeremiah Jordan commented on CASSANDRA-13813:
-

I am a little concerned about this change not letting anything be updated, but 
I do understand the reasons, and I can't really see a way around them. Given 
that an experienced person can still get around this restriction by doing 
inserts into the schema tables, that is probably enough if there are any 
future bugs to be worked around. Inexperienced users should not be changing 
these values by themselves.

> Don't let user drop (or generally break) tables in system_distributed
> -
>
> Key: CASSANDRA-13813
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13813
> Project: Cassandra
>  Issue Type: Bug
>  Components: Distributed Metadata
>Reporter: Sylvain Lebresne
>Assignee: Aleksey Yeschenko
> Fix For: 3.0.x, 3.11.x
>
>






[jira] [Commented] (CASSANDRA-13265) Expiration in OutboundTcpConnection can block the reader Thread

2017-10-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197194#comment-16197194
 ] 

ASF GitHub Bot commented on CASSANDRA-13265:


Github user christian-esken closed the pull request at:

https://github.com/apache/cassandra/pull/95


> Expiration in OutboundTcpConnection can block the reader Thread
> ---
>
> Key: CASSANDRA-13265
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13265
> Project: Cassandra
>  Issue Type: Bug
> Environment: Cassandra 3.0.9
> Java HotSpot(TM) 64-Bit Server VM version 25.112-b15 (Java version 
> 1.8.0_112-b15)
> Linux 3.16
>Reporter: Christian Esken
>Assignee: Christian Esken
> Fix For: 3.0.14, 3.11.0, 4.0
>
> Attachments: cassandra-13265-2.2-dtest_stdout.txt, 
> cassandra-13265-trun-dtest_stdout.txt, 
> cassandra.pb-cache4-dus.2017-02-17-19-36-26.chist.xz, 
> cassandra.pb-cache4-dus.2017-02-17-19-36-26.td.xz
>
>
> I observed that sometimes a single node in a Cassandra cluster fails to 
> communicate with the other nodes. This can happen at any time, during peak 
> load or low load. Restarting that single node fixes the issue.
> Before going into details, I want to state that I have analyzed the 
> situation and am already developing a possible fix. Here is the analysis so 
> far:
> - A thread dump in this situation showed 324 threads in the 
> OutboundTcpConnection class that want to lock the backlog queue to do 
> expiration.
> - A class histogram shows 262508 instances of 
> OutboundTcpConnection$QueuedMessage.
> What is the effect of this? As soon as the Cassandra node has reached a 
> certain number of queued messages, it starts thrashing itself to death. 
> Each of the threads fully locks the queue for reading and writing by 
> calling iterator.next(), making the situation worse and worse.
> - Writing: only after 262508 locking operations can it progress with 
> actually writing to the queue.
> - Reading: is also blocked, as 324 threads try to do iterator.next() and 
> fully lock the queue.
> This means: writing blocks the queue for reading, and readers might even be 
> starved, which makes the situation even worse.
> -
> The setup is:
>  - 3-node cluster
>  - replication factor 2
>  - consistency LOCAL_ONE
>  - no remote DCs
>  - high write throughput (10 INSERT statements per second and more during 
> peak times)
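As a condensed model of the pathology (Python; the real class is Java and the 
details differ, but the per-caller expiration scan under a shared lock is the 
point):

{code:python}
# Simplified model: each producer scans the shared backlog under one lock
# to expire messages before enqueueing, so producers serialize on O(n)
# scans and the consumer starves behind them.
import threading, time
from collections import deque

backlog = deque()
lock = threading.Lock()

def enqueue(msg, ttl=1.0):
    now = time.time()
    with lock:
        # Expiration pass: the backlog is walked while holding the lock,
        # so every producer pays the scan and blocks everyone else.
        while backlog and backlog[0][1] < now:
            backlog.popleft()
        backlog.append((msg, now + ttl))

def drain():
    with lock:   # the reader contends on the same lock as every scan
        items = [msg for msg, _ in backlog]
        backlog.clear()
        return items

for i in range(5):
    enqueue("msg-%d" % i, ttl=0.01)
time.sleep(0.02)
enqueue("fresh")   # this call's expiration pass drops the five stale messages
print(drain())     # ['fresh']
{code}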






[jira] [Commented] (CASSANDRA-13265) Expiration in OutboundTcpConnection can block the reader Thread

2017-10-09 Thread Christian Esken (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197195#comment-16197195
 ] 

Christian Esken commented on CASSANDRA-13265:
-

PR closed: https://github.com/apache/cassandra/pull/95

> Expiration in OutboundTcpConnection can block the reader Thread
> ---
>
> Key: CASSANDRA-13265
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13265
> Project: Cassandra
>  Issue Type: Bug
> Environment: Cassandra 3.0.9
> Java HotSpot(TM) 64-Bit Server VM version 25.112-b15 (Java version 
> 1.8.0_112-b15)
> Linux 3.16
>Reporter: Christian Esken
>Assignee: Christian Esken
> Fix For: 3.0.14, 3.11.0, 4.0
>
> Attachments: cassandra-13265-2.2-dtest_stdout.txt, 
> cassandra-13265-trun-dtest_stdout.txt, 
> cassandra.pb-cache4-dus.2017-02-17-19-36-26.chist.xz, 
> cassandra.pb-cache4-dus.2017-02-17-19-36-26.td.xz
>
>
> I observed that sometimes a single node in a Cassandra cluster fails to 
> communicate to the other nodes. This can happen at any time, during peak load 
> or low load. Restarting that single node from the cluster fixes the issue.
> Before going in to details, I want to state that I have analyzed the 
> situation and am already developing a possible fix. Here is the analysis so 
> far:
> - A Threaddump in this situation showed  324 Threads in the 
> OutboundTcpConnection class that want to lock the backlog queue for doing 
> expiration.
> - A class histogram shows 262508 instances of 
> OutboundTcpConnection$QueuedMessage.
> What is the effect of it? As soon as the Cassandra node has reached a certain 
> amount of queued messages, it starts thrashing itself to death. Each of the 
> Thread fully locks the Queue for reading and writing by calling 
> iterator.next(), making the situation worse and worse.
> - Writing: Only after 262508 locking operation it can progress with actually 
> writing to the Queue.
> - Reading: Is also blocked, as 324 Threads try to do iterator.next(), and 
> fully lock the Queue
> This means: Writing blocks the Queue for reading, and readers might even be 
> starved which makes the situation even worse.
> -
> The setup is:
>  - 3-node cluster
>  - replication factor 2
>  - Consistency LOCAL_ONE
>  - No remote DC's
>  - high write throughput (10 INSERT statements per second and more during 
> peak times).
>  
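
For illustration, the shape of a fix along the lines described above is to let 
at most one thread perform expiration at a time while other producers simply 
skip it. A minimal sketch, assuming a {{LinkedBlockingQueue}} backlog; the 
class and method names are illustrative, not the actual patch:

{code:java}
import java.util.Iterator;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

class Backlog
{
    static final long TIMEOUT_NANOS = TimeUnit.SECONDS.toNanos(10);

    static final class QueuedMessage
    {
        final long enqueuedAtNanos = System.nanoTime();
        boolean isTimedOut(long nowNanos) { return nowNanos - enqueuedAtNanos > TIMEOUT_NANOS; }
    }

    private final LinkedBlockingQueue<QueuedMessage> backlog = new LinkedBlockingQueue<>();
    // Ensures at most one thread iterates (and thus fully locks) the queue at a time.
    private final AtomicBoolean expirationInProgress = new AtomicBoolean(false);

    void enqueue(QueuedMessage m)
    {
        backlog.add(m);
        // Without the guard, every producer could end up in expireMessages(),
        // and each iterator.next() on a LinkedBlockingQueue takes both the put
        // and take locks; with hundreds of threads this thrashes as described.
        if (expirationInProgress.compareAndSet(false, true))
        {
            try { expireMessages(); }
            finally { expirationInProgress.set(false); }
        }
    }

    private void expireMessages()
    {
        long now = System.nanoTime();
        for (Iterator<QueuedMessage> it = backlog.iterator(); it.hasNext(); )
        {
            if (it.next().isTimedOut(now))
                it.remove();
        }
    }
}
{code}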



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13265) Expiration in OutboundTcpConnection can block the reader Thread

2017-10-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197193#comment-16197193
 ] 

ASF GitHub Bot commented on CASSANDRA-13265:


Github user christian-esken commented on the issue:

https://github.com/apache/cassandra/pull/95
  
Closing PR, as it has been merged in all relevant branches. See 
https://issues.apache.org/jira/browse/CASSANDRA-13265


> Expiration in OutboundTcpConnection can block the reader Thread
> ---
>
> Key: CASSANDRA-13265
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13265
> Project: Cassandra
>  Issue Type: Bug
> Environment: Cassandra 3.0.9
> Java HotSpot(TM) 64-Bit Server VM version 25.112-b15 (Java version 
> 1.8.0_112-b15)
> Linux 3.16
>Reporter: Christian Esken
>Assignee: Christian Esken
> Fix For: 3.0.14, 3.11.0, 4.0
>
> Attachments: cassandra-13265-2.2-dtest_stdout.txt, 
> cassandra-13265-trun-dtest_stdout.txt, 
> cassandra.pb-cache4-dus.2017-02-17-19-36-26.chist.xz, 
> cassandra.pb-cache4-dus.2017-02-17-19-36-26.td.xz
>
>
> I observed that sometimes a single node in a Cassandra cluster fails to 
> communicate with the other nodes. This can happen at any time, during peak 
> load or low load. Restarting that single node fixes the issue.
> Before going into details, I want to state that I have analyzed the 
> situation and am already developing a possible fix. Here is the analysis so 
> far:
> - A thread dump in this situation showed 324 Threads in the 
> OutboundTcpConnection class that want to lock the backlog queue for doing 
> expiration.
> - A class histogram shows 262508 instances of 
> OutboundTcpConnection$QueuedMessage.
> What is the effect of this? As soon as the Cassandra node has reached a 
> certain amount of queued messages, it starts thrashing itself to death. Each 
> of these Threads fully locks the Queue for reading and writing by calling 
> iterator.next(), making the situation worse and worse.
> - Writing: Only after 262508 locking operations can a writer progress with 
> actually writing to the Queue.
> - Reading: Reads are also blocked, as 324 Threads try to do iterator.next() 
> and fully lock the Queue.
> This means: Writing blocks the Queue for reading, and readers might even be 
> starved which makes the situation even worse.
> -
> The setup is:
>  - 3-node cluster
>  - replication factor 2
>  - Consistency LOCAL_ONE
>  - No remote DC's
>  - high write throughput (10 INSERT statements per second and more during 
> peak times).
>  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13442) Support a means of strongly consistent highly available replication with tunable storage requirements

2017-10-09 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197294#comment-16197294
 ] 

Ariel Weisberg commented on CASSANDRA-13442:


bq. Considering this seems to be mostly about reducing storage costs so write 
bound workloads can run "dense" nodes, 
It's not either/or, as these two things compose, further reducing costs.

Dense nodes don't reduce costs 10-20x. With dense nodes you still need to pay 
for the additional RAM and disk, and datacenter power and cooling go up 
slightly as you stuff more power utilization into each box. Dense nodes also 
still need to process each request, so you also need to scale up read and 
write throughput, which is not always a dimension we are claiming to improve 
with dense nodes.

In most cases, dense nodes won't let you increase replication on hardware 
where you can't fit an entire replica of your data set, such as racks or DCs 
in a region that have limited capacity (and by limited I mean many times less 
capacity). What do we expect from dense nodes? 2x? 4x? Are all use cases going 
to behave well with the various strategies we use to get to dense nodes?

bq. While this idea does seem interesting, it seems very complex and you are 
still trading off replicas for additional storage. 
The target is 10x to 20x less storage. So additional storage yes, but not the 
same order of magnitude. In other words, we pay something (complexity) and we 
get something (some replicas require 10-20x less hardware).

I also think 10-20x storage savings is a conservative estimate, assuming 
worst-case utilization during an outage where transient data must be stored at 
transient replicas. With vnodes, data would be spread out over several nodes, 
so the additional utilization at each node could be substantially less.

bq. Seems that the primary use case would be multiple datacenters with 
transient replicas, which granted would be nice, 
Multiple data centers aren't required to benefit. Many people will be able to 
go from RF=3 in a DC today to RF=5 and tolerate losing two nodes, instead of 
just one, without losing availability or data.

There are other permutations where being able to inexpensively add a transient 
replica can increase availability, such as RF=3 with one replica at each DC. 
Write at CL.ALL, read at LOCAL_ONE, and fall back to reading from a remote DC 
if LOCAL_ONE fails. You get strong consistency, but not write availability. 
Add a transient replica at each DC and write at EACH_QUORUM, and you get write 
availability even after a single node fails.
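
For concreteness, that read/write pattern might look roughly like this with 
the DataStax Java driver 3.x (a sketch; the keyspace, table, and the crude 
catch-and-retry fallback are made up for illustration):

{code:java}
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.Statement;

public class TransientCLDemo
{
    public static void main(String[] args)
    {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("ks"))
        {
            // With a transient replica per DC, EACH_QUORUM writes survive a node loss.
            Statement write = new SimpleStatement("INSERT INTO t (k, v) VALUES (1, 'x')")
                    .setConsistencyLevel(ConsistencyLevel.EACH_QUORUM);
            session.execute(write);

            // Cheap local read first ...
            Statement read = new SimpleStatement("SELECT v FROM t WHERE k = 1")
                    .setConsistencyLevel(ConsistencyLevel.LOCAL_ONE);
            try
            {
                session.execute(read);
            }
            catch (Exception e)
            {
                // ... falling back to a read that may be served by a remote DC.
                session.execute(read.setConsistencyLevel(ConsistencyLevel.ONE));
            }
        }
    }
}
{code}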

bq. you're probably able to just store less replicas in each datacenter anyway, 
at least if we had more flexible consistency levels.
I'm not sure what you mean by flexibility.

Not without losing either availability or consistency under failure scenarios. 
If you run RF=3 today with strong consistency you can't drop to RF=2 without 
losing availability if there is a node failure.

> Support a means of strongly consistent highly available replication with 
> tunable storage requirements
> -
>
> Key: CASSANDRA-13442
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13442
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction, Coordination, Distributed Metadata, Local 
> Write-Read Paths
>Reporter: Ariel Weisberg
>
> Replication factors like RF=2 can't provide strong consistency and 
> availability because if a single node is lost it's impossible to reach a 
> quorum of replicas. Stepping up to RF=3 will allow you to lose a node and 
> still achieve quorum for reads and writes, but requires committing additional 
> storage.
> The requirement of a quorum for writes/reads doesn't seem to be something 
> that can be relaxed without additional constraints on queries, but it seems 
> like it should be possible to relax the requirement that 3 full copies of the 
> entire data set are kept. What is actually required is a covering data set 
> for the range and we should be able to achieve a covering data set and high 
> availability without having three full copies. 
> After a repair we know that some subset of the data set is fully replicated. 
> At that point we don't have to read from a quorum of nodes for the repaired 
> data. It is sufficient to read from a single node for the repaired data and a 
> quorum of nodes for the unrepaired data.
> One way to exploit this would be to have N replicas, say the last N replicas 
> (where N varies with RF) in the preference list, delete all repaired data 
> after a repair completes. Subsequent quorum reads will be able to retrieve 
> the repaired data from any of the two full replicas and the unrepaired data 
> from a quorum read of any replica including the "transient" replicas.
> Configuration 

[jira] [Commented] (CASSANDRA-13442) Support a means of strongly consistent highly available replication with tunable storage requirements

2017-10-09 Thread Jeff Jirsa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197326#comment-16197326
 ] 

Jeff Jirsa commented on CASSANDRA-13442:


{quote}
Considering this seems to be mostly about reducing storage costs so write bound 
workloads can run "dense" nodes, and storage is meant to be cheap, it seems to 
me a less complex alternative would just be to remove the barriers to having 
large amounts of physical storage per node.
{quote}

Everything is meant to be cheap, but that doesn't mean it is.

Consider a reasonably sized cluster (for example, 250 nodes * 2 datacenters * 
4TB/node = 2 million GB of disk). This ticket would reduce that to something 
closer to 1,340,000 GB of disk for a cluster of that nature.

Enterprise SSDs still retail for $0.50/GB. Let's pretend you get a great deal 
and you're paying $0.25/GB. The cost differential is $335k vs $500k, for a 
single cluster.
If you're on AWS and using GP2 EBS, that's $0.10/GB/month. The cost 
differential is $134k/month vs $200k/month, or about $1.6M/year. Per cluster. 

That's JUST DISK savings, even if we pretend like everything else is free (and 
it's not).

If you feel like there's more ROI to win by having denser storage, I'm sure 
nobody would mind seeing patches.


> Support a means of strongly consistent highly available replication with 
> tunable storage requirements
> -
>
> Key: CASSANDRA-13442
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13442
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction, Coordination, Distributed Metadata, Local 
> Write-Read Paths
>Reporter: Ariel Weisberg
>
> Replication factors like RF=2 can't provide strong consistency and 
> availability because if a single node is lost it's impossible to reach a 
> quorum of replicas. Stepping up to RF=3 will allow you to lose a node and 
> still achieve quorum for reads and writes, but requires committing additional 
> storage.
> The requirement of a quorum for writes/reads doesn't seem to be something 
> that can be relaxed without additional constraints on queries, but it seems 
> like it should be possible to relax the requirement that 3 full copies of the 
> entire data set are kept. What is actually required is a covering data set 
> for the range and we should be able to achieve a covering data set and high 
> availability without having three full copies. 
> After a repair we know that some subset of the data set is fully replicated. 
> At that point we don't have to read from a quorum of nodes for the repaired 
> data. It is sufficient to read from a single node for the repaired data and a 
> quorum of nodes for the unrepaired data.
> One way to exploit this would be to have N replicas, say the last N replicas 
> (where N varies with RF) in the preference list, delete all repaired data 
> after a repair completes. Subsequent quorum reads will be able to retrieve 
> the repaired data from any of the two full replicas and the unrepaired data 
> from a quorum read of any replica including the "transient" replicas.
> Configuration for something like this in NTS might be something similar to { 
> DC1="3-1", DC2="3-2" } where the first value is the replication factor used 
> for consistency and the second values is the number of transient replicas. If 
> you specify { DC1=3, DC2=3 } then the number of transient replicas defaults 
> to 0 and you get the same behavior you have today.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-13442) Support a means of strongly consistent highly available replication with tunable storage requirements

2017-10-09 Thread Jeff Jirsa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197326#comment-16197326
 ] 

Jeff Jirsa edited comment on CASSANDRA-13442 at 10/9/17 5:17 PM:
-

{quote}
Considering this seems to be mostly about reducing storage costs so write bound 
workloads can run "dense" nodes, and storage is meant to be cheap, it seems to 
me a less complex alternative would just be to remove the barriers to having 
large amounts of physical storage per node.
{quote}

Everything is meant to be cheap, but that doesn't mean it is.

Consider a reasonably sized cluster (for example, 250 nodes * 2 datacenters * 
4TB/node = 2 million GB of disk). This ticket would reduce that to something 
closer to 1,340,000 GB of disk for a cluster of that nature.

Enterprise SSDs still retail for $0.50/GB. Let's pretend you get a great deal 
and you're paying $0.25/GB. The cost differential is $335k vs $500k, for a 
single cluster.
If you're on AWS and using GP2 EBS, that's $0.10/GB/month. The cost 
differential is $134k/month vs $200k/month, or about $800k/year. Per cluster. 

That's JUST DISK savings, even if we pretend like everything else is free (and 
it's not).

If you feel like there's more ROI to win by having denser storage, I'm sure 
nobody would mind seeing patches.



was (Author: jjirsa):
{quote}
Considering this seems to be mostly about reducing storage costs so write bound 
workloads can run "dense" nodes, and storage is meant to be cheap, it seems to 
me a less complex alternative would just be to remove the barriers to having 
large amounts of physical storage per node.
{quote}

Everything is meant to be cheap, but that doesn't mean it is.

Consider a reasonably sized cluster (for example, 250 nodes * 2 datacenters * 
4TB/node = 2 million GB of disk). This ticket would reduce that to something 
closer to 1,340,000 GB of disk for a cluster of that nature.

Enterprise SSDs still retail for $0.50/GB. Let's pretend you get a great deal 
and you're paying $0.25/GB. The cost differential is $335k vs $500k, for a 
single cluster.
If you're on AWS and using GP2 EBS, that's $0.10/GB/month. The cost 
differential is $134k/month vs $200k/month, or about $1.6M/year. Per cluster. 

That's JUST DISK savings, even if we pretend like everything else is free (and 
it's not).

If you feel like there's more ROI to win by having denser storage, I'm sure 
nobody would mind seeing patches.
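
A quick back-of-the-envelope check of the corrected figures (a sketch; the 
prices are the assumptions quoted above, and the transient-replica footprint 
is the rough 1,340,000 GB estimate from the comment):

{code:java}
public class DiskCostEstimate
{
    public static void main(String[] args)
    {
        long fullGB = 250L * 2 * 4000;     // 250 nodes * 2 DCs * 4TB/node = 2,000,000 GB
        long transientGB = 1_340_000L;     // rough footprint with transient replicas

        double ssdPerGB = 0.25;            // discounted enterprise SSD, $/GB
        System.out.printf("SSD: $%.0fk vs $%.0fk%n",
                          transientGB * ssdPerGB / 1000,
                          fullGB * ssdPerGB / 1000);          // $335k vs $500k

        double ebsPerGBMonth = 0.10;       // AWS GP2 EBS, $/GB/month
        double diffPerMonth = (fullGB - transientGB) * ebsPerGBMonth;
        System.out.printf("EBS: $%.0fk/month saved, ~$%.0fk/year%n",
                          diffPerMonth / 1000,
                          diffPerMonth * 12 / 1000);          // ~$66k/month, ~$800k/year
    }
}
{code}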


> Support a means of strongly consistent highly available replication with 
> tunable storage requirements
> -
>
> Key: CASSANDRA-13442
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13442
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction, Coordination, Distributed Metadata, Local 
> Write-Read Paths
>Reporter: Ariel Weisberg
>
> Replication factors like RF=2 can't provide strong consistency and 
> availability because if a single node is lost it's impossible to reach a 
> quorum of replicas. Stepping up to RF=3 will allow you to lose a node and 
> still achieve quorum for reads and writes, but requires committing additional 
> storage.
> The requirement of a quorum for writes/reads doesn't seem to be something 
> that can be relaxed without additional constraints on queries, but it seems 
> like it should be possible to relax the requirement that 3 full copies of the 
> entire data set are kept. What is actually required is a covering data set 
> for the range and we should be able to achieve a covering data set and high 
> availability without having three full copies. 
> After a repair we know that some subset of the data set is fully replicated. 
> At that point we don't have to read from a quorum of nodes for the repaired 
> data. It is sufficient to read from a single node for the repaired data and a 
> quorum of nodes for the unrepaired data.
> One way to exploit this would be to have N replicas, say the last N replicas 
> (where N varies with RF) in the preference list, delete all repaired data 
> after a repair completes. Subsequent quorum reads will be able to retrieve 
> the repaired data from any of the two full replicas and the unrepaired data 
> from a quorum read of any replica including the "transient" replicas.
> Configuration for something like this in NTS might be something similar to { 
> DC1="3-1", DC2="3-2" } where the first value is the replication factor used 
> for consistency and the second values is the number of transient replicas. If 
> you specify { DC1=3, DC2=3 } then the number of transient replicas defaults 
> to 0 and you get the same behavior you have today.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-

[jira] [Updated] (CASSANDRA-13943) Infinite compaction of L0 SSTables in JBOD

2017-10-09 Thread Dan Kinder (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dan Kinder updated CASSANDRA-13943:
---
Attachment: debug.log

> Infinite compaction of L0 SSTables in JBOD
> --
>
> Key: CASSANDRA-13943
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13943
> Project: Cassandra
>  Issue Type: Bug
>  Components: Compaction
> Environment: Cassandra 3.11.0 / Centos 6
>Reporter: Dan Kinder
> Attachments: debug.log
>
>
> I recently upgraded from 2.2.6 to 3.11.0.
> I am seeing Cassandra loop infinitely compacting the same data over and over. 
> Attaching logs.
> It is compacting two tables, one on /srv/disk10, the other on /srv/disk1. It 
> does create new SSTables but immediately recompacts again. Note that I am not 
> inserting anything at the moment, and there is no flushing happening on this 
> table (Memtable switch count has not changed).
> My theory is that it somehow thinks those should be compaction candidates. 
> But they shouldn't be: they are on different disks, and I ran nodetool 
> relocatesstables as well as nodetool compact. So it tries to compact them 
> together, but the compaction results in the exact same 2 SSTables on the 2 
> disks, because the keys are split by data disk.
> This is pretty serious, because all our nodes right now are consuming CPU 
> doing this for multiple tables, it seems.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-13943) Infinite compaction of L0 SSTables in JBOD

2017-10-09 Thread Dan Kinder (JIRA)
Dan Kinder created CASSANDRA-13943:
--

 Summary: Infinite compaction of L0 SSTables in JBOD
 Key: CASSANDRA-13943
 URL: https://issues.apache.org/jira/browse/CASSANDRA-13943
 Project: Cassandra
  Issue Type: Bug
  Components: Compaction
 Environment: Cassandra 3.11.0 / Centos 6
Reporter: Dan Kinder


I recently upgraded from 2.2.6 to 3.11.0.

I am seeing Cassandra loop infinitely compacting the same data over and over. 
Attaching logs.

It is compacting two tables, one on /srv/disk10, the other on /srv/disk1. It 
does create new SSTables but immediately recompacts again. Note that I am not 
inserting anything at the moment, and there is no flushing happening on this 
table (Memtable switch count has not changed).

My theory is that it somehow thinks those should be compaction candidates. But 
they shouldn't be: they are on different disks, and I ran nodetool 
relocatesstables as well as nodetool compact. So it tries to compact them 
together, but the compaction results in the exact same 2 SSTables on the 2 
disks, because the keys are split by data disk.

This is pretty serious, because all our nodes right now are consuming CPU doing 
this for multiple tables, it seems.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Assigned] (CASSANDRA-13943) Infinite compaction of L0 SSTables in JBOD

2017-10-09 Thread Marcus Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson reassigned CASSANDRA-13943:
---

Assignee: Marcus Eriksson

> Infinite compaction of L0 SSTables in JBOD
> --
>
> Key: CASSANDRA-13943
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13943
> Project: Cassandra
>  Issue Type: Bug
>  Components: Compaction
> Environment: Cassandra 3.11.0 / Centos 6
>Reporter: Dan Kinder
>Assignee: Marcus Eriksson
> Attachments: debug.log
>
>
> I recently upgraded from 2.2.6 to 3.11.0.
> I am seeing Cassandra loop infinitely compacting the same data over and over. 
> Attaching logs.
> It is compacting two tables, one on /srv/disk10, the other on /srv/disk1. It 
> does create new SSTables but immediately recompacts again. Note that I am not 
> inserting anything at the moment, and there is no flushing happening on this 
> table (Memtable switch count has not changed).
> My theory is that it somehow thinks those should be compaction candidates. 
> But they shouldn't be: they are on different disks, and I ran nodetool 
> relocatesstables as well as nodetool compact. So it tries to compact them 
> together, but the compaction results in the exact same 2 SSTables on the 2 
> disks, because the keys are split by data disk.
> This is pretty serious, because all our nodes right now are consuming CPU 
> doing this for multiple tables, it seems.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13943) Infinite compaction of L0 SSTables in JBOD

2017-10-09 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197350#comment-16197350
 ] 

Marcus Eriksson commented on CASSANDRA-13943:
-

{{/srv/disk10/..., /srv/disk1/...}} - I guess there is a prefix matching 
problem somewhere - I'll get a patch out tomorrow


> Infinite compaction of L0 SSTables in JBOD
> --
>
> Key: CASSANDRA-13943
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13943
> Project: Cassandra
>  Issue Type: Bug
>  Components: Compaction
> Environment: Cassandra 3.11.0 / Centos 6
>Reporter: Dan Kinder
> Attachments: debug.log
>
>
> I recently upgraded from 2.2.6 to 3.11.0.
> I am seeing Cassandra loop infinitely compacting the same data over and over. 
> Attaching logs.
> It is compacting two tables, one on /srv/disk10, the other on /srv/disk1. It 
> does create new SSTables but immediately recompacts again. Note that I am not 
> inserting anything at the moment, and there is no flushing happening on this 
> table (Memtable switch count has not changed).
> My theory is that it somehow thinks those should be compaction candidates. 
> But they shouldn't be: they are on different disks, and I ran nodetool 
> relocatesstables as well as nodetool compact. So it tries to compact them 
> together, but the compaction results in the exact same 2 SSTables on the 2 
> disks, because the keys are split by data disk.
> This is pretty serious, because all our nodes right now are consuming CPU 
> doing this for multiple tables, it seems.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13930) Avoid grabbing the read lock when checking if compaction strategy should do defragmentation

2017-10-09 Thread Jeff Jirsa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197374#comment-16197374
 ] 

Jeff Jirsa commented on CASSANDRA-13930:


lgtm if dtests are happy (I expect it will be fine). 


> Avoid grabbing the read lock when checking if compaction strategy should do 
> defragmentation
> ---
>
> Key: CASSANDRA-13930
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13930
> Project: Cassandra
>  Issue Type: Bug
>  Components: Compaction
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
> Fix For: 3.11.x, 4.x
>
>
> We grab the read lock when checking whether the compaction strategy benefits 
> from defragmentation, avoid that.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13942) Open Cassandra.yaml for developers to extend custom settings

2017-10-09 Thread Blake Eggleston (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197383#comment-16197383
 ] 

Blake Eggleston commented on CASSANDRA-13942:
-

You should be able to achieve this by providing your own 
{{ConfigurationLoader}} implementation. You can't extend DatabaseDescriptor, 
but you would be able to configure a class referenced by your index classes.

See {{DatabaseDescriptor#loadConfig}}.
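
For illustration, a minimal sketch of such a loader (assuming the 3.x package 
layout; the side settings, the holder field, and the plugin class name are 
hypothetical, not part of Cassandra):

{code:java}
import java.util.Collections;
import java.util.Map;
import org.apache.cassandra.config.Config;
import org.apache.cassandra.config.ConfigurationLoader;
import org.apache.cassandra.config.YamlConfigurationLoader;
import org.apache.cassandra.exceptions.ConfigurationException;

// Delegates to the default YAML loader for cassandra.yaml, then loads the
// plugin's own settings on the side for the index classes to read.
public class IndexPluginConfigLoader implements ConfigurationLoader
{
    public static volatile Map<String, String> pluginSettings;

    @Override
    public Config loadConfig() throws ConfigurationException
    {
        Config config = new YamlConfigurationLoader().loadConfig();
        pluginSettings = loadPluginSettings(); // e.g. parse a separate index-plugin.yaml here
        return config;
    }

    private Map<String, String> loadPluginSettings()
    {
        return Collections.singletonMap("example.setting", "42"); // placeholder for real parsing
    }
}
// Selected at startup with: -Dcassandra.config.loader=com.example.IndexPluginConfigLoader
{code}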

> Open Cassandra.yaml for developers to extend custom settings
> 
>
> Key: CASSANDRA-13942
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13942
> Project: Cassandra
>  Issue Type: Wish
>  Components: Configuration
>Reporter: zhaoyan
>
> We are currently trying to write an index plugin for Cassandra.
> We want to put some additional settings in cassandra.yaml and read them in our 
> code.
> We found that Cassandra uses DatabaseDescriptor.java and Config.java to hold 
> the configuration from cassandra.yaml, but we can't extend them.
> So I suggest that Cassandra provide some interfaces for developers to extend 
> custom settings.
> Thank you



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13442) Support a means of strongly consistent highly available replication with tunable storage requirements

2017-10-09 Thread DOAN DuyHai (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197385#comment-16197385
 ] 

DOAN DuyHai commented on CASSANDRA-13442:
-

{quote}The target is 10x to 20x less storage{quote}

As far as I understand, with RF=3, if you remove repaired data on transient 
replicas, you'll reduce storage by 1/3. Where do you get this 10x-20x, then?

> Support a means of strongly consistent highly available replication with 
> tunable storage requirements
> -
>
> Key: CASSANDRA-13442
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13442
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction, Coordination, Distributed Metadata, Local 
> Write-Read Paths
>Reporter: Ariel Weisberg
>
> Replication factors like RF=2 can't provide strong consistency and 
> availability because if a single node is lost it's impossible to reach a 
> quorum of replicas. Stepping up to RF=3 will allow you to lose a node and 
> still achieve quorum for reads and writes, but requires committing additional 
> storage.
> The requirement of a quorum for writes/reads doesn't seem to be something 
> that can be relaxed without additional constraints on queries, but it seems 
> like it should be possible to relax the requirement that 3 full copies of the 
> entire data set are kept. What is actually required is a covering data set 
> for the range and we should be able to achieve a covering data set and high 
> availability without having three full copies. 
> After a repair we know that some subset of the data set is fully replicated. 
> At that point we don't have to read from a quorum of nodes for the repaired 
> data. It is sufficient to read from a single node for the repaired data and a 
> quorum of nodes for the unrepaired data.
> One way to exploit this would be to have N replicas, say the last N replicas 
> (where N varies with RF) in the preference list, delete all repaired data 
> after a repair completes. Subsequent quorum reads will be able to retrieve 
> the repaired data from any of the two full replicas and the unrepaired data 
> from a quorum read of any replica including the "transient" replicas.
> Configuration for something like this in NTS might be something similar to { 
> DC1="3-1", DC2="3-2" } where the first value is the replication factor used 
> for consistency and the second values is the number of transient replicas. If 
> you specify { DC1=3, DC2=3 } then the number of transient replicas defaults 
> to 0 and you get the same behavior you have today.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13943) Infinite compaction of L0 SSTables in JBOD

2017-10-09 Thread Dan Kinder (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197459#comment-16197459
 ] 

Dan Kinder commented on CASSANDRA-13943:


I do see a questionable {{startsWith}} here: 
https://github.com/apache/cassandra/blob/7d4d1a32581ff40ed1049833631832054bcf2316/src/java/org/apache/cassandra/db/compaction/CompactionStrategyManager.java#L309

Also here: 
https://github.com/apache/cassandra/blob/3cec208c40b85e1be0ff8c68fc9d9017945a1ed8/src/java/org/apache/cassandra/db/compaction/CompactionManager.java#L570
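
If that is the cause, it is easy to see why: {{/srv/disk1}} is a plain string 
prefix of {{/srv/disk10}}, so a bare {{startsWith}} misattributes files on 
disk10 to disk1. A minimal illustration (paths made up):

{code:java}
import java.io.File;

public class PrefixBug
{
    public static void main(String[] args)
    {
        String sstable = "/srv/disk10/data/ks/tbl/mc-1-big-Data.db";
        String diskRoot = "/srv/disk1";

        // Naive check: matches, so the SSTable on disk10 is attributed to disk1.
        System.out.println(sstable.startsWith(diskRoot));                  // true (wrong)

        // Boundary-aware check: require a path separator after the disk root.
        System.out.println(sstable.startsWith(diskRoot + File.separator)); // false (correct)
    }
}
{code}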

> Infinite compaction of L0 SSTables in JBOD
> --
>
> Key: CASSANDRA-13943
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13943
> Project: Cassandra
>  Issue Type: Bug
>  Components: Compaction
> Environment: Cassandra 3.11.0 / Centos 6
>Reporter: Dan Kinder
>Assignee: Marcus Eriksson
> Attachments: debug.log
>
>
> I recently upgraded from 2.2.6 to 3.11.0.
> I am seeing Cassandra loop infinitely compacting the same data over and over. 
> Attaching logs.
> It is compacting two tables, one on /srv/disk10, the other on /srv/disk1. It 
> does create new SSTables but immediately recompacts again. Note that I am not 
> inserting anything at the moment, there is no flushing happening on this 
> table (Memtable switch count has not changed).
> My theory is that it somehow thinks those should be compaction candidates. 
> But they shouldn't be, they are on different disks and I ran nodetool 
> relocatesstables as well as nodetool compact. So, it tries to compact them 
> together, but the compaction results in the exact same 2 SSTables on the 2 
> disks, because the keys are split by data disk.
> This is pretty serious, because all our nodes right now are consuming CPU 
> doing this for multiple tables, it seems.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13930) Avoid grabbing the read lock when checking if compaction strategy should do defragmentation

2017-10-09 Thread Eduard Tudenhoefner (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197502#comment-16197502
 ] 

Eduard Tudenhoefner commented on CASSANDRA-13930:
-

changes LGTM. Looks like most dtests failed because they couldn't clone the 
repo.

> Avoid grabbing the read lock when checking if compaction strategy should do 
> defragmentation
> ---
>
> Key: CASSANDRA-13930
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13930
> Project: Cassandra
>  Issue Type: Bug
>  Components: Compaction
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
> Fix For: 3.11.x, 4.x
>
>
> We grab the read lock when checking whether the compaction strategy benefits 
> from defragmentation, avoid that.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-13930) Avoid grabbing the read lock when checking if compaction strategy should do defragmentation

2017-10-09 Thread Eduard Tudenhoefner (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197502#comment-16197502
 ] 

Eduard Tudenhoefner edited comment on CASSANDRA-13930 at 10/9/17 6:58 PM:
--

changes LGTM. Looks like the majority of the failed dtests are because they 
couldn't clone the repo.


was (Author: eduard.tudenhoefner):
changes LGTM. Looks like most dtests failed because they couldn't clone the 
repo.

> Avoid grabbing the read lock when checking if compaction strategy should do 
> defragmentation
> ---
>
> Key: CASSANDRA-13930
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13930
> Project: Cassandra
>  Issue Type: Bug
>  Components: Compaction
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
> Fix For: 3.11.x, 4.x
>
>
> We grab the read lock when checking whether the compaction strategy benefits 
> from defragmentation, avoid that.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13834) nodetool resetlocalschema failed because of JMX registration

2017-10-09 Thread Eduard Tudenhoefner (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197515#comment-16197515
 ] 

Eduard Tudenhoefner commented on CASSANDRA-13834:
-

code changes LGTM but needs an official contributor for a review/commit. 
[~jjirsa] can you review maybe?

I wonder if we should have a more tolerant wrapper around {{MBeanServer}} that 
just allows you to register an mbean no matter what and without all the hassle 
of needing to deal with a potential {{InstanceAlreadyExistsException}} or 
needing to check for {{mbs.isRegistered(objectName)}}.
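
For illustration, such a wrapper might look like this (a sketch; the class 
name and the replace-on-collision policy are assumptions, not an agreed 
design):

{code:java}
import java.lang.management.ManagementFactory;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public final class TolerantMBeanServer
{
    private static final MBeanServer MBS = ManagementFactory.getPlatformMBeanServer();

    // Registers the mbean, replacing any stale registration with the same name
    // instead of surfacing InstanceAlreadyExistsException at every call site.
    public static void registerReplacing(Object mbean, String name)
    {
        try
        {
            ObjectName objectName = new ObjectName(name);
            if (MBS.isRegistered(objectName))
                MBS.unregisterMBean(objectName);
            MBS.registerMBean(mbean, objectName);
        }
        catch (JMException e)
        {
            throw new RuntimeException("Failed to (re)register " + name, e);
        }
    }
}
{code}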

> nodetool resetlocalschema failed because of JMX registration
> 
>
> Key: CASSANDRA-13834
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13834
> Project: Cassandra
>  Issue Type: Bug
>  Components: Metrics
>Reporter: vincent royer
>Priority: Minor
>  Labels: easyfix
> Fix For: 3.11.0
>
> Attachments: 0001-CASSANDRA-13834.patch
>
>
> nodetool resetlocalschema failed because of the following exception.
> This is because the table MBean was already registered in the MBeanServer with 
> the same name.
> 2017-08-31 14:00:57,989 ERROR [InternalResponseStage:18] 
> CassandraDaemon.java:231 uncaughtException Exception in thread 
> Thread[InternalResponseStage:18,5,main]
> java.lang.RuntimeException: javax.management.InstanceAlreadyExistsException: 
> org.apache.cassandra.db:type=Tables,key
> space=elastic_admin,table=metadata
> at 
> org.apache.cassandra.db.ColumnFamilyStore.(ColumnFamilyStore.java:468)
> at 
> org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:618)
> at 
> org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:592)
> at 
> org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:583)
> at org.apache.cassandra.db.Keyspace.initCf(Keyspace.java:414)
> at org.apache.cassandra.config.Schema.addTable(Schema.java:609)
> at 
> java.util.LinkedHashMap$LinkedValues.forEach(LinkedHashMap.java:608)
> at 
> java.util.Collections$UnmodifiableCollection.forEach(Collections.java:1080)
> at 
> org.apache.cassandra.schema.SchemaKeyspace.updateKeyspace(SchemaKeyspace.java:1421)
> at 
> org.apache.cassandra.schema.SchemaKeyspace.mergeSchema(SchemaKeyspace.java:1386)
> at 
> org.apache.cassandra.schema.SchemaKeyspace.mergeSchemaAndAnnounceVersion(SchemaKeyspace.java:1336)
> at 
> org.apache.cassandra.service.MigrationTask$1.response(MigrationTask.java:91)
> at 
> org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:53)
> at 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at 
> org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-13834) nodetool resetlocalschema failed because of JMX registration

2017-10-09 Thread Eduard Tudenhoefner (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197515#comment-16197515
 ] 

Eduard Tudenhoefner edited comment on CASSANDRA-13834 at 10/9/17 7:08 PM:
--

code changes LGTM but needs an official contributor for a review/commit. 
[~jjirsa] can you review maybe?

I wonder if we should have a more tolerant wrapper around {{MBeanServer}} that 
just allows you to register an mbean no matter what and without all the hassle 
of needing to deal with a potential {{InstanceAlreadyExistsException}} or 
needing to check for {{mbs.isRegistered(objectName)}}. There are a bunch of 
other places where something like that could happen.


was (Author: eduard.tudenhoefner):
code changes LGTM but needs an official contributor for a review/commit. 
[~jjirsa] can you review maybe?

I wonder if we should have a more tolerant wrapper around {{MBeanServer}} that 
just allows you to register an mbean no matter what and without all the hassle 
of needing to deal with a potential {{InstanceAlreadyExistsException}} or 
needing to check for {{mbs.isRegistered(objectName)}}.

> nodetool resetlocalschema failed because of JMX registration
> 
>
> Key: CASSANDRA-13834
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13834
> Project: Cassandra
>  Issue Type: Bug
>  Components: Metrics
>Reporter: vincent royer
>Priority: Minor
>  Labels: easyfix
> Fix For: 3.11.0
>
> Attachments: 0001-CASSANDRA-13834.patch
>
>
> nodetool resetlocalschema failed because of the following exception.
> This is because the table MBean was already registered in the MBeanServer with 
> the same name.
> 2017-08-31 14:00:57,989 ERROR [InternalResponseStage:18] 
> CassandraDaemon.java:231 uncaughtException Exception in thread 
> Thread[InternalResponseStage:18,5,main]
> java.lang.RuntimeException: javax.management.InstanceAlreadyExistsException: 
> org.apache.cassandra.db:type=Tables,key
> space=elastic_admin,table=metadata
> at 
> org.apache.cassandra.db.ColumnFamilyStore.(ColumnFamilyStore.java:468)
> at 
> org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:618)
> at 
> org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:592)
> at 
> org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:583)
> at org.apache.cassandra.db.Keyspace.initCf(Keyspace.java:414)
> at org.apache.cassandra.config.Schema.addTable(Schema.java:609)
> at 
> java.util.LinkedHashMap$LinkedValues.forEach(LinkedHashMap.java:608)
> at 
> java.util.Collections$UnmodifiableCollection.forEach(Collections.java:1080)
> at 
> org.apache.cassandra.schema.SchemaKeyspace.updateKeyspace(SchemaKeyspace.java:1421)
> at 
> org.apache.cassandra.schema.SchemaKeyspace.mergeSchema(SchemaKeyspace.java:1386)
> at 
> org.apache.cassandra.schema.SchemaKeyspace.mergeSchemaAndAnnounceVersion(SchemaKeyspace.java:1336)
> at 
> org.apache.cassandra.service.MigrationTask$1.response(MigrationTask.java:91)
> at 
> org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:53)
> at 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at 
> org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13834) nodetool resetlocalschema failed because of JMX registration

2017-10-09 Thread Jeff Jirsa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197545#comment-16197545
 ] 

Jeff Jirsa commented on CASSANDRA-13834:


I don't expect to have availability to properly review, even though it looks 
like a trivial fix. 

> nodetool resetlocalschema failed because of JMX registration
> 
>
> Key: CASSANDRA-13834
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13834
> Project: Cassandra
>  Issue Type: Bug
>  Components: Metrics
>Reporter: vincent royer
>Priority: Minor
>  Labels: easyfix
> Fix For: 3.11.0
>
> Attachments: 0001-CASSANDRA-13834.patch
>
>
> nodetool resetlocalschema failed because of the following exception.
> This is because the table MBean was already registered in the MBeanServer with 
> the same name.
> 2017-08-31 14:00:57,989 ERROR [InternalResponseStage:18] 
> CassandraDaemon.java:231 uncaughtException Exception in thread 
> Thread[InternalResponseStage:18,5,main]
> java.lang.RuntimeException: javax.management.InstanceAlreadyExistsException: 
> org.apache.cassandra.db:type=Tables,key
> space=elastic_admin,table=metadata
> at 
> org.apache.cassandra.db.ColumnFamilyStore.(ColumnFamilyStore.java:468)
> at 
> org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:618)
> at 
> org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:592)
> at 
> org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:583)
> at org.apache.cassandra.db.Keyspace.initCf(Keyspace.java:414)
> at org.apache.cassandra.config.Schema.addTable(Schema.java:609)
> at 
> java.util.LinkedHashMap$LinkedValues.forEach(LinkedHashMap.java:608)
> at 
> java.util.Collections$UnmodifiableCollection.forEach(Collections.java:1080)
> at 
> org.apache.cassandra.schema.SchemaKeyspace.updateKeyspace(SchemaKeyspace.java:1421)
> at 
> org.apache.cassandra.schema.SchemaKeyspace.mergeSchema(SchemaKeyspace.java:1386)
> at 
> org.apache.cassandra.schema.SchemaKeyspace.mergeSchemaAndAnnounceVersion(SchemaKeyspace.java:1336)
> at 
> org.apache.cassandra.service.MigrationTask$1.response(MigrationTask.java:91)
> at 
> org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:53)
> at 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at 
> org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-10786) Include hash of result set metadata in prepared statement id

2017-10-09 Thread Alex Petrov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197546#comment-16197546
 ] 

Alex Petrov commented on CASSANDRA-10786:
-

All right, going to commit it tomorrow morning as there are no objections.

> Include hash of result set metadata in prepared statement id
> 
>
> Key: CASSANDRA-10786
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10786
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: CQL
>Reporter: Olivier Michallat
>Assignee: Alex Petrov
>Priority: Minor
>  Labels: client-impacting, doc-impacting, protocolv5
> Fix For: 4.x
>
>
> *_Initial description:_*
> This is a follow-up to CASSANDRA-7910, which was about invalidating a 
> prepared statement when the table is altered, to force clients to update 
> their local copy of the metadata.
> There's still an issue if multiple clients are connected to the same host. 
> The first client to execute the query after the cache was invalidated will 
> receive an UNPREPARED response, re-prepare, and update its local metadata. 
> But other clients might miss it entirely (the MD5 hasn't changed), and they 
> will keep using their old metadata. For example:
> # {{SELECT * ...}} statement is prepared in Cassandra with md5 abc123, 
> clientA and clientB both have a cache of the metadata (columns b and c) 
> locally
> # column a gets added to the table, C* invalidates its cache entry
> # clientA sends an EXECUTE request for md5 abc123, gets UNPREPARED response, 
> re-prepares on the fly and updates its local metadata to (a, b, c)
> # prepared statement is now in C*’s cache again, with the same md5 abc123
> # clientB sends an EXECUTE request for id abc123. Because the cache has been 
> populated again, the query succeeds. But clientB still has not updated its 
> metadata, it’s still (b,c)
> One solution that was suggested is to include a hash of the result set 
> metadata in the md5. This way the md5 would change at step 3, and any client 
> using the old md5 would get an UNPREPARED, regardless of whether another 
> client already reprepared.
> -
> *_Resolution (2017/02/13):_*
> The following changes were made to native protocol v5:
> - the PREPARED response includes {{result_metadata_id}}, a hash of the result 
> set metadata.
> - every EXECUTE message must provide {{result_metadata_id}} in addition to 
> the prepared statement id. If it doesn't match the current one on the server, 
> it means the client is operating on a stale schema.
> - to notify the client, the server returns a ROWS response with a new 
> {{Metadata_changed}} flag, the new {{result_metadata_id}} and the updated 
> result metadata (this overrides the {{No_metadata}} flag, even if the client 
> had requested it)
> - the client updates its copy of the result metadata before it decodes the 
> results.
> So the scenario above would now look like:
> # {{SELECT * ...}} statement is prepared in Cassandra with md5 abc123, and 
> result set (b, c) that hashes to cde456
> # column a gets added to the table, C* does not invalidate its cache entry, 
> but only updates the result set to (a, b, c) which hashes to fff789
> # client sends an EXECUTE request for (statementId=abc123, resultId=cde456) 
> and skip_metadata flag
> # cde456!=fff789, so C* responds with ROWS(..., no_metadata=false, 
> metadata_changed=true, new_metadata_id=fff789,col specs for (a,b,c))
> # client updates its column specifications, and will send the next execute 
> queries with (statementId=abc123, resultId=fff789)
> This works the same with multiple clients.
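
A toy illustration of the id comparison described above (the hashed inputs are 
made up placeholders; only the MD5 mechanics are real):

{code:java}
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Arrays;

public class ResultMetadataIdDemo
{
    static byte[] md5(String s) throws Exception
    {
        return MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws Exception
    {
        byte[] statementId = md5("SELECT * FROM ks.tbl"); // stays stable across ALTER
        byte[] resultIdOld = md5("columns:(b,c)");        // what the stale client still holds
        byte[] resultIdNew = md5("columns:(a,b,c)");      // after column a is added

        // Server-side check on EXECUTE: the ids differ, so the ROWS response is
        // sent with Metadata_changed set and the full column specs, overriding
        // the client's skip_metadata request.
        boolean metadataChanged = !Arrays.equals(resultIdOld, resultIdNew);
        System.out.println("Metadata_changed: " + metadataChanged); // true
    }
}
{code}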



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Assigned] (CASSANDRA-13834) nodetool resetlocalschema failed because of JMX registration

2017-10-09 Thread Jeff Jirsa (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Jirsa reassigned CASSANDRA-13834:
--

Assignee: vincent royer

> nodetool resetlocalschema failed because of JMX registration
> 
>
> Key: CASSANDRA-13834
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13834
> Project: Cassandra
>  Issue Type: Bug
>  Components: Metrics
>Reporter: vincent royer
>Assignee: vincent royer
>Priority: Minor
>  Labels: easyfix
> Fix For: 3.11.0
>
> Attachments: 0001-CASSANDRA-13834.patch
>
>
> nodetool resetlocalschema failed because of the following exception.
> This is because the table MBean was already registered in the MBeanServer with 
> the same name.
> 2017-08-31 14:00:57,989 ERROR [InternalResponseStage:18] 
> CassandraDaemon.java:231 uncaughtException Exception in thread 
> Thread[InternalResponseStage:18,5,main]
> java.lang.RuntimeException: javax.management.InstanceAlreadyExistsException: 
> org.apache.cassandra.db:type=Tables,key
> space=elastic_admin,table=metadata
> at 
> org.apache.cassandra.db.ColumnFamilyStore.(ColumnFamilyStore.java:468)
> at 
> org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:618)
> at 
> org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:592)
> at 
> org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:583)
> at org.apache.cassandra.db.Keyspace.initCf(Keyspace.java:414)
> at org.apache.cassandra.config.Schema.addTable(Schema.java:609)
> at 
> java.util.LinkedHashMap$LinkedValues.forEach(LinkedHashMap.java:608)
> at 
> java.util.Collections$UnmodifiableCollection.forEach(Collections.java:1080)
> at 
> org.apache.cassandra.schema.SchemaKeyspace.updateKeyspace(SchemaKeyspace.java:1421)
> at 
> org.apache.cassandra.schema.SchemaKeyspace.mergeSchema(SchemaKeyspace.java:1386)
> at 
> org.apache.cassandra.schema.SchemaKeyspace.mergeSchemaAndAnnounceVersion(SchemaKeyspace.java:1336)
> at 
> org.apache.cassandra.service.MigrationTask$1.response(MigrationTask.java:91)
> at 
> org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:53)
> at 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at 
> org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-13834) nodetool resetlocalschema failed because of JMX registration

2017-10-09 Thread Jeff Jirsa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197545#comment-16197545
 ] 

Jeff Jirsa edited comment on CASSANDRA-13834 at 10/9/17 7:16 PM:
-

I don't expect to have availability to properly review, even though it looks 
like a trivial fix. Most reviewers will probably want to see a small regression 
test, though, which will likely be more effort than the actual patch. 


was (Author: jjirsa):
I don't expect to have availability to properly review, even though it looks 
like a trivial fix. 

> nodetool resetlocalschema failed because of JMX registration
> 
>
> Key: CASSANDRA-13834
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13834
> Project: Cassandra
>  Issue Type: Bug
>  Components: Metrics
>Reporter: vincent royer
>Assignee: vincent royer
>Priority: Minor
>  Labels: easyfix
> Fix For: 3.11.0
>
> Attachments: 0001-CASSANDRA-13834.patch
>
>
> nodetool resetlocalschema failed because of the following exception.
> This is because the table MBean was already registered in the MBeanServer with 
> the same name.
> 2017-08-31 14:00:57,989 ERROR [InternalResponseStage:18] 
> CassandraDaemon.java:231 uncaughtException Exception in thread 
> Thread[InternalResponseStage:18,5,main]
> java.lang.RuntimeException: javax.management.InstanceAlreadyExistsException: 
> org.apache.cassandra.db:type=Tables,key
> space=elastic_admin,table=metadata
> at 
> org.apache.cassandra.db.ColumnFamilyStore.(ColumnFamilyStore.java:468)
> at 
> org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:618)
> at 
> org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:592)
> at 
> org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:583)
> at org.apache.cassandra.db.Keyspace.initCf(Keyspace.java:414)
> at org.apache.cassandra.config.Schema.addTable(Schema.java:609)
> at 
> java.util.LinkedHashMap$LinkedValues.forEach(LinkedHashMap.java:608)
> at 
> java.util.Collections$UnmodifiableCollection.forEach(Collections.java:1080)
> at 
> org.apache.cassandra.schema.SchemaKeyspace.updateKeyspace(SchemaKeyspace.java:1421)
> at 
> org.apache.cassandra.schema.SchemaKeyspace.mergeSchema(SchemaKeyspace.java:1386)
> at 
> org.apache.cassandra.schema.SchemaKeyspace.mergeSchemaAndAnnounceVersion(SchemaKeyspace.java:1336)
> at 
> org.apache.cassandra.service.MigrationTask$1.response(MigrationTask.java:91)
> at 
> org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:53)
> at 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at 
> org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-10786) Include hash of result set metadata in prepared statement id

2017-10-09 Thread Jeff Jirsa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197550#comment-16197550
 ] 

Jeff Jirsa commented on CASSANDRA-10786:


{quote}
I'll create an additional ticket that would be a 4.0 blocker to pull in the 
latest release of both drivers and restore build.xml entries with the 
corresponding versions to make sure this is not getting missed.
{quote}

sounds good to me.


> Include hash of result set metadata in prepared statement id
> 
>
> Key: CASSANDRA-10786
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10786
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: CQL
>Reporter: Olivier Michallat
>Assignee: Alex Petrov
>Priority: Minor
>  Labels: client-impacting, doc-impacting, protocolv5
> Fix For: 4.x
>
>
> *_Initial description:_*
> This is a follow-up to CASSANDRA-7910, which was about invalidating a 
> prepared statement when the table is altered, to force clients to update 
> their local copy of the metadata.
> There's still an issue if multiple clients are connected to the same host. 
> The first client to execute the query after the cache was invalidated will 
> receive an UNPREPARED response, re-prepare, and update its local metadata. 
> But other clients might miss it entirely (the MD5 hasn't changed), and they 
> will keep using their old metadata. For example:
> # {{SELECT * ...}} statement is prepared in Cassandra with md5 abc123, 
> clientA and clientB both have a cache of the metadata (columns b and c) 
> locally
> # column a gets added to the table, C* invalidates its cache entry
> # clientA sends an EXECUTE request for md5 abc123, gets UNPREPARED response, 
> re-prepares on the fly and updates its local metadata to (a, b, c)
> # prepared statement is now in C*’s cache again, with the same md5 abc123
> # clientB sends an EXECUTE request for id abc123. Because the cache has been 
> populated again, the query succeeds. But clientB still has not updated its 
> metadata, it’s still (b,c)
> One solution that was suggested is to include a hash of the result set 
> metadata in the md5. This way the md5 would change at step 3, and any client 
> using the old md5 would get an UNPREPARED, regardless of whether another 
> client already reprepared.
> -
> *_Resolution (2017/02/13):_*
> The following changes were made to native protocol v5:
> - the PREPARED response includes {{result_metadata_id}}, a hash of the result 
> set metadata.
> - every EXECUTE message must provide {{result_metadata_id}} in addition to 
> the prepared statement id. If it doesn't match the current one on the server, 
> it means the client is operating on a stale schema.
> - to notify the client, the server returns a ROWS response with a new 
> {{Metadata_changed}} flag, the new {{result_metadata_id}} and the updated 
> result metadata (this overrides the {{No_metadata}} flag, even if the client 
> had requested it)
> - the client updates its copy of the result metadata before it decodes the 
> results.
> So the scenario above would now look like:
> # {{SELECT * ...}} statement is prepared in Cassandra with md5 abc123, and 
> result set (b, c) that hashes to cde456
> # column a gets added to the table, C* does not invalidate its cache entry, 
> but only updates the result set to (a, b, c) which hashes to fff789
> # client sends an EXECUTE request for (statementId=abc123, resultId=cde456) 
> and skip_metadata flag
> # cde456!=fff789, so C* responds with ROWS(..., no_metadata=false, 
> metadata_changed=true, new_metadata_id=fff789,col specs for (a,b,c))
> # client updates its column specifications, and will send the next execute 
> queries with (statementId=abc123, resultId=fff789)
> This works the same with multiple clients.
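
For illustration, a minimal client-side sketch of the flow above (all types and method names here are hypothetical, not taken from either driver):

{noformat}
import java.util.List;

final class ResultMetadataFlowSketch
{
    // What a driver caches per prepared statement.
    static final class PreparedInfo
    {
        byte[] statementId;       // md5 of the query string, e.g. abc123
        byte[] resultMetadataId;  // hash of the result set metadata, e.g. cde456
        List<String> columnSpecs; // cached column specs, e.g. (b, c)
    }

    // What a v5 ROWS response carries back.
    static final class RowsResponse
    {
        boolean metadataChanged;  // the new Metadata_changed flag
        byte[] newMetadataId;     // e.g. fff789
        List<String> columnSpecs; // e.g. (a, b, c)
    }

    // Called for every EXECUTE response; keeps the local cache in sync.
    static void onRows(PreparedInfo prepared, RowsResponse rows)
    {
        if (rows.metadataChanged)
        {
            // The server saw a stale resultMetadataId (step 4 above) and
            // returned the new id plus column specs; adopt both before
            // decoding any rows (step 5).
            prepared.resultMetadataId = rows.newMetadataId;
            prepared.columnSpecs = rows.columnSpecs;
        }
        // decode rows using prepared.columnSpecs ...
    }
}
{noformat}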



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13442) Support a means of strongly consistent highly available replication with tunable storage requirements

2017-10-09 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197574#comment-16197574
 ] 

Ariel Weisberg commented on CASSANDRA-13442:


bq. As far as I understand, with RF=3, if you remove repaired data on transient 
replicas, you'll reduce storage by 1/3. Where do you get this 10x - 20x then?
10-20x on transient replicas. Not at full replicas or overall. The new 
capability is adding replicas without having to commit the full amount of 
additional hardware.

> Support a means of strongly consistent highly available replication with 
> tunable storage requirements
> -
>
> Key: CASSANDRA-13442
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13442
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction, Coordination, Distributed Metadata, Local 
> Write-Read Paths
>Reporter: Ariel Weisberg
>
> Replication factors like RF=2 can't provide strong consistency and 
> availability because if a single node is lost it's impossible to reach a 
> quorum of replicas. Stepping up to RF=3 will allow you to lose a node and 
> still achieve quorum for reads and writes, but requires committing additional 
> storage.
> The requirement of a quorum for writes/reads doesn't seem to be something 
> that can be relaxed without additional constraints on queries, but it seems 
> like it should be possible to relax the requirement that 3 full copies of the 
> entire data set are kept. What is actually required is a covering data set 
> for the range and we should be able to achieve a covering data set and high 
> availability without having three full copies. 
> After a repair we know that some subset of the data set is fully replicated. 
> At that point we don't have to read from a quorum of nodes for the repaired 
> data. It is sufficient to read from a single node for the repaired data and a 
> quorum of nodes for the unrepaired data.
> One way to exploit this would be to have N replicas, say the last N replicas 
> (where N varies with RF) in the preference list, delete all repaired data 
> after a repair completes. Subsequent quorum reads will be able to retrieve 
> the repaired data from any of the two full replicas and the unrepaired data 
> from a quorum read of any replica including the "transient" replicas.
> Configuration for something like this in NTS might be something similar to { 
> DC1="3-1", DC2="3-2" } where the first value is the replication factor used 
> for consistency and the second values is the number of transient replicas. If 
> you specify { DC1=3, DC2=3 } then the number of transient replicas defaults 
> to 0 and you get the same behavior you have today.
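
To make the proposed notation concrete, a minimal sketch (hypothetical class, not in the codebase) of how an "RF-transient" replication option could be interpreted:

{noformat}
// Parses replication options like DC1="3-1": total RF 3, of which 1 replica
// is transient, leaving 3 - 1 = 2 full replicas. A plain "3" means "3-0",
// i.e. today's behavior.
final class TransientReplicationSpec
{
    final int replicationFactor;
    final int transientReplicas;
    final int fullReplicas;

    TransientReplicationSpec(String spec)
    {
        String[] parts = spec.split("-");
        replicationFactor = Integer.parseInt(parts[0]);
        transientReplicas = parts.length > 1 ? Integer.parseInt(parts[1]) : 0;
        fullReplicas = replicationFactor - transientReplicas;
    }

    public static void main(String[] args)
    {
        TransientReplicationSpec dc1 = new TransientReplicationSpec("3-1");
        System.out.println(dc1.fullReplicas + " full, " + dc1.transientReplicas + " transient");
        // prints: 2 full, 1 transient
    }
}
{noformat}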



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-13442) Support a means of strongly consistent highly available replication with tunable storage requirements

2017-10-09 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197574#comment-16197574
 ] 

Ariel Weisberg edited comment on CASSANDRA-13442 at 10/9/17 7:33 PM:
-

bq. As far as I understand, with RF=3, if you remove repaired data on transient 
replicas, you'll reduce storage by 1/3. Where do you get this 10x - 20x then?
10-20x on transient replicas. Not at full replicas or overall. The new 
capability is adding replicas without having to commit the full amount of 
additional hardware.

If you are running RF=3 today you might be able to switch to RF=5 with two 
transient replicas. You would be able to tolerate more failures, and you might 
be able to do it without adding additional capacity to your deployment.


was (Author: aweisberg):
bq. As far as I understand, with RF=3, if you remove repaired data on transient 
replicas, you'll reduce storage by 1/3. Where do you get this 10x - 20x then ?
10-20x on transient replicas. Not at full replicas or overall. The new 
capability is adding replicas without having to commit the full amount of 
additional hardware.

> Support a means of strongly consistent highly available replication with 
> tunable storage requirements
> -
>
> Key: CASSANDRA-13442
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13442
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction, Coordination, Distributed Metadata, Local 
> Write-Read Paths
>Reporter: Ariel Weisberg
>
> Replication factors like RF=2 can't provide strong consistency and 
> availability because if a single node is lost it's impossible to reach a 
> quorum of replicas. Stepping up to RF=3 will allow you to lose a node and 
> still achieve quorum for reads and writes, but requires committing additional 
> storage.
> The requirement of a quorum for writes/reads doesn't seem to be something 
> that can be relaxed without additional constraints on queries, but it seems 
> like it should be possible to relax the requirement that 3 full copies of the 
> entire data set are kept. What is actually required is a covering data set 
> for the range and we should be able to achieve a covering data set and high 
> availability without having three full copies. 
> After a repair we know that some subset of the data set is fully replicated. 
> At that point we don't have to read from a quorum of nodes for the repaired 
> data. It is sufficient to read from a single node for the repaired data and a 
> quorum of nodes for the unrepaired data.
> One way to exploit this would be to have N replicas, say the last N replicas 
> (where N varies with RF) in the preference list, delete all repaired data 
> after a repair completes. Subsequent quorum reads will be able to retrieve 
> the repaired data from any of the two full replicas and the unrepaired data 
> from a quorum read of any replica including the "transient" replicas.
> Configuration for something like this in NTS might be something similar to { 
> DC1="3-1", DC2="3-2" } where the first value is the replication factor used 
> for consistency and the second values is the number of transient replicas. If 
> you specify { DC1=3, DC2=3 } then the number of transient replicas defaults 
> to 0 and you get the same behavior you have today.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-4763) SSTableLoader shouldn't get keyspace from path

2017-10-09 Thread Eduard Tudenhoefner (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eduard Tudenhoefner updated CASSANDRA-4763:
---
Reviewer: Alex Petrov

> SSTableLoader shouldn't get keyspace from path
> --
>
> Key: CASSANDRA-4763
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4763
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Affects Versions: 1.2.0 beta 1
>Reporter: Nick Bailey
>Assignee: Eduard Tudenhoefner
>Priority: Minor
> Fix For: 4.0
>
>
> SSTableLoader currently gets the keyspace it is going to load to from the 
> path of the directory of sstables it is loading. This isn't really documented 
> (or I didn't see it), but also isn't really a good way of doing it in general.
> {noformat}
> this.keyspace = directory.getParentFile().getName();
> {noformat}
> We should probably just let users pass the name in. If you are loading a 
> snapshot the file names will have the keyspace which is slightly better but 
> people manually creating their own sstables might not format them the same.
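
A minimal sketch of the direction suggested above (the {{options.keyspace}} field is hypothetical):

{noformat}
// Prefer an explicitly supplied keyspace; only fall back to inferring it
// from the parent directory name for backwards compatibility.
this.keyspace = options.keyspace != null
              ? options.keyspace
              : directory.getParentFile().getName();
{noformat}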



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13442) Support a means of strongly consistent highly available replication with tunable storage requirements

2017-10-09 Thread DOAN DuyHai (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197593#comment-16197593
 ] 

DOAN DuyHai commented on CASSANDRA-13442:
-

OK, I get it, so you can provision transient replicas with much less disk space 
than normal replicas --> cost saving.

That being said, I think the cost saving only becomes a real argument for very 
large clusters. For average C* users in the range of 10 - 20 nodes, I'm not 
sure the added complexity in reasoning is worth the disk space saving.

> Support a means of strongly consistent highly available replication with 
> tunable storage requirements
> -
>
> Key: CASSANDRA-13442
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13442
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction, Coordination, Distributed Metadata, Local 
> Write-Read Paths
>Reporter: Ariel Weisberg
>
> Replication factors like RF=2 can't provide strong consistency and 
> availability because if a single node is lost it's impossible to reach a 
> quorum of replicas. Stepping up to RF=3 will allow you to lose a node and 
> still achieve quorum for reads and writes, but requires committing additional 
> storage.
> The requirement of a quorum for writes/reads doesn't seem to be something 
> that can be relaxed without additional constraints on queries, but it seems 
> like it should be possible to relax the requirement that 3 full copies of the 
> entire data set are kept. What is actually required is a covering data set 
> for the range and we should be able to achieve a covering data set and high 
> availability without having three full copies. 
> After a repair we know that some subset of the data set is fully replicated. 
> At that point we don't have to read from a quorum of nodes for the repaired 
> data. It is sufficient to read from a single node for the repaired data and a 
> quorum of nodes for the unrepaired data.
> One way to exploit this would be to have N replicas, say the last N replicas 
> (where N varies with RF) in the preference list, delete all repaired data 
> after a repair completes. Subsequent quorum reads will be able to retrieve 
> the repaired data from any of the two full replicas and the unrepaired data 
> from a quorum read of any replica including the "transient" replicas.
> Configuration for something like this in NTS might be something similar to { 
> DC1="3-1", DC2="3-2" } where the first value is the replication factor used 
> for consistency and the second values is the number of transient replicas. If 
> you specify { DC1=3, DC2=3 } then the number of transient replicas defaults 
> to 0 and you get the same behavior you have today.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13639) SSTableLoader always uses hostname to stream files from

2017-10-09 Thread Eduard Tudenhoefner (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197610#comment-16197610
 ] 

Eduard Tudenhoefner commented on CASSANDRA-13639:
-

I agree with [~spo...@gmail.com] here; I think having a command-line parameter 
is better. Something like {{--localOutboundAddressSSL}} or 
{{--sslLocalOutboundAddress}}, defaulting to 
{{FBUtilities.getLocalAddress()}}.

If *outboundBindAny* were set to *true*, the SSL socket would be bound 
to *any* local address, which is most likely not what we want, so I'm not sure 
why we would ever want to set *outboundBindAny* to *true* at all.
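
A rough sketch of what that could look like (the option name is hypothetical; {{FBUtilities.getLocalAddress()}} is the existing default):

{noformat}
// Hypothetical resolution of the local outbound address for the SSL socket:
// use the new command-line option when given, otherwise keep today's default.
InetAddress local = options.sslLocalOutboundAddress != null
                  ? InetAddress.getByName(options.sslLocalOutboundAddress)
                  : FBUtilities.getLocalAddress();
socket.bind(new InetSocketAddress(local, 0)); // port 0 = any free local port
{noformat}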

> SSTableLoader always uses hostname to stream files from
> ---
>
> Key: CASSANDRA-13639
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13639
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tools
>Reporter: Jan Karlsson
>Assignee: Jan Karlsson
> Fix For: 4.x
>
> Attachments: 13639-trunk
>
>
> I stumbled upon an issue where SSTableLoader was ignoring our routing by 
> using the wrong interface to send the SSTables to the other nodes. Looking at 
> the code, it seems that we are using FBUtilities.getLocalAddress() to fetch 
> out the hostname, even if the yaml file specifies a different host. I am not 
> sure why we call this function instead of using the routing by leaving it 
> blank, perhaps someone could enlighten me.
> This behaviour comes from the fact that we use a default created 
> DatabaseDescriptor which does not set the values for listenAddress and 
> listenInterface. This causes the aforementioned function to retrieve the 
> hostname at all times, even if it is not the interface used in the yaml file.
> I propose we break out the function that handles listenAddress and 
> listenInterface and call it so that listenAddress or listenInterface is 
> getting populated in the DatabaseDescriptor.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-13639) SSTableLoader always uses hostname to stream files from

2017-10-09 Thread Eduard Tudenhoefner (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197610#comment-16197610
 ] 

Eduard Tudenhoefner edited comment on CASSANDRA-13639 at 10/9/17 8:01 PM:
--

I agree with [~spo...@gmail.com] here; I think having a command-line parameter 
is better. Something like {{\-\-localOutboundAddressSSL}} or 
{{--sslLocalOutboundAddress}}, defaulting to 
{{FBUtilities.getLocalAddress()}}.

If *outboundBindAny* were set to *true*, the SSL socket would be bound 
to *any* local address, which is most likely not what we want, so I'm not sure 
why we would ever want to set *outboundBindAny* to *true* at all.


was (Author: eduard.tudenhoefner):
I agree with [~spo...@gmail.com] here because I think having a cmd line 
parameter seems to be better. Something like {{--localOutboundAddressSSL}} or 
{{--sslLocalOutboundAddress}}, which defaults to 
{{FBUtilities.getLocalAddress()}}.

If *outboundBindAny* would be set to *true*, then the SSL Socket would be bound 
to *any* local address, which is most likely not what we want, so not sure why 
we would ever want to set *outboundBindAny* to *true* anyway.

> SSTableLoader always uses hostname to stream files from
> ---
>
> Key: CASSANDRA-13639
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13639
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tools
>Reporter: Jan Karlsson
>Assignee: Jan Karlsson
> Fix For: 4.x
>
> Attachments: 13639-trunk
>
>
> I stumbled upon an issue where SSTableLoader was ignoring our routing by 
> using the wrong interface to send the SSTables to the other nodes. Looking at 
> the code, it seems that we are using FBUtilities.getLocalAddress() to fetch 
> out the hostname, even if the yaml file specifies a different host. I am not 
> sure why we call this function instead of using the routing by leaving it 
> blank, perhaps someone could enlighten me.
> This behaviour comes from the fact that we use a default created 
> DatabaseDescriptor which does not set the values for listenAddress and 
> listenInterface. This causes the aforementioned function to retrieve the 
> hostname at all times, even if it is not the interface used in the yaml file.
> I propose we break out the function that handles listenAddress and 
> listenInterface and call it so that listenAddress or listenInterface is 
> getting populated in the DatabaseDescriptor.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13442) Support a means of strongly consistent highly available replication with tunable storage requirements

2017-10-09 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197621#comment-16197621
 ] 

Ariel Weisberg commented on CASSANDRA-13442:


bq. For average C* users in the range of 10 - 20 nodes, I'm not sure the added 
complexity in reasoning is worth the disk space saving.
They don't have to reason about it if they don't enable it. It's pay for what 
you use.

> Support a means of strongly consistent highly available replication with 
> tunable storage requirements
> -
>
> Key: CASSANDRA-13442
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13442
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction, Coordination, Distributed Metadata, Local 
> Write-Read Paths
>Reporter: Ariel Weisberg
>
> Replication factors like RF=2 can't provide strong consistency and 
> availability because if a single node is lost it's impossible to reach a 
> quorum of replicas. Stepping up to RF=3 will allow you to lose a node and 
> still achieve quorum for reads and writes, but requires committing additional 
> storage.
> The requirement of a quorum for writes/reads doesn't seem to be something 
> that can be relaxed without additional constraints on queries, but it seems 
> like it should be possible to relax the requirement that 3 full copies of the 
> entire data set are kept. What is actually required is a covering data set 
> for the range and we should be able to achieve a covering data set and high 
> availability without having three full copies. 
> After a repair we know that some subset of the data set is fully replicated. 
> At that point we don't have to read from a quorum of nodes for the repaired 
> data. It is sufficient to read from a single node for the repaired data and a 
> quorum of nodes for the unrepaired data.
> One way to exploit this would be to have N replicas, say the last N replicas 
> (where N varies with RF) in the preference list, delete all repaired data 
> after a repair completes. Subsequent quorum reads will be able to retrieve 
> the repaired data from any of the two full replicas and the unrepaired data 
> from a quorum read of any replica including the "transient" replicas.
> Configuration for something like this in NTS might be something similar to { 
> DC1="3-1", DC2="3-2" } where the first value is the replication factor used 
> for consistency and the second values is the number of transient replicas. If 
> you specify { DC1=3, DC2=3 } then the number of transient replicas defaults 
> to 0 and you get the same behavior you have today.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13442) Support a means of strongly consistent highly available replication with tunable storage requirements

2017-10-09 Thread DOAN DuyHai (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197624#comment-16197624
 ] 

DOAN DuyHai commented on CASSANDRA-13442:
-

I did not mean end-users, I meant core C* developers. We need to 
introduce some code changes in order to accommodate the asymmetry between 
replicas in the code base.

> Support a means of strongly consistent highly available replication with 
> tunable storage requirements
> -
>
> Key: CASSANDRA-13442
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13442
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction, Coordination, Distributed Metadata, Local 
> Write-Read Paths
>Reporter: Ariel Weisberg
>
> Replication factors like RF=2 can't provide strong consistency and 
> availability because if a single node is lost it's impossible to reach a 
> quorum of replicas. Stepping up to RF=3 will allow you to lose a node and 
> still achieve quorum for reads and writes, but requires committing additional 
> storage.
> The requirement of a quorum for writes/reads doesn't seem to be something 
> that can be relaxed without additional constraints on queries, but it seems 
> like it should be possible to relax the requirement that 3 full copies of the 
> entire data set are kept. What is actually required is a covering data set 
> for the range and we should be able to achieve a covering data set and high 
> availability without having three full copies. 
> After a repair we know that some subset of the data set is fully replicated. 
> At that point we don't have to read from a quorum of nodes for the repaired 
> data. It is sufficient to read from a single node for the repaired data and a 
> quorum of nodes for the unrepaired data.
> One way to exploit this would be to have N replicas, say the last N replicas 
> (where N varies with RF) in the preference list, delete all repaired data 
> after a repair completes. Subsequent quorum reads will be able to retrieve 
> the repaired data from any of the two full replicas and the unrepaired data 
> from a quorum read of any replica including the "transient" replicas.
> Configuration for something like this in NTS might be something similar to { 
> DC1="3-1", DC2="3-2" } where the first value is the replication factor used 
> for consistency and the second values is the number of transient replicas. If 
> you specify { DC1=3, DC2=3 } then the number of transient replicas defaults 
> to 0 and you get the same behavior you have today.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13442) Support a means of strongly consistent highly available replication with tunable storage requirements

2017-10-09 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197649#comment-16197649
 ] 

Ariel Weisberg commented on CASSANDRA-13442:


bq. I did not mean end-users, I meant core C* developers. We need 
to introduce some code changes in order to accommodate the asymmetry between 
replicas in the code base.
I agree. I don't think that's something we can quantify until someone submits a 
patch with unit and integration tests so we can weigh the cost against the 
measured gains and tradeoffs of a real implementation.

Some of it might end up being part of overlapping functionality. I can hope.

> Support a means of strongly consistent highly available replication with 
> tunable storage requirements
> -
>
> Key: CASSANDRA-13442
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13442
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction, Coordination, Distributed Metadata, Local 
> Write-Read Paths
>Reporter: Ariel Weisberg
>
> Replication factors like RF=2 can't provide strong consistency and 
> availability because if a single node is lost it's impossible to reach a 
> quorum of replicas. Stepping up to RF=3 will allow you to lose a node and 
> still achieve quorum for reads and writes, but requires committing additional 
> storage.
> The requirement of a quorum for writes/reads doesn't seem to be something 
> that can be relaxed without additional constraints on queries, but it seems 
> like it should be possible to relax the requirement that 3 full copies of the 
> entire data set are kept. What is actually required is a covering data set 
> for the range and we should be able to achieve a covering data set and high 
> availability without having three full copies. 
> After a repair we know that some subset of the data set is fully replicated. 
> At that point we don't have to read from a quorum of nodes for the repaired 
> data. It is sufficient to read from a single node for the repaired data and a 
> quorum of nodes for the unrepaired data.
> One way to exploit this would be to have N replicas, say the last N replicas 
> (where N varies with RF) in the preference list, delete all repaired data 
> after a repair completes. Subsequent quorum reads will be able to retrieve 
> the repaired data from any of the two full replicas and the unrepaired data 
> from a quorum read of any replica including the "transient" replicas.
> Configuration for something like this in NTS might be something similar to { 
> DC1="3-1", DC2="3-2" } where the first value is the replication factor used 
> for consistency and the second values is the number of transient replicas. If 
> you specify { DC1=3, DC2=3 } then the number of transient replicas defaults 
> to 0 and you get the same behavior you have today.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-13442) Support a means of strongly consistent highly available replication with tunable storage requirements

2017-10-09 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197649#comment-16197649
 ] 

Ariel Weisberg edited comment on CASSANDRA-13442 at 10/9/17 8:28 PM:
-

bq. I did not mean end-users, I meant core C* developers. We need 
to introduce some code changes in order to accommodate the asymmetry between 
replicas in the code base.
I agree. I don't think that's something we can quantify until someone submits a 
patch with unit and integration tests so we can weigh the cost against the 
measured gains and tradeoffs of a real implementation.

Some of it might end up being part of overlapping functionality. I can hope.


was (Author: aweisberg):
bq. I did not mean about end-users, I meant about core C* developers. We need 
to introduce some code change in order to accomodate the asymmetry between 
replicas in the code base.
I agree. I don't think that's something we can quantify until someone submits a 
patch with unit and integration tests so we can weight the cost against the 
measured gains and tradeoffs of a real implementation.

Some of it might end up being part of overlapping functionality. I can hope.

> Support a means of strongly consistent highly available replication with 
> tunable storage requirements
> -
>
> Key: CASSANDRA-13442
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13442
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction, Coordination, Distributed Metadata, Local 
> Write-Read Paths
>Reporter: Ariel Weisberg
>
> Replication factors like RF=2 can't provide strong consistency and 
> availability because if a single node is lost it's impossible to reach a 
> quorum of replicas. Stepping up to RF=3 will allow you to lose a node and 
> still achieve quorum for reads and writes, but requires committing additional 
> storage.
> The requirement of a quorum for writes/reads doesn't seem to be something 
> that can be relaxed without additional constraints on queries, but it seems 
> like it should be possible to relax the requirement that 3 full copies of the 
> entire data set are kept. What is actually required is a covering data set 
> for the range and we should be able to achieve a covering data set and high 
> availability without having three full copies. 
> After a repair we know that some subset of the data set is fully replicated. 
> At that point we don't have to read from a quorum of nodes for the repaired 
> data. It is sufficient to read from a single node for the repaired data and a 
> quorum of nodes for the unrepaired data.
> One way to exploit this would be to have N replicas, say the last N replicas 
> (where N varies with RF) in the preference list, delete all repaired data 
> after a repair completes. Subsequent quorum reads will be able to retrieve 
> the repaired data from any of the two full replicas and the unrepaired data 
> from a quorum read of any replica including the "transient" replicas.
> Configuration for something like this in NTS might be something similar to { 
> DC1="3-1", DC2="3-2" } where the first value is the replication factor used 
> for consistency and the second values is the number of transient replicas. If 
> you specify { DC1=3, DC2=3 } then the number of transient replicas defaults 
> to 0 and you get the same behavior you have today.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13943) Infinite compaction of L0 SSTables in JBOD

2017-10-09 Thread Dan Kinder (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197682#comment-16197682
 ] 

Dan Kinder commented on CASSANDRA-13943:


FYI:
{noformat}
data_file_directories:
- /srv/disk1/cassandra-data
- /srv/disk2/cassandra-data
- /srv/disk3/cassandra-data
- /srv/disk4/cassandra-data
- /srv/disk5/cassandra-data
- /srv/disk6/cassandra-data
- /srv/disk7/cassandra-data
- /srv/disk8/cassandra-data
- /srv/disk9/cassandra-data
- /srv/disk10/cassandra-data
- /srv/disk11/cassandra-data
- /srv/disk12/cassandra-data
{noformat}

> Infinite compaction of L0 SSTables in JBOD
> --
>
> Key: CASSANDRA-13943
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13943
> Project: Cassandra
>  Issue Type: Bug
>  Components: Compaction
> Environment: Cassandra 3.11.0 / Centos 6
>Reporter: Dan Kinder
>Assignee: Marcus Eriksson
> Attachments: debug.log
>
>
> I recently upgraded from 2.2.6 to 3.11.0.
> I am seeing Cassandra loop infinitely compacting the same data over and over. 
> Attaching logs.
> It is compacting two tables, one on /srv/disk10, the other on /srv/disk1. It 
> does create new SSTables but immediately recompacts again. Note that I am not 
> inserting anything at the moment, there is no flushing happening on this 
> table (Memtable switch count has not changed).
> My theory is that it somehow thinks those should be compaction candidates. 
> But they shouldn't be, they are on different disks and I ran nodetool 
> relocatesstables as well as nodetool compact. So, it tries to compact them 
> together, but the compaction results in the exact same 2 SSTables on the 2 
> disks, because the keys are split by data disk.
> This is pretty serious, because all our nodes right now are consuming CPU 
> doing this for multiple tables, it seems.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-13944) Throw descriptive errors for mixed mode repair attempts

2017-10-09 Thread Blake Eggleston (JIRA)
Blake Eggleston created CASSANDRA-13944:
---

 Summary: Throw descriptive errors for mixed mode repair attempts
 Key: CASSANDRA-13944
 URL: https://issues.apache.org/jira/browse/CASSANDRA-13944
 Project: Cassandra
  Issue Type: Bug
  Components: Repair
Reporter: Blake Eggleston
Assignee: Blake Eggleston
Priority: Minor
 Fix For: 4.0


We often make breaking changes to streaming and repair between major versions, 
and don't usually support either in mixed mode clusters. Streaming connections 
check protocol versions, but repair message handling doesn't, which means 
cryptic exceptions show up in the logs when operators forget to turn off 
whatever's scheduling repairs on their cluster. Refusing to send or receive 
repair messages to/from incompatible messaging service versions, and throwing 
a descriptive exception instead, would make it clearer why repair is not 
working, as well as prevent any potentially unexpected behavior.
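
A minimal sketch of the proposed guard ({{getVersion}} and {{current_version}} are the existing version hooks; the surrounding method and message text are illustrative):

{noformat}
// Before sending or handling a repair message, verify the peer speaks our
// messaging version and fail loudly if it doesn't.
static void assertCompatible(java.net.InetAddress endpoint)
{
    int remoteVersion = MessagingService.instance().getVersion(endpoint);
    if (remoteVersion != MessagingService.current_version)
        throw new IllegalStateException(String.format(
            "Cannot exchange repair messages with %s: it is on messaging version %d, we are on %d",
            endpoint, remoteVersion, MessagingService.current_version));
}
{noformat}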



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-13943) Infinite compaction of L0 SSTables in JBOD

2017-10-09 Thread Dan Kinder (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197459#comment-16197459
 ] 

Dan Kinder edited comment on CASSANDRA-13943 at 10/9/17 11:45 PM:
--

I do see a questionable {{startsWith}} in a few places: 

https://github.com/apache/cassandra/blob/ba87ab4e954ad2e537f6690953bd7ebaa069f5cd/src/java/org/apache/cassandra/db/Directories.java#L281

https://github.com/apache/cassandra/blob/7d4d1a32581ff40ed1049833631832054bcf2316/src/java/org/apache/cassandra/db/compaction/CompactionStrategyManager.java#L309

https://github.com/apache/cassandra/blob/3cec208c40b85e1be0ff8c68fc9d9017945a1ed8/src/java/org/apache/cassandra/db/compaction/CompactionManager.java#L570


was (Author: dkinder):
I do see a questionable {{startsWith}} here: 
https://github.com/apache/cassandra/blob/7d4d1a32581ff40ed1049833631832054bcf2316/src/java/org/apache/cassandra/db/compaction/CompactionStrategyManager.java#L309

Also here: 
https://github.com/apache/cassandra/blob/3cec208c40b85e1be0ff8c68fc9d9017945a1ed8/src/java/org/apache/cassandra/db/compaction/CompactionManager.java#L570

> Infinite compaction of L0 SSTables in JBOD
> --
>
> Key: CASSANDRA-13943
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13943
> Project: Cassandra
>  Issue Type: Bug
>  Components: Compaction
> Environment: Cassandra 3.11.0 / Centos 6
>Reporter: Dan Kinder
>Assignee: Marcus Eriksson
> Attachments: debug.log
>
>
> I recently upgraded from 2.2.6 to 3.11.0.
> I am seeing Cassandra loop infinitely compacting the same data over and over. 
> Attaching logs.
> It is compacting two tables, one on /srv/disk10, the other on /srv/disk1. It 
> does create new SSTables but immediately recompacts again. Note that I am not 
> inserting anything at the moment, there is no flushing happening on this 
> table (Memtable switch count has not changed).
> My theory is that it somehow thinks those should be compaction candidates. 
> But they shouldn't be, they are on different disks and I ran nodetool 
> relocatesstables as well as nodetool compact. So, it tries to compact them 
> together, but the compaction results in the exact same 2 SSTables on the 2 
> disks, because the keys are split by data disk.
> This is pretty serious, because all our nodes right now are consuming CPU 
> doing this for multiple tables, it seems.
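
For what it's worth, a small illustration (not the actual fix) of why a bare {{String.startsWith}} is suspect with the JBOD layout above: {{/srv/disk1}} is a string prefix of {{/srv/disk10}} through {{/srv/disk12}}, so a file on disk10 "matches" disk1.

{noformat}
import java.nio.file.Path;
import java.nio.file.Paths;

public class PrefixDemo
{
    public static void main(String[] args)
    {
        String sstable = "/srv/disk10/cassandra-data/ks/table/mc-1-big-Data.db";
        // Plain string comparison: true, even though the file lives on disk10.
        System.out.println(sstable.startsWith("/srv/disk1"));  // true
        // A component-aware comparison avoids the false match.
        Path p = Paths.get(sstable);
        System.out.println(p.startsWith("/srv/disk1"));        // false
        System.out.println(p.startsWith("/srv/disk10"));       // true
    }
}
{noformat}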



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-13475) First version of pluggable storage engine API.

2017-10-09 Thread Dikang Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dikang Gu updated CASSANDRA-13475:
--
Summary: First version of pluggable storage engine API.  (was: Define 
pluggable storage engine API.)

> First version of pluggable storage engine API.
> --
>
> Key: CASSANDRA-13475
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13475
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Dikang Gu
>
> In order to support pluggable storage engines, we need to define a unified 
> interface/API which allows us to plug in different storage engines for 
> different requirements. 
> At a very high level, the storage engine interface should include APIs to:
> 1. Apply updates into the engine.
> 2. Query data from the engine.
> 3. Stream data in/out to/from the engine.
> 4. Table operations, like create/drop/truncate a table, etc.
> 5. Various stats about the engine.
> I created this ticket to start the discussion about the interface.
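
As a conversation starter only, one possible shape for the five areas listed above (all names here are placeholders, not a proposed API):

{noformat}
// Placeholder types stand in for the real mutation/read/stream abstractions.
public interface StorageEngine
{
    // 1. Apply an update into the engine.
    void apply(Mutation mutation);

    // 2. Query data from the engine.
    QueryResult query(ReadQuery query);

    // 3. Stream data in/out to/from the engine.
    void streamIn(StreamSource source);
    StreamSource streamOut(TokenRange range);

    // 4. Table operations.
    void createTable(String keyspace, String table);
    void dropTable(String keyspace, String table);
    void truncateTable(String keyspace, String table);

    // 5. Various stats about the engine.
    EngineStats stats();

    interface Mutation {}
    interface ReadQuery {}
    interface QueryResult {}
    interface StreamSource {}
    interface TokenRange {}
    interface EngineStats {}
}
{noformat}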



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Assigned] (CASSANDRA-13475) First version of pluggable storage engine API.

2017-10-09 Thread Dikang Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dikang Gu reassigned CASSANDRA-13475:
-

Assignee: Dikang Gu

> First version of pluggable storage engine API.
> --
>
> Key: CASSANDRA-13475
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13475
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Dikang Gu
>Assignee: Dikang Gu
>
> In order to support pluggable storage engines, we need to define a unified 
> interface/API which allows us to plug in different storage engines for 
> different requirements. 
> At a very high level, the storage engine interface should include APIs to:
> 1. Apply updates into the engine.
> 2. Query data from the engine.
> 3. Stream data in/out to/from the engine.
> 4. Table operations, like create/drop/truncate a table, etc.
> 5. Various stats about the engine.
> I created this ticket to start the discussion about the interface.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-13848) Allow sstabledump to do a json object per partition to better handle large sstables

2017-10-09 Thread Kevin Wern (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Wern updated CASSANDRA-13848:
---
Reviewer: Jeff Jirsa
  Status: Patch Available  (was: Open)

>From 834cab8a0a67dbbefa608ddd47109bb9883025a2 Mon Sep 17 00:00:00 2001
From: Kevin Wern 
Date: Mon, 9 Oct 2017 04:26:25 -0400
Subject: [PATCH] sstabledump: add -l option for jsonl

---
 .../apache/cassandra/tools/JsonTransformer.java| 35 +-
 .../org/apache/cassandra/tools/SSTableExport.java  |  8 +
 2 files changed, 36 insertions(+), 7 deletions(-)

diff --git a/src/java/org/apache/cassandra/tools/JsonTransformer.java b/src/java/org/apache/cassandra/tools/JsonTransformer.java
index e6aaf07..0c7ed7e 100644
--- a/src/java/org/apache/cassandra/tools/JsonTransformer.java
+++ b/src/java/org/apache/cassandra/tools/JsonTransformer.java
@@ -56,6 +56,7 @@ import org.codehaus.jackson.JsonGenerator;
 import org.codehaus.jackson.impl.Indenter;
 import org.codehaus.jackson.util.DefaultPrettyPrinter.NopIndenter;
 import org.codehaus.jackson.util.DefaultPrettyPrinter;
+import org.codehaus.jackson.util.MinimalPrettyPrinter;
 
 public final class JsonTransformer
 {
@@ -78,17 +79,26 @@ public final class JsonTransformer
 
     private long currentPosition = 0;
 
-    private JsonTransformer(JsonGenerator json, ISSTableScanner currentScanner, boolean rawTime, TableMetadata metadata)
+    private JsonTransformer(JsonGenerator json, ISSTableScanner currentScanner, boolean rawTime, TableMetadata metadata, boolean isJsonLines)
     {
         this.json = json;
         this.metadata = metadata;
         this.currentScanner = currentScanner;
         this.rawTime = rawTime;
 
-        DefaultPrettyPrinter prettyPrinter = new DefaultPrettyPrinter();
-        prettyPrinter.indentObjectsWith(objectIndenter);
-        prettyPrinter.indentArraysWith(arrayIndenter);
-        json.setPrettyPrinter(prettyPrinter);
+        if (isJsonLines)
+        {
+            MinimalPrettyPrinter minimalPrettyPrinter = new MinimalPrettyPrinter();
+            minimalPrettyPrinter.setRootValueSeparator("\n");
+            json.setPrettyPrinter(minimalPrettyPrinter);
+        }
+        else
+        {
+            DefaultPrettyPrinter prettyPrinter = new DefaultPrettyPrinter();
+            prettyPrinter.indentObjectsWith(objectIndenter);
+            prettyPrinter.indentArraysWith(arrayIndenter);
+            json.setPrettyPrinter(prettyPrinter);
+        }
     }
 
     public static void toJson(ISSTableScanner currentScanner, Stream<UnfilteredRowIterator> partitions, boolean rawTime, TableMetadata metadata, OutputStream out)
@@ -96,18 +106,28 @@ public final class JsonTransformer
     {
         try (JsonGenerator json = jsonFactory.createJsonGenerator(new OutputStreamWriter(out, StandardCharsets.UTF_8)))
         {
-            JsonTransformer transformer = new JsonTransformer(json, currentScanner, rawTime, metadata);
+            JsonTransformer transformer = new JsonTransformer(json, currentScanner, rawTime, metadata, false);
             json.writeStartArray();
             partitions.forEach(transformer::serializePartition);
             json.writeEndArray();
         }
     }
 
+    public static void toJsonLines(ISSTableScanner currentScanner, Stream<UnfilteredRowIterator> partitions, boolean rawTime, TableMetadata metadata, OutputStream out)
+            throws IOException
+    {
+        try (JsonGenerator json = jsonFactory.createJsonGenerator(new OutputStreamWriter(out, StandardCharsets.UTF_8)))
+        {
+            JsonTransformer transformer = new JsonTransformer(json, currentScanner, rawTime, metadata, true);
+            partitions.forEach(transformer::serializePartition);
+        }
+    }
+
     public static void keysToJson(ISSTableScanner currentScanner, Stream<DecoratedKey> keys, boolean rawTime, TableMetadata metadata, OutputStream out) throws IOException
     {
         try (JsonGenerator json = jsonFactory.createJsonGenerator(new OutputStreamWriter(out, StandardCharsets.UTF_8)))
         {
-            JsonTransformer transformer = new JsonTransformer(json, currentScanner, rawTime, metadata);
+            JsonTransformer transformer = new JsonTransformer(json, currentScanner, rawTime, metadata, false);
             json.writeStartArray();
             keys.forEach(transformer::serializePartitionKey);
             json.writeEndArray();
@@ -221,6 +241,7 @@ public final class JsonTransformer
                 json.writeEndObject();
             }
         }
+
         catch (IOException e)
         {
             String key = metadata.partitionKeyType.getString(partition.partitionKey().getKey());
diff --git a/src/java/org/apache/cassandra/tools/SSTableExport.java b/src/java/org/apache/cassandra/tools/SSTableExport.java
index 95e3ed6..4079ee7 100644
--- a/src/java/org/apache/cassandra/tools/SSTableExport.java
+++ b/src/java/org/apache/cassandra/tools/SSTableExport.java
[jira] [Issue Comment Deleted] (CASSANDRA-13848) Allow sstabledump to do a json object per partition to better handle large sstables

2017-10-09 Thread Kevin Wern (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Wern updated CASSANDRA-13848:
---
Comment: was deleted

(was: From 834cab8a0a67dbbefa608ddd47109bb9883025a2 Mon Sep 17 00:00:00 2001
From: Kevin Wern 
Date: Mon, 9 Oct 2017 04:26:25 -0400
Subject: [PATCH] sstabledump: add -l option for jsonl

---
 .../apache/cassandra/tools/JsonTransformer.java| 35 +-
 .../org/apache/cassandra/tools/SSTableExport.java  |  8 +
 2 files changed, 36 insertions(+), 7 deletions(-)

diff --git a/src/java/org/apache/cassandra/tools/JsonTransformer.java b/src/java/org/apache/cassandra/tools/JsonTransformer.java
index e6aaf07..0c7ed7e 100644
--- a/src/java/org/apache/cassandra/tools/JsonTransformer.java
+++ b/src/java/org/apache/cassandra/tools/JsonTransformer.java
@@ -56,6 +56,7 @@ import org.codehaus.jackson.JsonGenerator;
 import org.codehaus.jackson.impl.Indenter;
 import org.codehaus.jackson.util.DefaultPrettyPrinter.NopIndenter;
 import org.codehaus.jackson.util.DefaultPrettyPrinter;
+import org.codehaus.jackson.util.MinimalPrettyPrinter;
 
 public final class JsonTransformer
 {
@@ -78,17 +79,26 @@ public final class JsonTransformer
 
     private long currentPosition = 0;
 
-    private JsonTransformer(JsonGenerator json, ISSTableScanner currentScanner, boolean rawTime, TableMetadata metadata)
+    private JsonTransformer(JsonGenerator json, ISSTableScanner currentScanner, boolean rawTime, TableMetadata metadata, boolean isJsonLines)
     {
         this.json = json;
         this.metadata = metadata;
         this.currentScanner = currentScanner;
         this.rawTime = rawTime;
 
-        DefaultPrettyPrinter prettyPrinter = new DefaultPrettyPrinter();
-        prettyPrinter.indentObjectsWith(objectIndenter);
-        prettyPrinter.indentArraysWith(arrayIndenter);
-        json.setPrettyPrinter(prettyPrinter);
+        if (isJsonLines)
+        {
+            MinimalPrettyPrinter minimalPrettyPrinter = new MinimalPrettyPrinter();
+            minimalPrettyPrinter.setRootValueSeparator("\n");
+            json.setPrettyPrinter(minimalPrettyPrinter);
+        }
+        else
+        {
+            DefaultPrettyPrinter prettyPrinter = new DefaultPrettyPrinter();
+            prettyPrinter.indentObjectsWith(objectIndenter);
+            prettyPrinter.indentArraysWith(arrayIndenter);
+            json.setPrettyPrinter(prettyPrinter);
+        }
     }
 
     public static void toJson(ISSTableScanner currentScanner, Stream<UnfilteredRowIterator> partitions, boolean rawTime, TableMetadata metadata, OutputStream out)
@@ -96,18 +106,28 @@ public final class JsonTransformer
     {
         try (JsonGenerator json = jsonFactory.createJsonGenerator(new OutputStreamWriter(out, StandardCharsets.UTF_8)))
         {
-            JsonTransformer transformer = new JsonTransformer(json, currentScanner, rawTime, metadata);
+            JsonTransformer transformer = new JsonTransformer(json, currentScanner, rawTime, metadata, false);
             json.writeStartArray();
             partitions.forEach(transformer::serializePartition);
             json.writeEndArray();
         }
     }
 
+    public static void toJsonLines(ISSTableScanner currentScanner, Stream<UnfilteredRowIterator> partitions, boolean rawTime, TableMetadata metadata, OutputStream out)
+            throws IOException
+    {
+        try (JsonGenerator json = jsonFactory.createJsonGenerator(new OutputStreamWriter(out, StandardCharsets.UTF_8)))
+        {
+            JsonTransformer transformer = new JsonTransformer(json, currentScanner, rawTime, metadata, true);
+            partitions.forEach(transformer::serializePartition);
+        }
+    }
+
     public static void keysToJson(ISSTableScanner currentScanner, Stream<DecoratedKey> keys, boolean rawTime, TableMetadata metadata, OutputStream out) throws IOException
     {
         try (JsonGenerator json = jsonFactory.createJsonGenerator(new OutputStreamWriter(out, StandardCharsets.UTF_8)))
         {
-            JsonTransformer transformer = new JsonTransformer(json, currentScanner, rawTime, metadata);
+            JsonTransformer transformer = new JsonTransformer(json, currentScanner, rawTime, metadata, false);
             json.writeStartArray();
             keys.forEach(transformer::serializePartitionKey);
             json.writeEndArray();
@@ -221,6 +241,7 @@ public final class JsonTransformer
                 json.writeEndObject();
             }
         }
+
         catch (IOException e)
         {
             String key = metadata.partitionKeyType.getString(partition.partitionKey().getKey());
diff --git a/src/java/org/apache/cassandra/tools/SSTableExport.java b/src/java/org/apache/cassandra/tools/SSTableExport.java
index 95e3ed6..4079ee7 100644
--- a/src/java/org/apache/cassandra/tools/SSTableExport.java
+++ b/src/java/org/apache/cassandra/tools/SSTableExport.java
@@ -62,6 +62,7 @@ public class SSTableExport

[jira] [Updated] (CASSANDRA-13848) Allow sstabledump to do a json object per partition to better handle large sstables

2017-10-09 Thread Kevin Wern (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Wern updated CASSANDRA-13848:
---
Attachment: 0001-sstabledump-add-l-option-for-jsonl.patch

> Allow sstabledump to do a json object per partition to better handle large 
> sstables
> ---
>
> Key: CASSANDRA-13848
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13848
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Tools
>Reporter: Jeff Jirsa
>Assignee: Kevin Wern
>Priority: Trivial
>  Labels: lhf
> Attachments: 0001-sstabledump-add-l-option-for-jsonl.patch
>
>
> sstable2json / sstabledump make a huge json document of the whole file. For 
> very large sstables this makes it impossible to load in memory to do anything 
> with it. Allowing users to Break it into small json objects per partition 
> would be useful.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13848) Allow sstabledump to do a json object per partition to better handle large sstables

2017-10-09 Thread Kevin Wern (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197921#comment-16197921
 ] 

Kevin Wern commented on CASSANDRA-13848:


Took longer than I expected to revisit this, but above is my attempt.

> Allow sstabledump to do a json object per partition to better handle large 
> sstables
> ---
>
> Key: CASSANDRA-13848
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13848
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Tools
>Reporter: Jeff Jirsa
>Assignee: Kevin Wern
>Priority: Trivial
>  Labels: lhf
> Attachments: 0001-sstabledump-add-l-option-for-jsonl.patch
>
>
> sstable2json / sstabledump make a huge json document of the whole file. For 
> very large sstables this makes it impossible to load in memory to do anything 
> with it. Allowing users to Break it into small json objects per partition 
> would be useful.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13942) Open Cassandra.yaml for developers to extend custom settings

2017-10-09 Thread zhaoyan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197938#comment-16197938
 ] 

zhaoyan commented on CASSANDRA-13942:
-

Hi [~bdeggleston]

Thank you for your advice.

I can achieve this by creating a new ConfigurationLoader.

But I don't think creating a new ConfigurationLoader is a friendly way to 
extend the settings.

A new ConfigurationLoader is really meant for loading configuration from a 
DB, properties files, the network, or other sources besides YAML.

I only want to add more settings to cassandra.yaml.
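
For reference, this is roughly what the ConfigurationLoader route looks like ({{ConfigurationLoader}}, {{YamlConfigurationLoader}} and the {{cassandra.config.loader}} system property are the existing hooks; the plugin class and file name are made up):

{noformat}
import org.apache.cassandra.config.Config;
import org.apache.cassandra.config.ConfigurationLoader;
import org.apache.cassandra.config.YamlConfigurationLoader;
import org.apache.cassandra.exceptions.ConfigurationException;

// Selected at startup with -Dcassandra.config.loader=...PluginConfigLoader
public class PluginConfigLoader implements ConfigurationLoader
{
    public Config loadConfig() throws ConfigurationException
    {
        // Delegate the standard cassandra.yaml parsing...
        Config config = new YamlConfigurationLoader().loadConfig();
        // ...then read the plugin's own settings from a separate file,
        // e.g. index-plugin.yaml, instead of extending Config itself.
        return config;
    }
}
{noformat}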


> Open Cassandra.yaml for developers to extend custom settings
> 
>
> Key: CASSANDRA-13942
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13942
> Project: Cassandra
>  Issue Type: Wish
>  Components: Configuration
>Reporter: zhaoyan
>
> we are now trying to write an index plugin for cassandra.
> we want to put some more settings in cassandra.yaml and read them in our code.
> we find that cassandra uses DatabaseDescriptor.java and Config.java to hold 
> the configuration from cassandra.yaml, but we can't extend them.
> so I suggest cassandra provide some interfaces for developers to extend 
> custom settings.
> Thank you



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-13942) Open Cassandra.yaml for developers to extend custom settings

2017-10-09 Thread zhaoyan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197938#comment-16197938
 ] 

zhaoyan edited comment on CASSANDRA-13942 at 10/10/17 12:29 AM:


Hi [~bdeggleston]

Thank you for your advice.

I can achieve this by creating a new ConfigurationLoader.

But I don't think creating a new ConfigurationLoader is a friendly way to 
extend the settings.

A new ConfigurationLoader is really meant for loading configuration from a 
DB, properties files, the network, or other sources besides YAML.

I only want to add more settings to cassandra.yaml.



was (Author: zhaoyan):
Hi [~bdeggleston]

Thank you advice.

I can achieve this by creating a new ConfigurationLoader  

But I dont think  it is a friendly way to extend by create a new 
ConfigurationLoader。

Another new ConfigurationLoader  may be designed to load configurations from 
DB, properties, network etc source other than yaml.

I only want to add more settings to cassandra.yaml。


> Open Cassandra.yaml for developers to extend custom settings
> 
>
> Key: CASSANDRA-13942
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13942
> Project: Cassandra
>  Issue Type: Wish
>  Components: Configuration
>Reporter: zhaoyan
>
> we are now trying to write an index plugin for cassandra.
> we want to put some more settings in cassandra.yaml and read them in our code.
> we find that cassandra uses DatabaseDescriptor.java and Config.java to hold 
> the configuration from cassandra.yaml, but we can't extend them.
> so I suggest cassandra provide some interfaces for developers to extend 
> custom settings.
> Thank you



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-13942) Open Cassandra.yaml for developers to extend custom settings

2017-10-09 Thread zhaoyan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197938#comment-16197938
 ] 

zhaoyan edited comment on CASSANDRA-13942 at 10/10/17 12:38 AM:


Hi [~bdeggleston]

Thank you for your advice.

I can achieve this by creating a new ConfigurationLoader.

But I don't think creating a new ConfigurationLoader is a friendly way to 
extend the settings.

A new ConfigurationLoader is really meant for loading configuration from a 
DB, properties files, the network, or other sources besides YAML.

I only want to add more settings to cassandra.yaml; I don't want to have to 
copy YamlConfigurationLoader.java again :D



was (Author: zhaoyan):
Hi [~bdeggleston]

Thank you for your advice.

I can achieve this by creating a new ConfigurationLoader  

But I dont think  it is a friendly way to extend by create a new 
ConfigurationLoader。

Another new ConfigurationLoader  may be designed to load configurations from 
DB, properties, network etc source other than yaml.

I only want to add more settings to cassandra.yaml。


> Open Cassandra.yaml for developers to extend custom settings
> 
>
> Key: CASSANDRA-13942
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13942
> Project: Cassandra
>  Issue Type: Wish
>  Components: Configuration
>Reporter: zhaoyan
>
> We are writing an index plugin for Cassandra.
> We want to put some additional settings in cassandra.yaml and read them in
> our code.
> We found that Cassandra uses DatabaseDescriptor.java and Config.java to hold
> the configuration from cassandra.yaml, but we can't extend them.
> So I suggest that Cassandra provide some interfaces for developers to extend
> custom settings.
> Thank you



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13848) Allow sstabledump to do a json object per partition to better handle large sstables

2017-10-09 Thread Jeff Jirsa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197956#comment-16197956
 ] 

Jeff Jirsa commented on CASSANDRA-13848:


Thanks [~kwern] - took a quick peek and it looks reasonable, but I'll try to 
review soon

> Allow sstabledump to do a json object per partition to better handle large 
> sstables
> ---
>
> Key: CASSANDRA-13848
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13848
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Tools
>Reporter: Jeff Jirsa
>Assignee: Kevin Wern
>Priority: Trivial
>  Labels: lhf
> Attachments: 0001-sstabledump-add-l-option-for-jsonl.patch
>
>
> sstable2json / sstabledump produce one huge JSON document for the whole file.
> For very large sstables this makes it impossible to load the output into
> memory to do anything with it. Allowing users to break it into one small JSON
> object per partition would be useful.
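
To illustrate why a per-partition format helps, a consumer can stream such a
dump one line at a time instead of parsing one giant document. A sketch,
assuming the proposed option emits one JSON object per line ("json lines") and
using Jackson; the {{partition}}/{{key}} field names follow sstabledump's
current per-partition output:

{code:java}
import java.io.BufferedReader;
import java.nio.file.Files;
import java.nio.file.Paths;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

// Reads a "one JSON object per partition per line" dump without ever holding
// more than a single partition in memory.
public class JsonlPartitionReader
{
    public static void main(String[] args) throws Exception
    {
        ObjectMapper mapper = new ObjectMapper();
        try (BufferedReader reader = Files.newBufferedReader(Paths.get(args[0])))
        {
            String line;
            while ((line = reader.readLine()) != null)
            {
                JsonNode partition = mapper.readTree(line); // one partition at a time
                System.out.println(partition.path("partition").path("key"));
            }
        }
    }
}
{code}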



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13813) Don't let user drop (or generally break) tables in system_distributed

2017-10-09 Thread Kurt Greaves (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197996#comment-16197996
 ] 

Kurt Greaves commented on CASSANDRA-13813:
--

I think if we can't provide a data model for our tables that works for all
scenarios then we need to allow operators to make changes. I've had quite a few
occasions where modifying "system" tables was necessary, and I'm sure more
tables that don't work in all scenarios will be introduced in the future.

While there is the workaround of just inserting into the system_schema tables,
it is fraught with peril and makes it far more likely that operators will do
something that breaks things. I can't see someone saying "whoops, I accidentally
DROPped/ALTERed a random column in system_distributed.view_build_status", but I
can definitely see someone trying to insert into system_schema.tables and
making mistakes. As soon as we make these keyspaces replicated we hand over
some responsibility for managing them to the operator (not that the
non-replicated keyspaces have a history of being perfect), and I'd expect to be
able to change table properties that potentially affect the cluster.

Cassandra already requires you to know what you're doing as an operator; this
really doesn't increase that expectation. There are a million other bad choices
you could make when managing a cluster that would be far more catastrophic (and
far more likely). I would like to move away from that, but a lot of that sort
of thing requires major changes to fix. As in this case, it seems we'll need the
capability limitation framework or other major changes to make a reasonable
compromise.

> Don't let user drop (or generally break) tables in system_distributed
> -
>
> Key: CASSANDRA-13813
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13813
> Project: Cassandra
>  Issue Type: Bug
>  Components: Distributed Metadata
>Reporter: Sylvain Lebresne
>Assignee: Aleksey Yeschenko
> Fix For: 3.0.x, 3.11.x
>
>
> There are currently no particular restrictions on schema modifications to
> tables of the {{system_distributed}} keyspace. This means you can drop those
> tables, or even alter them in wrong ways like dropping or renaming columns.
> All of which is guaranteed to break things (repair if you mess with one of
> its tables, or MVs if you mess with {{view_build_status}}).
> I'm pretty sure this was never intended and is an oversight in the condition
> on {{ALTERABLE_SYSTEM_KEYSPACES}} in
> [ClientState|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/ClientState.java#L397].
>  That condition is such that any keyspace not listed in
> {{ALTERABLE_SYSTEM_KEYSPACES}} (which happens to be the case for
> {{system_distributed}}) has no specific restrictions whatsoever, while given
> the naming it's fair to assume the intention was exactly the opposite.
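
To make the intended invariant concrete, here is a simplified,
illustrative-only sketch of the check being described; the real logic lives in
{{ClientState.java}}, and the class name and hard-coded keyspace sets below are
assumptions for the example:

{code:java}
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Illustrative guard: schema changes to a system keyspace are allowed only
// when the keyspace is explicitly whitelisted; system_distributed is not.
public final class SystemKeyspaceGuard
{
    private static final Set<String> SYSTEM_KEYSPACES = new HashSet<>(Arrays.asList(
        "system", "system_schema", "system_auth", "system_traces", "system_distributed"));

    private static final Set<String> ALTERABLE_SYSTEM_KEYSPACES = new HashSet<>(Arrays.asList(
        "system_auth", "system_traces"));

    public static void validateSchemaChange(String keyspace)
    {
        if (SYSTEM_KEYSPACES.contains(keyspace) && !ALTERABLE_SYSTEM_KEYSPACES.contains(keyspace))
            throw new IllegalStateException("Schema changes to " + keyspace + " are not allowed");
    }
}
{code}

Under a check of this shape, {{ALTER}} and {{DROP}} against
{{system_distributed}} tables are rejected while {{system_auth}} and
{{system_traces}} remain adjustable.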



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13442) Support a means of strongly consistent highly available replication with tunable storage requirements

2017-10-09 Thread Kurt Greaves (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16198063#comment-16198063
 ] 

Kurt Greaves commented on CASSANDRA-13442:
--

Yeah OK I'm convinced (if it can be proven, obviously), but let's not go
around making it incredibly misleading.

bq. 10-20x on transient replicas. Not at full replicas or overall.
Saying 10-20x is really misleading. No one is actually going to see a 10-20x
improvement in disk usage. Even a reduction of 1/3 would be optimistic, I'm sure.

bq. With vnodes data would be spread out over several nodes so the additional 
utilization at each node could be substantially less.
Let's not pretend people running vnodes can actually run repairs.

bq. Some of it might end up being part of overlapping functionality. I can hope.
Not sure if there is a ticket for it but I've been meaning to create one that
would probably benefit from this change. We need a way to change RF without
downtime and without costing a fortune (DC migration). I can see ways in which
transient replicas would give us this, as we'll need some way to change RF on
the fly without making nodes responsible for data they don't yet have.

If you could add a replica as transient at any time this would almost solve the
RF change problem, assuming you had some way to transition between transient
and full replicas.

> Support a means of strongly consistent highly available replication with 
> tunable storage requirements
> -
>
> Key: CASSANDRA-13442
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13442
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction, Coordination, Distributed Metadata, Local 
> Write-Read Paths
>Reporter: Ariel Weisberg
>
> Replication factors like RF=2 can't provide strong consistency and 
> availability because if a single node is lost it's impossible to reach a 
> quorum of replicas. Stepping up to RF=3 will allow you to lose a node and 
> still achieve quorum for reads and writes, but requires committing additional 
> storage.
> The requirement of a quorum for writes/reads doesn't seem to be something 
> that can be relaxed without additional constraints on queries, but it seems 
> like it should be possible to relax the requirement that 3 full copies of the 
> entire data set are kept. What is actually required is a covering data set 
> for the range and we should be able to achieve a covering data set and high 
> availability without having three full copies. 
> After a repair we know that some subset of the data set is fully replicated. 
> At that point we don't have to read from a quorum of nodes for the repaired 
> data. It is sufficient to read from a single node for the repaired data and a 
> quorum of nodes for the unrepaired data.
> One way to exploit this would be to have N replicas, say the last N replicas
> (where N varies with RF) in the preference list, delete all repaired data
> after a repair completes. Subsequent quorum reads will be able to retrieve
> the repaired data from either of the two full replicas and the unrepaired
> data from a quorum read of any replica, including the "transient" replicas.
> Configuration for something like this in NTS might be something similar to {
> DC1="3-1", DC2="3-2" } where the first value is the replication factor used
> for consistency and the second value is the number of transient replicas. If
> you specify { DC1=3, DC2=3 } then the number of transient replicas defaults
> to 0 and you get the same behavior you have today.
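
To pin down the {{"3-1"}} notation from the description, a toy parser for that
setting might look as follows; the class and its methods are purely
illustrative and not part of any proposed patch:

{code:java}
// Toy parser for the "total-transient" replication setting floated in the
// description: the first number is the replication factor used for
// consistency, the second is how many of those replicas are transient.
public final class ReplicationSetting
{
    public final int fullReplicas;
    public final int transientReplicas;

    private ReplicationSetting(int total, int transients)
    {
        this.fullReplicas = total - transients;
        this.transientReplicas = transients;
    }

    public static ReplicationSetting parse(String value)
    {
        String[] parts = value.split("-");
        int total = Integer.parseInt(parts[0]);
        // A bare "3" means zero transient replicas, i.e. today's behavior.
        int transients = parts.length > 1 ? Integer.parseInt(parts[1]) : 0;
        if (transients >= total)
            throw new IllegalArgumentException("At least one replica must be full: " + value);
        return new ReplicationSetting(total, transients);
    }
}
{code}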



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-13475) First version of pluggable storage engine API.

2017-10-09 Thread Dikang Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dikang Gu updated CASSANDRA-13475:
--
Status: Patch Available  (was: Open)

Here is the first version of the pluggable storage engine API, based on trunk.
https://github.com/DikangGu/cassandra/commit/f1c69f688d05504f7409dd735e1473982c59fa52

It contains the API and a little bit of refactoring of the streaming part.

You can check https://github.com/Instagram/cassandra/tree/rocks_3.0 for the 
RocksDB based implementation.

> First version of pluggable storage engine API.
> --
>
> Key: CASSANDRA-13475
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13475
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Dikang Gu
>Assignee: Dikang Gu
>
> In order to support pluggable storage engines, we need to define a unified
> interface/API that allows us to plug in different storage engines for
> different requirements.
> At a very high level, the storage engine interface should include APIs to:
> 1. Apply updates to the engine.
> 2. Query data from the engine.
> 3. Stream data in/out of the engine.
> 4. Perform table operations, like create/drop/truncate a table, etc.
> 5. Expose various stats about the engine.
> I created this ticket to start the discussion about the interface.
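
For readers skimming the thread, an illustrative-only sketch of an interface
covering the five areas above; every name and signature is hypothetical, and
the actual proposal is in the linked commit:

{code:java}
import java.util.Map;

import org.apache.cassandra.db.ReadCommand;
import org.apache.cassandra.db.partitions.PartitionUpdate;
import org.apache.cassandra.db.rows.UnfilteredRowIterator;

// Hypothetical shape of a pluggable engine interface; names, signatures, and
// the streaming hooks are placeholders, not the proposed API.
public interface StorageEngine
{
    void apply(PartitionUpdate update);               // 1. apply updates
    UnfilteredRowIterator query(ReadCommand command); // 2. query data

    void streamIn(String keyspace, String table);     // 3. streaming hooks
    void streamOut(String keyspace, String table);    //    (sessions elided)

    void createTable(String keyspace, String table);  // 4. table lifecycle
    void dropTable(String keyspace, String table);
    void truncate(String keyspace, String table);

    Map<String, Long> metrics();                      // 5. engine stats
}
{code}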



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13442) Support a means of strongly consistent highly available replication with tunable storage requirements

2017-10-09 Thread DOAN DuyHai (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16198253#comment-16198253
 ] 

DOAN DuyHai commented on CASSANDRA-13442:
-

So I've wrapped my head around the design of transient replicas. So far I can
spot 2 concerns.

1) It does not work with ONE or LOCAL_ONE. Of course transient replication is
an opt-in feature, but it means users must be super-careful about issuing
queries at ONE/LOCAL_ONE against keyspaces that have transient replication
enabled. Considering that ONE/LOCAL_ONE is the *default consistency level* for
the drivers and the Spark connector, maybe we should throw an exception
whenever a query with one of those consistency levels is issued against a
transiently replicated keyspace (see the sketch below)?

2) *Consistency level* and *repair* have been 2 distinct and orthogonal notions
so far. With transient replication they are strongly tied, as transient
replication relies heavily on incremental repair. Of course that is an
implementation detail; [~aweisberg] has mentioned replicated hints as an
alternative, but then we'd be making transient replication dependent on the
hints implementation instead. Same story.

The consequence of point 2) is that any bug in incremental repair/replicated
hints will badly impact the correctness assumptions of transient replication.
This point worries me much more than point 1).
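
A minimal sketch of the guard suggested in point 1), assuming some per-keyspace
flag for transient replication; the helper class and the boolean flag are
hypothetical, while {{ConsistencyLevel}} and {{InvalidRequestException}} are
existing classes:

{code:java}
import org.apache.cassandra.db.ConsistencyLevel;
import org.apache.cassandra.exceptions.InvalidRequestException;

// Hypothetical coordinator-side check: reject the consistency levels that
// cannot be satisfied correctly on a transiently replicated keyspace.
public final class TransientReplicationGuard
{
    public static void validate(ConsistencyLevel cl, boolean keyspaceHasTransientReplicas)
    {
        if (keyspaceHasTransientReplicas
            && (cl == ConsistencyLevel.ONE || cl == ConsistencyLevel.LOCAL_ONE))
            throw new InvalidRequestException(
                "Consistency level " + cl + " is not supported on transiently replicated keyspaces");
    }
}
{code}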

> Support a means of strongly consistent highly available replication with 
> tunable storage requirements
> -
>
> Key: CASSANDRA-13442
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13442
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction, Coordination, Distributed Metadata, Local 
> Write-Read Paths
>Reporter: Ariel Weisberg
>
> Replication factors like RF=2 can't provide strong consistency and 
> availability because if a single node is lost it's impossible to reach a 
> quorum of replicas. Stepping up to RF=3 will allow you to lose a node and 
> still achieve quorum for reads and writes, but requires committing additional 
> storage.
> The requirement of a quorum for writes/reads doesn't seem to be something 
> that can be relaxed without additional constraints on queries, but it seems 
> like it should be possible to relax the requirement that 3 full copies of the 
> entire data set are kept. What is actually required is a covering data set 
> for the range and we should be able to achieve a covering data set and high 
> availability without having three full copies. 
> After a repair we know that some subset of the data set is fully replicated. 
> At that point we don't have to read from a quorum of nodes for the repaired 
> data. It is sufficient to read from a single node for the repaired data and a 
> quorum of nodes for the unrepaired data.
> One way to exploit this would be to have N replicas, say the last N replicas
> (where N varies with RF) in the preference list, delete all repaired data
> after a repair completes. Subsequent quorum reads will be able to retrieve
> the repaired data from either of the two full replicas and the unrepaired
> data from a quorum read of any replica, including the "transient" replicas.
> Configuration for something like this in NTS might be something similar to {
> DC1="3-1", DC2="3-2" } where the first value is the replication factor used
> for consistency and the second value is the number of transient replicas. If
> you specify { DC1=3, DC2=3 } then the number of transient replicas defaults
> to 0 and you get the same behavior you have today.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org