[jira] [Updated] (CASSANDRA-9176) drop out of column finding loop on success for altertable statement w/ drop column
[ https://issues.apache.org/jira/browse/CASSANDRA-9176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stefania updated CASSANDRA-9176:
--------------------------------
    Reviewer: Stefania

> drop out of column finding loop on success for altertable statement w/ drop column
> ----------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-9176
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9176
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Dave Brosius
>            Assignee: Dave Brosius
>            Priority: Trivial
>             Fix For: 3.0
>
>         Attachments: altertabledrop.txt
>
>
> The loop looks for the column to drop but doesn't stop once it is found; add a break. (trunk)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
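The pattern the ticket describes can be sketched as a stand-alone loop; this is illustrative only, not the actual AlterTableStatement code, and the names are hypothetical:

```java
import java.util.List;

// Illustrative stand-alone version of the loop the ticket describes:
// scan the column list for the name being dropped and stop as soon as
// it is found, instead of continuing through the remaining columns.
class DropColumnExample
{
    static String findColumn(List<String> columns, String target)
    {
        String found = null;
        for (String column : columns)
        {
            if (column.equals(target))
            {
                found = column;
                break; // the proposed fix: no need to inspect the rest of the list
            }
        }
        return found;
    }
}
```

The result is unchanged either way; the break only avoids useless iterations once a match is found, which is why the ticket is marked Trivial.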
[jira] [Comment Edited] (CASSANDRA-8377) Coordinated Commitlog Replay
[ https://issues.apache.org/jira/browse/CASSANDRA-8377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14500996#comment-14500996 ]

Chris Lohfink edited comment on CASSANDRA-8377 at 4/18/15 1:15 AM:
-------------------------------------------------------------------

The recovery happens before the node is up, so it cannot use the storage proxy. I created a JMX operation that provides all the different options to restore a commit log. This gives the added benefit of not requiring system properties to be set and the node restarted to do the restore, which I think is preferable.

was (Author: cnlwsu):
The recovery happens before the node is up it cant use the storage proxy so created a jmx operation that provides all the different options to restore a commit log. This gives added benefit of not requiring restarts to do a point in time restore.

> Coordinated Commitlog Replay
> ----------------------------
>
>                 Key: CASSANDRA-8377
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8377
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Nick Bailey
>            Assignee: Chris Lohfink
>             Fix For: 3.0
>
>         Attachments: CASSANDRA-8377.txt
>
>
> Commit log archiving and replay can be used to support point-in-time restores on a cluster. Unfortunately, at the moment that is only true when the topology of the cluster is exactly the same as when the commitlogs were archived. This is because commitlogs need to be replayed on a node that is a replica for those writes.
> To support replaying commitlogs when the topology has changed, we should have a tool that replays the writes in a commitlog as if they were writes from a client, so they get coordinated to the correct replicas.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8377) Coordinated Commitlog Replay
[ https://issues.apache.org/jira/browse/CASSANDRA-8377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Lohfink updated CASSANDRA-8377:
-------------------------------------
    Attachment: CASSANDRA-8377.txt

The recovery happens before the node is up, so it can't use the storage proxy; I created a JMX operation that provides all the different options to restore a commit log. This gives the added benefit of not requiring restarts to do a point-in-time restore.

> Coordinated Commitlog Replay
> ----------------------------
>
>                 Key: CASSANDRA-8377
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8377
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Nick Bailey
>            Assignee: Chris Lohfink
>             Fix For: 3.0
>
>         Attachments: CASSANDRA-8377.txt
>
>
> Commit log archiving and replay can be used to support point-in-time restores on a cluster. Unfortunately, at the moment that is only true when the topology of the cluster is exactly the same as when the commitlogs were archived. This is because commitlogs need to be replayed on a node that is a replica for those writes.
> To support replaying commitlogs when the topology has changed, we should have a tool that replays the writes in a commitlog as if they were writes from a client, so they get coordinated to the correct replicas.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
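The JMX-operation approach described in the comment can be sketched as follows. Everything here is an illustrative assumption (interface, class, and parameter names are invented, and the body is a placeholder), not Cassandra's actual API from the attached patch:

```java
// Hypothetical sketch of the approach the comment describes: expose commit
// log restore as an operation an operator can invoke over JMX on a live
// node, instead of setting system properties and restarting. All names and
// parameters below are illustrative assumptions, not Cassandra's API.
interface CommitLogRestoreMBean
{
    // Replay archived segments, dropping mutations stamped after the
    // requested point in time.
    String restore(String archiveDirectory, long pointInTimeMillis);
}

class CommitLogRestore implements CommitLogRestoreMBean
{
    public String restore(String archiveDirectory, long pointInTimeMillis)
    {
        // A real implementation would walk the archived segment files and
        // feed each mutation with a timestamp at or below the requested
        // point in time back through the node's write path.
        return "replayed " + archiveDirectory + " up to " + pointInTimeMillis;
    }
}
```

In a real node such an implementation would be registered with the platform MBean server (via `ManagementFactory.getPlatformMBeanServer().registerMBean(...)`), so a JMX client could invoke the restore without any restart, which is the benefit the comment calls out.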
[8/8] cassandra git commit: Merge branch 'cassandra-2.1' into trunk
Merge branch 'cassandra-2.1' into trunk

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/f0d4705e
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/f0d4705e
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/f0d4705e

Branch: refs/heads/trunk
Commit: f0d4705e6fd90e865cd9f88d5159e1049a772220
Parents: 4040ba8 c6e4379
Author: Brandon Williams
Authored: Fri Apr 17 18:43:36 2015 -0500
Committer: Brandon Williams
Committed: Fri Apr 17 18:43:36 2015 -0500

----------------------------------------------------------------------
 .../org/apache/cassandra/locator/ReconnectableSnitchHelper.java | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
----------------------------------------------------------------------
[3/8] cassandra git commit: Fix commit
Fix commit

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/738229bd
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/738229bd
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/738229bd

Branch: refs/heads/cassandra-2.1
Commit: 738229bd7900b21a83ea322b1e394b8c20c0b82f
Parents: 54140bf
Author: Brandon Williams
Authored: Fri Apr 17 18:42:38 2015 -0500
Committer: Brandon Williams
Committed: Fri Apr 17 18:42:38 2015 -0500

----------------------------------------------------------------------
 .../org/apache/cassandra/locator/ReconnectableSnitchHelper.java | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/cassandra/blob/738229bd/src/java/org/apache/cassandra/locator/ReconnectableSnitchHelper.java
----------------------------------------------------------------------
diff --git a/src/java/org/apache/cassandra/locator/ReconnectableSnitchHelper.java b/src/java/org/apache/cassandra/locator/ReconnectableSnitchHelper.java
index 1642561..e5dbdeb 100644
--- a/src/java/org/apache/cassandra/locator/ReconnectableSnitchHelper.java
+++ b/src/java/org/apache/cassandra/locator/ReconnectableSnitchHelper.java
@@ -81,7 +81,7 @@ public class ReconnectableSnitchHelper implements IEndpointStateChangeSubscriber

     public void onChange(InetAddress endpoint, ApplicationState state, VersionedValue value)
     {
-        if (preferLocal && !Gossiper.instance.isDeadState(epState) && state == ApplicationState.INTERNAL_IP)
+        if (preferLocal && !Gossiper.instance.isDeadState(Gossiper.instance.getEndpointStateForEndpoint(endpoint)) && state == ApplicationState.INTERNAL_IP)
             reconnect(endpoint, value);
     }
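The one-line "Fix commit" exists because onChange, unlike onJoin, receives no EndpointState argument: the first patch referenced an `epState` variable that does not exist in that scope, and the fix asks the gossiper for the endpoint's state instead. That corrected lookup can be sketched with a toy stand-in for Gossiper.instance (all names and the String endpoints below are illustrative, not Cassandra's types):

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the corrected onChange condition: look the endpoint's
// gossip state up by address, then skip reconnection when that state is
// dead. The map stands in for Gossiper.instance, for illustration only.
class DeadStateCheck
{
    enum EndpointState { NORMAL, DEAD }

    static final Map<String, EndpointState> gossipState = new HashMap<>();

    static EndpointState getEndpointStateForEndpoint(String endpoint)
    {
        return gossipState.get(endpoint);
    }

    static boolean isDeadState(EndpointState epState)
    {
        return epState == EndpointState.DEAD;
    }

    // Mirrors the fixed condition: fetch the state for the endpoint rather
    // than referencing an epState parameter onChange never had.
    static boolean shouldReconnect(String endpoint, boolean preferLocal)
    {
        return preferLocal && !isDeadState(getEndpointStateForEndpoint(endpoint));
    }
}
```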
[7/8] cassandra git commit: Merge branch 'cassandra-2.0' into cassandra-2.1
Merge branch 'cassandra-2.0' into cassandra-2.1

Conflicts:
	CHANGES.txt
	src/java/org/apache/cassandra/locator/ReconnectableSnitchHelper.java

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/c6e43798
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/c6e43798
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/c6e43798

Branch: refs/heads/cassandra-2.1
Commit: c6e4379831958a48f421d394a70ddc307341b5bf
Parents: a925262 738229b
Author: Brandon Williams
Authored: Fri Apr 17 18:43:29 2015 -0500
Committer: Brandon Williams
Committed: Fri Apr 17 18:43:29 2015 -0500

----------------------------------------------------------------------
 .../org/apache/cassandra/locator/ReconnectableSnitchHelper.java | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
----------------------------------------------------------------------
[1/8] cassandra git commit: Don't initiate snitch reconnection for dead states
Repository: cassandra

Updated Branches:
  refs/heads/cassandra-2.0 54140bfde -> 738229bd7
  refs/heads/cassandra-2.1 a92526239 -> c6e437983
  refs/heads/trunk 4040ba8e7 -> f0d4705e6

Don't initiate snitch reconnection for dead states

Patch by brandonwilliams, reviewed by John Alberts for CASSANDRA-7292

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/54140bfd
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/54140bfd
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/54140bfd

Branch: refs/heads/cassandra-2.1
Commit: 54140bfde562361562489879f720e78e0ea0eac7
Parents: 724384a
Author: Brandon Williams
Authored: Fri Apr 17 17:36:55 2015 -0500
Committer: Brandon Williams
Committed: Fri Apr 17 17:36:55 2015 -0500

----------------------------------------------------------------------
 CHANGES.txt                                                 | 1 +
 .../apache/cassandra/locator/ReconnectableSnitchHelper.java | 9 +++--
 2 files changed, 4 insertions(+), 6 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/cassandra/blob/54140bfd/CHANGES.txt
----------------------------------------------------------------------
diff --git a/CHANGES.txt b/CHANGES.txt
index a492c74..6c546c4 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 2.0.15:
+ * Don't initiate snitch reconnection for dead states (CASSANDRA-7292)
  * Fix ArrayIndexOutOfBoundsException in CQLSSTableWriter (CASSANDRA-8978)
  * Add shutdown gossip state to prevent timeouts during rolling restarts (CASSANDRA-8336)
  * Fix running with java.net.preferIPv6Addresses=true (CASSANDRA-9137)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/54140bfd/src/java/org/apache/cassandra/locator/ReconnectableSnitchHelper.java
----------------------------------------------------------------------
diff --git a/src/java/org/apache/cassandra/locator/ReconnectableSnitchHelper.java b/src/java/org/apache/cassandra/locator/ReconnectableSnitchHelper.java
index d797393..1642561 100644
--- a/src/java/org/apache/cassandra/locator/ReconnectableSnitchHelper.java
+++ b/src/java/org/apache/cassandra/locator/ReconnectableSnitchHelper.java
@@ -21,10 +21,7 @@ package org.apache.cassandra.locator;

 import java.net.InetAddress;
 import java.net.UnknownHostException;
-import org.apache.cassandra.gms.ApplicationState;
-import org.apache.cassandra.gms.EndpointState;
-import org.apache.cassandra.gms.IEndpointStateChangeSubscriber;
-import org.apache.cassandra.gms.VersionedValue;
+import org.apache.cassandra.gms.*;
 import org.apache.cassandra.net.MessagingService;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
@@ -78,13 +75,13 @@ public class ReconnectableSnitchHelper implements IEndpointStateChangeSubscriber

     public void onJoin(InetAddress endpoint, EndpointState epState)
     {
-        if (preferLocal && epState.getApplicationState(ApplicationState.INTERNAL_IP) != null)
+        if (preferLocal && !Gossiper.instance.isDeadState(epState) && epState.getApplicationState(ApplicationState.INTERNAL_IP) != null)
             reconnect(endpoint, epState.getApplicationState(ApplicationState.INTERNAL_IP));
     }

     public void onChange(InetAddress endpoint, ApplicationState state, VersionedValue value)
     {
-        if (preferLocal && state == ApplicationState.INTERNAL_IP)
+        if (preferLocal && !Gossiper.instance.isDeadState(epState) && state == ApplicationState.INTERNAL_IP)
             reconnect(endpoint, value);
     }
[6/8] cassandra git commit: Merge branch 'cassandra-2.0' into cassandra-2.1
Merge branch 'cassandra-2.0' into cassandra-2.1

Conflicts:
	CHANGES.txt
	src/java/org/apache/cassandra/locator/ReconnectableSnitchHelper.java

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/c6e43798
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/c6e43798
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/c6e43798

Branch: refs/heads/trunk
Commit: c6e4379831958a48f421d394a70ddc307341b5bf
Parents: a925262 738229b
Author: Brandon Williams
Authored: Fri Apr 17 18:43:29 2015 -0500
Committer: Brandon Williams
Committed: Fri Apr 17 18:43:29 2015 -0500

----------------------------------------------------------------------
 .../org/apache/cassandra/locator/ReconnectableSnitchHelper.java | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
----------------------------------------------------------------------
[4/8] cassandra git commit: Fix commit
Fix commit

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/738229bd
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/738229bd
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/738229bd

Branch: refs/heads/trunk
Commit: 738229bd7900b21a83ea322b1e394b8c20c0b82f
Parents: 54140bf
Author: Brandon Williams
Authored: Fri Apr 17 18:42:38 2015 -0500
Committer: Brandon Williams
Committed: Fri Apr 17 18:42:38 2015 -0500

----------------------------------------------------------------------
 .../org/apache/cassandra/locator/ReconnectableSnitchHelper.java | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/cassandra/blob/738229bd/src/java/org/apache/cassandra/locator/ReconnectableSnitchHelper.java
----------------------------------------------------------------------
diff --git a/src/java/org/apache/cassandra/locator/ReconnectableSnitchHelper.java b/src/java/org/apache/cassandra/locator/ReconnectableSnitchHelper.java
index 1642561..e5dbdeb 100644
--- a/src/java/org/apache/cassandra/locator/ReconnectableSnitchHelper.java
+++ b/src/java/org/apache/cassandra/locator/ReconnectableSnitchHelper.java
@@ -81,7 +81,7 @@ public class ReconnectableSnitchHelper implements IEndpointStateChangeSubscriber

     public void onChange(InetAddress endpoint, ApplicationState state, VersionedValue value)
     {
-        if (preferLocal && !Gossiper.instance.isDeadState(epState) && state == ApplicationState.INTERNAL_IP)
+        if (preferLocal && !Gossiper.instance.isDeadState(Gossiper.instance.getEndpointStateForEndpoint(endpoint)) && state == ApplicationState.INTERNAL_IP)
             reconnect(endpoint, value);
     }
[2/8] cassandra git commit: Don't initiate snitch reconnection for dead states
Don't initiate snitch reconnection for dead states

Patch by brandonwilliams, reviewed by John Alberts for CASSANDRA-7292

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/54140bfd
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/54140bfd
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/54140bfd

Branch: refs/heads/trunk
Commit: 54140bfde562361562489879f720e78e0ea0eac7
Parents: 724384a
Author: Brandon Williams
Authored: Fri Apr 17 17:36:55 2015 -0500
Committer: Brandon Williams
Committed: Fri Apr 17 17:36:55 2015 -0500

----------------------------------------------------------------------
 CHANGES.txt                                                 | 1 +
 .../apache/cassandra/locator/ReconnectableSnitchHelper.java | 9 +++--
 2 files changed, 4 insertions(+), 6 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/cassandra/blob/54140bfd/CHANGES.txt
----------------------------------------------------------------------
diff --git a/CHANGES.txt b/CHANGES.txt
index a492c74..6c546c4 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 2.0.15:
+ * Don't initiate snitch reconnection for dead states (CASSANDRA-7292)
  * Fix ArrayIndexOutOfBoundsException in CQLSSTableWriter (CASSANDRA-8978)
  * Add shutdown gossip state to prevent timeouts during rolling restarts (CASSANDRA-8336)
  * Fix running with java.net.preferIPv6Addresses=true (CASSANDRA-9137)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/54140bfd/src/java/org/apache/cassandra/locator/ReconnectableSnitchHelper.java
----------------------------------------------------------------------
diff --git a/src/java/org/apache/cassandra/locator/ReconnectableSnitchHelper.java b/src/java/org/apache/cassandra/locator/ReconnectableSnitchHelper.java
index d797393..1642561 100644
--- a/src/java/org/apache/cassandra/locator/ReconnectableSnitchHelper.java
+++ b/src/java/org/apache/cassandra/locator/ReconnectableSnitchHelper.java
@@ -21,10 +21,7 @@ package org.apache.cassandra.locator;

 import java.net.InetAddress;
 import java.net.UnknownHostException;
-import org.apache.cassandra.gms.ApplicationState;
-import org.apache.cassandra.gms.EndpointState;
-import org.apache.cassandra.gms.IEndpointStateChangeSubscriber;
-import org.apache.cassandra.gms.VersionedValue;
+import org.apache.cassandra.gms.*;
 import org.apache.cassandra.net.MessagingService;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
@@ -78,13 +75,13 @@ public class ReconnectableSnitchHelper implements IEndpointStateChangeSubscriber

     public void onJoin(InetAddress endpoint, EndpointState epState)
     {
-        if (preferLocal && epState.getApplicationState(ApplicationState.INTERNAL_IP) != null)
+        if (preferLocal && !Gossiper.instance.isDeadState(epState) && epState.getApplicationState(ApplicationState.INTERNAL_IP) != null)
             reconnect(endpoint, epState.getApplicationState(ApplicationState.INTERNAL_IP));
     }

     public void onChange(InetAddress endpoint, ApplicationState state, VersionedValue value)
     {
-        if (preferLocal && state == ApplicationState.INTERNAL_IP)
+        if (preferLocal && !Gossiper.instance.isDeadState(epState) && state == ApplicationState.INTERNAL_IP)
             reconnect(endpoint, value);
     }
[5/8] cassandra git commit: Fix commit
Fix commit

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/738229bd
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/738229bd
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/738229bd

Branch: refs/heads/cassandra-2.0
Commit: 738229bd7900b21a83ea322b1e394b8c20c0b82f
Parents: 54140bf
Author: Brandon Williams
Authored: Fri Apr 17 18:42:38 2015 -0500
Committer: Brandon Williams
Committed: Fri Apr 17 18:42:38 2015 -0500

----------------------------------------------------------------------
 .../org/apache/cassandra/locator/ReconnectableSnitchHelper.java | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/cassandra/blob/738229bd/src/java/org/apache/cassandra/locator/ReconnectableSnitchHelper.java
----------------------------------------------------------------------
diff --git a/src/java/org/apache/cassandra/locator/ReconnectableSnitchHelper.java b/src/java/org/apache/cassandra/locator/ReconnectableSnitchHelper.java
index 1642561..e5dbdeb 100644
--- a/src/java/org/apache/cassandra/locator/ReconnectableSnitchHelper.java
+++ b/src/java/org/apache/cassandra/locator/ReconnectableSnitchHelper.java
@@ -81,7 +81,7 @@ public class ReconnectableSnitchHelper implements IEndpointStateChangeSubscriber

     public void onChange(InetAddress endpoint, ApplicationState state, VersionedValue value)
     {
-        if (preferLocal && !Gossiper.instance.isDeadState(epState) && state == ApplicationState.INTERNAL_IP)
+        if (preferLocal && !Gossiper.instance.isDeadState(Gossiper.instance.getEndpointStateForEndpoint(endpoint)) && state == ApplicationState.INTERNAL_IP)
             reconnect(endpoint, value);
     }
[jira] [Comment Edited] (CASSANDRA-9181) Improve index versus secondary index selection
[ https://issues.apache.org/jira/browse/CASSANDRA-9181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14500842#comment-14500842 ]

Jeff Jirsa edited comment on CASSANDRA-9181 at 4/17/15 11:04 PM:
-----------------------------------------------------------------

For what it's worth, I've tested this on 2.0.14 and 3.0 (trunk) using ccm (RF=1, N=3), and it appears to be working as intended: if the partition key is included, the query hits only one node; if no partition key is included, it hits all nodes in the cluster. Your reproduced-in version is 2.0.7 - have you seen this in more recent versions? Attaching logs for review.

On 2.0.14: https://gist.github.com/jeffjirsa/5c0f63395269a85cdcb2
On trunk: https://gist.github.com/jeffjirsa/87e2b95113e3366bc00b

The data generator for schema/etc is at the top of https://gist.github.com/jeffjirsa/87e2b95113e3366bc00b

was (Author: jjirsa):
For what it's worth, I've tested this on 2.0.14 and 3.0 (trunk), and it appears to be working as intended (if partition key is included, it's hitting only one node; if no partition key is included, it's hitting all nodes in the cluster). Your reproduced-in is 2.0.7 - have you seen this in more recent versions? Attaching logs for review.

On 2.0.14: https://gist.github.com/jeffjirsa/5c0f63395269a85cdcb2
On trunk: https://gist.github.com/jeffjirsa/87e2b95113e3366bc00b

The data generator for schema/etc is top of https://gist.github.com/jeffjirsa/87e2b95113e3366bc00b

> Improve index versus secondary index selection
> ----------------------------------------------
>
>                 Key: CASSANDRA-9181
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9181
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Jeremy Hanna
>              Labels: 2i
>             Fix For: 3.0
>
>
> There is a special case for secondary indexes if you always supply the partition key. For example, suppose you have a family with ID "a456" that has 6 family members, and a secondary index on first name. Currently, if you run a query like "select * from families where id = 'a456' and firstname = 'alowishus';", the query trace shows that it first scans the entire cluster based on firstname, then looks for the key within that result.
> If it's not terribly invasive, I think this would be a valid use case to narrow down the results by key first.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9181) Improve index versus secondary index selection
[ https://issues.apache.org/jira/browse/CASSANDRA-9181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14500842#comment-14500842 ]

Jeff Jirsa commented on CASSANDRA-9181:
---------------------------------------

For what it's worth, I've tested this on 2.0.14 and 3.0 (trunk), and it appears to be working as intended: if the partition key is included, the query hits only one node; if no partition key is included, it hits all nodes in the cluster. Your reproduced-in version is 2.0.7 - have you seen this in more recent versions? Attaching logs for review.

On 2.0.14: https://gist.github.com/jeffjirsa/5c0f63395269a85cdcb2
On trunk: https://gist.github.com/jeffjirsa/87e2b95113e3366bc00b

The data generator for schema/etc is at the top of https://gist.github.com/jeffjirsa/87e2b95113e3366bc00b

> Improve index versus secondary index selection
> ----------------------------------------------
>
>                 Key: CASSANDRA-9181
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9181
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Jeremy Hanna
>              Labels: 2i
>             Fix For: 3.0
>
>
> There is a special case for secondary indexes if you always supply the partition key. For example, suppose you have a family with ID "a456" that has 6 family members, and a secondary index on first name. Currently, if you run a query like "select * from families where id = 'a456' and firstname = 'alowishus';", the query trace shows that it first scans the entire cluster based on firstname, then looks for the key within that result.
> If it's not terribly invasive, I think this would be a valid use case to narrow down the results by key first.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
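The behavior Jeff's test observed can be sketched conceptually: when the query restricts the partition key, the coordinator can send the secondary-index lookup only to that key's replicas; without the key, it must fan the index scan out to every node. The token/replica model below is a toy for illustration, not Cassandra's partitioner:

```java
import java.util.List;

// Conceptual sketch of secondary-index query routing: a restricted
// partition key narrows the query to one replica set, while an
// unrestricted index scan touches the whole cluster. Toy model only.
class IndexQueryRouting
{
    static final List<String> nodes = List.of("node1", "node2", "node3");

    // Toy partitioner: hash the partition key onto one of the nodes.
    static String replicaFor(String partitionKey)
    {
        return nodes.get(Math.abs(partitionKey.hashCode()) % nodes.size());
    }

    // With a partition key, only its replica is contacted; with null
    // (no key restriction), every node must run the index scan.
    static List<String> targets(String partitionKeyOrNull)
    {
        return partitionKeyOrNull == null
             ? nodes
             : List.of(replicaFor(partitionKeyOrNull));
    }
}
```

This is why the trace in the original report (a full-cluster scan despite `id = 'a456'` being supplied) would be a bug, and why the comment above checked whether the key-restricted form really contacts only one node.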
[3/4] cassandra git commit: Don't initiate snitch reconnection for dead states
Don't initiate snitch reconnection for dead states

Patch by brandonwilliams, reviewed by John Alberts for CASSANDRA-7292

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/a9252623
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/a9252623
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/a9252623

Branch: refs/heads/trunk
Commit: a925262395f1e735ada0ba35c8e41042be1807fb
Parents: b4fae85
Author: Brandon Williams
Authored: Fri Apr 17 17:36:55 2015 -0500
Committer: Brandon Williams
Committed: Fri Apr 17 17:38:44 2015 -0500

----------------------------------------------------------------------
 CHANGES.txt                                                 | 1 +
 .../apache/cassandra/locator/ReconnectableSnitchHelper.java | 9 +++--
 2 files changed, 4 insertions(+), 6 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/cassandra/blob/a9252623/CHANGES.txt
----------------------------------------------------------------------
diff --git a/CHANGES.txt b/CHANGES.txt
index 80ab11c..2777d79 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -81,6 +81,7 @@
  * Use stdout for progress and stats in sstableloader (CASSANDRA-8982)
  * Correctly identify 2i datadir from older versions (CASSANDRA-9116)
 Merged from 2.0:
+ * Don't initiate snitch reconnection for dead states (CASSANDRA-7292)
  * Fix ArrayIndexOutOfBoundsException in CQLSSTableWriter (CASSANDRA-8978)
  * Add shutdown gossip state to prevent timeouts during rolling restarts (CASSANDRA-8336)
  * Fix running with java.net.preferIPv6Addresses=true (CASSANDRA-9137)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/a9252623/src/java/org/apache/cassandra/locator/ReconnectableSnitchHelper.java
----------------------------------------------------------------------
diff --git a/src/java/org/apache/cassandra/locator/ReconnectableSnitchHelper.java b/src/java/org/apache/cassandra/locator/ReconnectableSnitchHelper.java
index d797393..1642561 100644
--- a/src/java/org/apache/cassandra/locator/ReconnectableSnitchHelper.java
+++ b/src/java/org/apache/cassandra/locator/ReconnectableSnitchHelper.java
@@ -21,10 +21,7 @@ package org.apache.cassandra.locator;

 import java.net.InetAddress;
 import java.net.UnknownHostException;
-import org.apache.cassandra.gms.ApplicationState;
-import org.apache.cassandra.gms.EndpointState;
-import org.apache.cassandra.gms.IEndpointStateChangeSubscriber;
-import org.apache.cassandra.gms.VersionedValue;
+import org.apache.cassandra.gms.*;
 import org.apache.cassandra.net.MessagingService;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
@@ -78,13 +75,13 @@ public class ReconnectableSnitchHelper implements IEndpointStateChangeSubscriber

     public void onJoin(InetAddress endpoint, EndpointState epState)
     {
-        if (preferLocal && epState.getApplicationState(ApplicationState.INTERNAL_IP) != null)
+        if (preferLocal && !Gossiper.instance.isDeadState(epState) && epState.getApplicationState(ApplicationState.INTERNAL_IP) != null)
             reconnect(endpoint, epState.getApplicationState(ApplicationState.INTERNAL_IP));
     }

     public void onChange(InetAddress endpoint, ApplicationState state, VersionedValue value)
     {
-        if (preferLocal && state == ApplicationState.INTERNAL_IP)
+        if (preferLocal && !Gossiper.instance.isDeadState(epState) && state == ApplicationState.INTERNAL_IP)
             reconnect(endpoint, value);
     }
[1/4] cassandra git commit: Don't initiate snitch reconnection for dead states
Repository: cassandra

Updated Branches:
  refs/heads/cassandra-2.0 724384ab0 -> 54140bfde
  refs/heads/cassandra-2.1 b4fae8557 -> a92526239
  refs/heads/trunk 11dfc0253 -> 4040ba8e7

Don't initiate snitch reconnection for dead states

Patch by brandonwilliams, reviewed by John Alberts for CASSANDRA-7292

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/54140bfd
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/54140bfd
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/54140bfd

Branch: refs/heads/cassandra-2.0
Commit: 54140bfde562361562489879f720e78e0ea0eac7
Parents: 724384a
Author: Brandon Williams
Authored: Fri Apr 17 17:36:55 2015 -0500
Committer: Brandon Williams
Committed: Fri Apr 17 17:36:55 2015 -0500

----------------------------------------------------------------------
 CHANGES.txt                                                 | 1 +
 .../apache/cassandra/locator/ReconnectableSnitchHelper.java | 9 +++--
 2 files changed, 4 insertions(+), 6 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/cassandra/blob/54140bfd/CHANGES.txt
----------------------------------------------------------------------
diff --git a/CHANGES.txt b/CHANGES.txt
index a492c74..6c546c4 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 2.0.15:
+ * Don't initiate snitch reconnection for dead states (CASSANDRA-7292)
  * Fix ArrayIndexOutOfBoundsException in CQLSSTableWriter (CASSANDRA-8978)
  * Add shutdown gossip state to prevent timeouts during rolling restarts (CASSANDRA-8336)
  * Fix running with java.net.preferIPv6Addresses=true (CASSANDRA-9137)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/54140bfd/src/java/org/apache/cassandra/locator/ReconnectableSnitchHelper.java
----------------------------------------------------------------------
diff --git a/src/java/org/apache/cassandra/locator/ReconnectableSnitchHelper.java b/src/java/org/apache/cassandra/locator/ReconnectableSnitchHelper.java
index d797393..1642561 100644
--- a/src/java/org/apache/cassandra/locator/ReconnectableSnitchHelper.java
+++ b/src/java/org/apache/cassandra/locator/ReconnectableSnitchHelper.java
@@ -21,10 +21,7 @@ package org.apache.cassandra.locator;

 import java.net.InetAddress;
 import java.net.UnknownHostException;
-import org.apache.cassandra.gms.ApplicationState;
-import org.apache.cassandra.gms.EndpointState;
-import org.apache.cassandra.gms.IEndpointStateChangeSubscriber;
-import org.apache.cassandra.gms.VersionedValue;
+import org.apache.cassandra.gms.*;
 import org.apache.cassandra.net.MessagingService;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
@@ -78,13 +75,13 @@ public class ReconnectableSnitchHelper implements IEndpointStateChangeSubscriber

     public void onJoin(InetAddress endpoint, EndpointState epState)
     {
-        if (preferLocal && epState.getApplicationState(ApplicationState.INTERNAL_IP) != null)
+        if (preferLocal && !Gossiper.instance.isDeadState(epState) && epState.getApplicationState(ApplicationState.INTERNAL_IP) != null)
             reconnect(endpoint, epState.getApplicationState(ApplicationState.INTERNAL_IP));
     }

     public void onChange(InetAddress endpoint, ApplicationState state, VersionedValue value)
     {
-        if (preferLocal && state == ApplicationState.INTERNAL_IP)
+        if (preferLocal && !Gossiper.instance.isDeadState(epState) && state == ApplicationState.INTERNAL_IP)
             reconnect(endpoint, value);
     }
[4/4] cassandra git commit: Merge branch 'cassandra-2.1' into trunk
Merge branch 'cassandra-2.1' into trunk

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/4040ba8e
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/4040ba8e
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/4040ba8e

Branch: refs/heads/trunk
Commit: 4040ba8e7bd290881f16d1f7b4161b744998364b
Parents: 11dfc02 a925262
Author: Brandon Williams
Authored: Fri Apr 17 17:38:56 2015 -0500
Committer: Brandon Williams
Committed: Fri Apr 17 17:38:56 2015 -0500

----------------------------------------------------------------------
 CHANGES.txt                                                 | 1 +
 .../apache/cassandra/locator/ReconnectableSnitchHelper.java | 9 +++--
 2 files changed, 4 insertions(+), 6 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/cassandra/blob/4040ba8e/CHANGES.txt
----------------------------------------------------------------------
[2/4] cassandra git commit: Don't initiate snitch reconnection for dead states
Don't initiate snitch reconnection for dead states

Patch by brandonwilliams, reviewed by John Alberts for CASSANDRA-7292

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/a9252623
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/a9252623
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/a9252623

Branch: refs/heads/cassandra-2.1
Commit: a925262395f1e735ada0ba35c8e41042be1807fb
Parents: b4fae85
Author: Brandon Williams
Authored: Fri Apr 17 17:36:55 2015 -0500
Committer: Brandon Williams
Committed: Fri Apr 17 17:38:44 2015 -0500

----------------------------------------------------------------------
 CHANGES.txt                                                 | 1 +
 .../apache/cassandra/locator/ReconnectableSnitchHelper.java | 9 +++--
 2 files changed, 4 insertions(+), 6 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/cassandra/blob/a9252623/CHANGES.txt
----------------------------------------------------------------------
diff --git a/CHANGES.txt b/CHANGES.txt
index 80ab11c..2777d79 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -81,6 +81,7 @@
  * Use stdout for progress and stats in sstableloader (CASSANDRA-8982)
  * Correctly identify 2i datadir from older versions (CASSANDRA-9116)
 Merged from 2.0:
+ * Don't initiate snitch reconnection for dead states (CASSANDRA-7292)
  * Fix ArrayIndexOutOfBoundsException in CQLSSTableWriter (CASSANDRA-8978)
  * Add shutdown gossip state to prevent timeouts during rolling restarts (CASSANDRA-8336)
  * Fix running with java.net.preferIPv6Addresses=true (CASSANDRA-9137)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/a9252623/src/java/org/apache/cassandra/locator/ReconnectableSnitchHelper.java
----------------------------------------------------------------------
diff --git a/src/java/org/apache/cassandra/locator/ReconnectableSnitchHelper.java b/src/java/org/apache/cassandra/locator/ReconnectableSnitchHelper.java
index d797393..1642561 100644
--- a/src/java/org/apache/cassandra/locator/ReconnectableSnitchHelper.java
+++ b/src/java/org/apache/cassandra/locator/ReconnectableSnitchHelper.java
@@ -21,10 +21,7 @@ package org.apache.cassandra.locator;

 import java.net.InetAddress;
 import java.net.UnknownHostException;
-import org.apache.cassandra.gms.ApplicationState;
-import org.apache.cassandra.gms.EndpointState;
-import org.apache.cassandra.gms.IEndpointStateChangeSubscriber;
-import org.apache.cassandra.gms.VersionedValue;
+import org.apache.cassandra.gms.*;
 import org.apache.cassandra.net.MessagingService;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
@@ -78,13 +75,13 @@ public class ReconnectableSnitchHelper implements IEndpointStateChangeSubscriber

     public void onJoin(InetAddress endpoint, EndpointState epState)
     {
-        if (preferLocal && epState.getApplicationState(ApplicationState.INTERNAL_IP) != null)
+        if (preferLocal && !Gossiper.instance.isDeadState(epState) && epState.getApplicationState(ApplicationState.INTERNAL_IP) != null)
             reconnect(endpoint, epState.getApplicationState(ApplicationState.INTERNAL_IP));
     }

     public void onChange(InetAddress endpoint, ApplicationState state, VersionedValue value)
     {
-        if (preferLocal && state == ApplicationState.INTERNAL_IP)
+        if (preferLocal && !Gossiper.instance.isDeadState(epState) && state == ApplicationState.INTERNAL_IP)
             reconnect(endpoint, value);
     }
[jira] [Commented] (CASSANDRA-7292) Can't seed new node into ring with (public) ip of an old node
[ https://issues.apache.org/jira/browse/CASSANDRA-7292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14500811#comment-14500811 ] John Alberts commented on CASSANDRA-7292: - I was able to confirm that this patch does indeed fix the problem I was having. The patch was built against tag 'cassandra-2.0.11' running on amazon linux 2014.03. [~brandon.williams] Thank you for all of your help with providing a fix for this issue. Can't wait until it's in an official release package. > Can't seed new node into ring with (public) ip of an old node > - > > Key: CASSANDRA-7292 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7292 > Project: Cassandra > Issue Type: Bug > Components: Core > Environment: Cassandra 2.0.7, Ec2MultiRegionSnitch >Reporter: Juho Mäkinen >Assignee: Brandon Williams > Labels: bootstrap, gossip > Fix For: 2.0.15, 2.1.5 > > Attachments: 7292.txt, cassandra-replace-address.log > > > This bug prevents node to return with bootstrap into the cluster with its old > ip. > Scenario: five node ec2 cluster spread into three AZ, all in one region. I'm > using Ec2MultiRegionSnitch. Nodes are reported with their public ips (as > Ec2MultiRegionSnitch requires) > I simulated a loss of one node by terminating one instance. nodetool status > reported correctly that node was down. 
Then I launched new instance with the > old public ip (i'm using elastic ips) with > "Dcassandra.replace_address=IP_ADDRESS" but the new node can't join the > cluster: > INFO 07:20:43,424 Gathering node replacement information for /54.86.191.30 > INFO 07:20:43,428 Starting Messaging Service on port 9043 > INFO 07:20:43,489 Handshaking version with /54.86.171.10 > INFO 07:20:43,491 Handshaking version with /54.86.187.245 > (some delay) > ERROR 07:21:14,445 Exception encountered during startup > java.lang.RuntimeException: Unable to gossip with any seeds > at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1193) > at > org.apache.cassandra.service.StorageService.prepareReplacementInfo(StorageService.java:419) > at > org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:650) > at > org.apache.cassandra.service.StorageService.initServer(StorageService.java:612) > at > org.apache.cassandra.service.StorageService.initServer(StorageService.java:505) > at > org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:362) > at > org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:480) > at > org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:569) > It does not help if I remove the "Dcassandra.replace_address=IP_ADDRESS" > system property. > Also it does not help to remove the node with "nodetool removenode" with or > without the cassandra.replace_address property. 
> I think this is because the node information is preserved in the gossip info > as seen this output of "nodetool gossipinfo" > /54.86.191.30 > INTERNAL_IP:172.16.1.231 > DC:us-east > REMOVAL_COORDINATOR:REMOVER,d581309a-8610-40d4-ba30-cb250eda22a8 > STATUS:removed,19311925-46b5-4fe4-928a-321e8adb731d,1401089960664 > HOST_ID:19311925-46b5-4fe4-928a-321e8adb731d > RPC_ADDRESS:0.0.0.0 > NET_VERSION:7 > SCHEMA:226f9315-b4b2-32c1-bfe1-f4bb49fccfd5 > RACK:1b > LOAD:7.075290515E9 > SEVERITY:0.0 > RELEASE_VERSION:2.0.7 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9107) More accurate row count estimates
[ https://issues.apache.org/jira/browse/CASSANDRA-9107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14500779#comment-14500779 ] Sam Tunnicliffe commented on CASSANDRA-9107: Fair enough, +1 from me then. > More accurate row count estimates > - > > Key: CASSANDRA-9107 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9107 > Project: Cassandra > Issue Type: Improvement >Reporter: Chris Lohfink >Assignee: Chris Lohfink > Attachments: 9107-cassandra2-1.patch > > > Currently the estimated row count from cfstats is the sum of the number of > rows in all the sstables. This becomes very inaccurate with wide rows or > heavily updated datasets since the same partition would exist in many > sstables. In example: > {code} > create KEYSPACE test WITH replication = {'class': 'SimpleStrategy', > 'replication_factor': 1}; > create TABLE wide (key text PRIMARY KEY , value text) WITH compaction = > {'class': 'SizeTieredCompactionStrategy', 'min_threshold': 30, > 'max_threshold': 100} ; > --- > insert INTO wide (key, value) VALUES ('key', 'value'); > // flush > // cfstats output: Number of keys (estimate): 1 (128 in older version from > index) > insert INTO wide (key, value) VALUES ('key', 'value'); > // flush > // cfstats output: Number of keys (estimate): 2 (256 in older version from > index) > ... etc > {code} > previously it used the index but it still did it per sstable and summed them > up which became inaccurate as there are more sstables (just by much worse). > With new versions of sstables we can merge the cardinalities to resolve this > with a slight hit to accuracy in the case of every sstable having completely > unique partitions. > Furthermore I think it would be pretty minimal effort to include the number > of rows in the memtables to this count. We wont have the cardinality merging > between memtables and sstables but I would consider that a relatively minor > negative. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9107) More accurate row count estimates
[ https://issues.apache.org/jira/browse/CASSANDRA-9107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14500746#comment-14500746 ] Chris Lohfink commented on CASSANDRA-9107: -- I like having the MT (memtable) count included; when people run simple small tests, the inserted rows will show up right away. I think it can confuse people if they insert some data and the value doesn't go up. > More accurate row count estimates > - > > Key: CASSANDRA-9107 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9107 > Project: Cassandra > Issue Type: Improvement >Reporter: Chris Lohfink >Assignee: Chris Lohfink > Attachments: 9107-cassandra2-1.patch > > > Currently the estimated row count from cfstats is the sum of the number of > rows in all the sstables. This becomes very inaccurate with wide rows or > heavily updated datasets since the same partition would exist in many > sstables. In example: > {code} > create KEYSPACE test WITH replication = {'class': 'SimpleStrategy', > 'replication_factor': 1}; > create TABLE wide (key text PRIMARY KEY , value text) WITH compaction = > {'class': 'SizeTieredCompactionStrategy', 'min_threshold': 30, > 'max_threshold': 100} ; > --- > insert INTO wide (key, value) VALUES ('key', 'value'); > // flush > // cfstats output: Number of keys (estimate): 1 (128 in older version from > index) > insert INTO wide (key, value) VALUES ('key', 'value'); > // flush > // cfstats output: Number of keys (estimate): 2 (256 in older version from > index) > ... etc > {code} > previously it used the index but it still did it per sstable and summed them > up which became inaccurate as there are more sstables (just by much worse). > With new versions of sstables we can merge the cardinalities to resolve this > with a slight hit to accuracy in the case of every sstable having completely > unique partitions. > Furthermore I think it would be pretty minimal effort to include the number > of rows in the memtables to this count. 
We wont have the cardinality merging > between memtables and sstables but I would consider that a relatively minor > negative. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
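The difference between summing per-sstable row counts and merging cardinalities, as discussed in this ticket, can be illustrated with a toy sketch. Exact `HashSet`s stand in for the per-sstable cardinality estimators (Cassandra actually stores probabilistic HyperLogLog-style sketches, which merge the same way but with bounded error); this is an assumed simplification, not the real code:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Why merging per-sstable cardinalities beats summing row counts
// (CASSANDRA-9107): the same partition flushed into several sstables
// is counted once instead of once per sstable.
public class RowCountEstimateSketch {
    // The old estimate: sum the per-sstable counts, double-counting
    // any partition present in more than one sstable.
    static long sumOfCounts(List<Set<String>> sstablePartitions) {
        return sstablePartitions.stream().mapToLong(s -> s.size()).sum();
    }

    // The merged estimate: union the per-sstable sets, so shared
    // partitions are deduplicated across sstables.
    static long mergedEstimate(List<Set<String>> sstablePartitions) {
        Set<String> merged = new HashSet<>();
        sstablePartitions.forEach(merged::addAll);
        return merged.size();
    }

    public static void main(String[] args) {
        // The same partition "key" flushed into three sstables.
        List<Set<String>> sstables = List.of(
            Set.of("key"), Set.of("key"), Set.of("key", "other"));
        System.out.println(sumOfCounts(sstables));    // 4: overcounts the shared partition
        System.out.println(mergedEstimate(sstables)); // 2: deduplicated across sstables
    }
}
```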
[jira] [Updated] (CASSANDRA-9196) Do not rebuild indexes if no columns are actually indexed
[ https://issues.apache.org/jira/browse/CASSANDRA-9196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Tunnicliffe updated CASSANDRA-9196: --- Attachment: 2.1-CASSANDRA-9196.txt The patch looks good for 2.0, but it won't work for 2.1/trunk. The reason is that {{indexes()}} now takes a {{CellName}} rather than a {{ByteBuffer}} containing the column name and we can't construct one in {{maybeRebuildIndex}}. I've attached an alternative patch for 2.1+ that adds an {{indexes(ColumnDefinition)}} overload, with the default implementation on {{SecondaryIndex}} simply checking if the supplied {{ColumnDefinition}} is present in the index's {{columnDefs}}. > Do not rebuild indexes if no columns are actually indexed > - > > Key: CASSANDRA-9196 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9196 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Sergio Bossa >Assignee: Sergio Bossa > Fix For: 2.0.15 > > Attachments: 2.0-CASSANDRA-9196.txt, 2.1-CASSANDRA-9196.txt > > > When rebuilding secondary indexes, the index task is executed regardless if > the actual {{SecondaryIndex#indexes(ByteBuffer )}} implementation of any > index returns true for any column, meaning that the expensive task of going > through all sstables and related rows will be executed even if in the end no > column/row will be actually indexed. > This is a huge performance hit when i.e. bootstrapping with large datasets on > tables having custom secondary index implementations whose {{indexes()}} > implementation might return false. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
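The shape of the overload described in the comment above can be sketched as follows. The types here are hypothetical simplified stand-ins for Cassandra's `ColumnDefinition` and `SecondaryIndex`, showing only the default-implementation idea (membership in `columnDefs`), not the real class hierarchy:

```java
import java.util.Set;

// Sketch of the 2.1 approach: an indexes(ColumnDefinition) overload whose
// default implementation just checks membership in the index's columnDefs.
public class IndexesOverloadSketch {
    // Stand-in for Cassandra's ColumnDefinition (records compare by value,
    // so Set.contains works as expected).
    record ColumnDefinition(String name) {}

    static abstract class SecondaryIndex {
        final Set<ColumnDefinition> columnDefs;
        SecondaryIndex(Set<ColumnDefinition> columnDefs) { this.columnDefs = columnDefs; }

        // Default implementation: the index covers a column iff the supplied
        // definition is among its configured columns. Custom index
        // implementations can override this.
        boolean indexes(ColumnDefinition cd) {
            return columnDefs.contains(cd);
        }
    }

    static class DummyIndex extends SecondaryIndex {
        DummyIndex(Set<ColumnDefinition> defs) { super(defs); }
    }

    public static void main(String[] args) {
        SecondaryIndex idx = new DummyIndex(Set.of(new ColumnDefinition("value")));
        // A rebuild can be skipped entirely when indexes(...) is false for
        // every column, avoiding a pointless full sstable scan.
        System.out.println(idx.indexes(new ColumnDefinition("value"))); // true
        System.out.println(idx.indexes(new ColumnDefinition("other"))); // false
    }
}
```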
[Cassandra Wiki] Update of "ClientOptions" by KarlLehenbauer
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification. The "ClientOptions" page has been changed by KarlLehenbauer: https://wiki.apache.org/cassandra/ClientOptions?action=diff&rev1=192&rev2=193 Comment: add link to Tcl client * [[https://github.com/jbochi/lua-resty-cassandra|lua-resty-cassandra]] * Dart * [[https://github.com/achilleasa/dart_cassandra_cql|dart_cassandra_cql]] + * Tcl + * [[https://github.com/flightaware/casstcl|casstcl]] = Thrift = For older Thrift clients, see ClientOptionsThrift.
[jira] [Comment Edited] (CASSANDRA-9206) Remove seed gossip probability
[ https://issues.apache.org/jira/browse/CASSANDRA-9206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14500581#comment-14500581 ] Jason Brown edited comment on CASSANDRA-9206 at 4/17/15 8:15 PM: - It's when things back up or slow down, and the queue gets deep, that I have a small degree of concern. Remember that the Gossip stage is single-threaded, so it's not hard to see that queue backing up. was (Author: jasobrown): It's when things back up or slow down, and the queue gets deep, that I have a small degree of concern. > Remove seed gossip probability > -- > > Key: CASSANDRA-9206 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9206 > Project: Cassandra > Issue Type: Improvement >Reporter: Brandon Williams >Assignee: Brandon Williams > Fix For: 2.1.5 > > Attachments: 9206.txt > > > Currently, we use probability to determine whether a node will gossip with a > seed: > {noformat} > double probability = seeds.size() / (double) > (liveEndpoints.size() + unreachableEndpoints.size()); > double randDbl = random.nextDouble(); > if (randDbl <= probability) > sendGossip(prod, seeds); > {noformat} > I propose that we remove this probability, and instead *always* gossip with a > seed. This of course means increased traffic and processing on the seed(s), > but even a 1000 node cluster with a single seed will only put ~1000 messages > per second on the seed, which is virtually nothing. Should it become a > problem, the solution is simple: add more seeds. Since seeds will also > always gossip with each other, this effectively gives us a poor man's > spanning tree, with the only cost being removing a few lines of code, and > should greatly improve our gossip convergence time, especially in large > clusters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9206) Remove seed gossip probability
[ https://issues.apache.org/jira/browse/CASSANDRA-9206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14500581#comment-14500581 ] Jason Brown commented on CASSANDRA-9206: It's when things back up or slow down, and the queue gets deep, that I have a small degree of concern. > Remove seed gossip probability > -- > > Key: CASSANDRA-9206 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9206 > Project: Cassandra > Issue Type: Improvement >Reporter: Brandon Williams >Assignee: Brandon Williams > Fix For: 2.1.5 > > Attachments: 9206.txt > > > Currently, we use probability to determine whether a node will gossip with a > seed: > {noformat} > double probability = seeds.size() / (double) > (liveEndpoints.size() + unreachableEndpoints.size()); > double randDbl = random.nextDouble(); > if (randDbl <= probability) > sendGossip(prod, seeds); > {noformat} > I propose that we remove this probability, and instead *always* gossip with a > seed. This of course means increased traffic and processing on the seed(s), > but even a 1000 node cluster with a single seed will only put ~1000 messages > per second on the seed, which is virtually nothing. Should it become a > problem, the solution is simple: add more seeds. Since seeds will also > always gossip with each other, this effectively gives us a poor man's > spanning tree, with the only cost being removing a few lines of code, and > should greatly improve our gossip convergence time, especially in large > clusters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9206) Remove seed gossip probability
[ https://issues.apache.org/jira/browse/CASSANDRA-9206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14500576#comment-14500576 ] Jonathan Ellis commented on CASSANDRA-9206: --- Is processing 500 gossip messages per second per seed really that big a deal? > Remove seed gossip probability > -- > > Key: CASSANDRA-9206 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9206 > Project: Cassandra > Issue Type: Improvement >Reporter: Brandon Williams >Assignee: Brandon Williams > Fix For: 2.1.5 > > Attachments: 9206.txt > > > Currently, we use probability to determine whether a node will gossip with a > seed: > {noformat} > double probability = seeds.size() / (double) > (liveEndpoints.size() + unreachableEndpoints.size()); > double randDbl = random.nextDouble(); > if (randDbl <= probability) > sendGossip(prod, seeds); > {noformat} > I propose that we remove this probability, and instead *always* gossip with a > seed. This of course means increased traffic and processing on the seed(s), > but even a 1000 node cluster with a single seed will only put ~1000 messages > per second on the seed, which is virtually nothing. Should it become a > problem, the solution is simple: add more seeds. Since seeds will also > always gossip with each other, this effectively gives us a poor man's > spanning tree, with the only cost being removing a few lines of code, and > should greatly improve our gossip convergence time, especially in large > clusters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
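The ticket's {noformat} snippet and the proposed change can be put side by side in a small runnable sketch. The names (`sendGossip`, `liveEndpoints`, etc.) are simplified stand-ins for the real Gossiper internals, and the boolean-returning helpers are an assumed restructuring for testability:

```java
import java.util.Random;

// The current probabilistic seed-gossip decision from CASSANDRA-9206,
// next to the proposed replacement: always gossip with a seed.
public class SeedGossipSketch {
    static final Random random = new Random(42);

    // Today: gossip with a seed only with probability
    // seeds / (live + unreachable).
    static boolean gossipWithSeedCurrent(int seeds, int live, int unreachable) {
        double probability = seeds / (double) (live + unreachable);
        double randDbl = random.nextDouble();
        return randDbl <= probability;
    }

    // Proposed: unconditionally contact a seed every gossip round.
    static boolean gossipWithSeedProposed() {
        return true;
    }

    public static void main(String[] args) {
        // In a 1000-node cluster with one seed, the current code contacts
        // the seed in roughly 1 round out of 1000.
        int hits = 0;
        for (int round = 0; round < 100_000; round++)
            if (gossipWithSeedCurrent(1, 990, 10)) hits++;
        System.out.println(hits); // on the order of 100 of 100,000 rounds (p = 0.001)
        System.out.println(gossipWithSeedProposed()); // always true
    }
}
```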
[jira] [Comment Edited] (CASSANDRA-9206) Remove seed gossip probability
[ https://issues.apache.org/jira/browse/CASSANDRA-9206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14500571#comment-14500571 ] Jason Brown edited comment on CASSANDRA-9206 at 4/17/15 8:08 PM: - Actually, if all we really care about doing is increasing the fanout from a fixed 1 to 2 nodes per gossip round (to aid in convergence), we could just select two peers at random, rather than always selecting one random peer plus one seed (as per the change in this ticket). This way seeds do not get the extra load, and we still achieve the increased gossip sessions via a larger fanout. Granted, this would increase the number of gossip sessions per round from a current max of 3 to a max of 4, but then gossiping to a seed and a down node are probabilistic anyway. was (Author: jasobrown): Actually, if all we really care about doing is increasing the fanout from a fixed 1 to 2 nodes per gossip round (to aid in convergence), we could just select two peers at random, rather than always selecting one random peer plus one seed (as per the change in this ticket). This way seeds do not get the extra load, and we still achieve the increased gossip sessions via a larger fanout. > Remove seed gossip probability > -- > > Key: CASSANDRA-9206 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9206 > Project: Cassandra > Issue Type: Improvement >Reporter: Brandon Williams >Assignee: Brandon Williams > Fix For: 2.1.5 > > Attachments: 9206.txt > > > Currently, we use probability to determine whether a node will gossip with a > seed: > {noformat} > double probability = seeds.size() / (double) > (liveEndpoints.size() + unreachableEndpoints.size()); > double randDbl = random.nextDouble(); > if (randDbl <= probability) > sendGossip(prod, seeds); > {noformat} > I propose that we remove this probability, and instead *always* gossip with a > seed. 
This of course means increased traffic and processing on the seed(s), > but even a 1000 node cluster with a single seed will only put ~1000 messages > per second on the seed, which is virtually nothing. Should it become a > problem, the solution is simple: add more seeds. Since seeds will also > always gossip with each other, this effectively gives us a poor man's > spanning tree, with the only cost being removing a few lines of code, and > should greatly improve our gossip convergence time, especially in large > clusters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9206) Remove seed gossip probability
[ https://issues.apache.org/jira/browse/CASSANDRA-9206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14500571#comment-14500571 ] Jason Brown commented on CASSANDRA-9206: Actually, if all we really care about doing is increasing the fanout from a fixed 1 to 2 nodes per gossip round (to aid in convergence), we could just select two peers at random, rather than always selecting one random peer plus one seed (as per the change in this ticket). This way seeds do not get the extra load, and we still achieve the increased gossip sessions via a larger fanout. > Remove seed gossip probability > -- > > Key: CASSANDRA-9206 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9206 > Project: Cassandra > Issue Type: Improvement >Reporter: Brandon Williams >Assignee: Brandon Williams > Fix For: 2.1.5 > > Attachments: 9206.txt > > > Currently, we use probability to determine whether a node will gossip with a > seed: > {noformat} > double probability = seeds.size() / (double) > (liveEndpoints.size() + unreachableEndpoints.size()); > double randDbl = random.nextDouble(); > if (randDbl <= probability) > sendGossip(prod, seeds); > {noformat} > I propose that we remove this probability, and instead *always* gossip with a > seed. This of course means increased traffic and processing on the seed(s), > but even a 1000 node cluster with a single seed will only put ~1000 messages > per second on the seed, which is virtually nothing. Should it become a > problem, the solution is simple: add more seeds. Since seeds will also > always gossip with each other, this effectively gives us a poor man's > spanning tree, with the only cost being removing a few lines of code, and > should greatly improve our gossip convergence time, especially in large > clusters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9206) Remove seed gossip probability
[ https://issues.apache.org/jira/browse/CASSANDRA-9206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14500565#comment-14500565 ] Jason Brown commented on CASSANDRA-9206: TBH, I'm kinda +0 on this ticket. While I agree the original motivation behind the probabilistic desire to contact seeds is a bit spurious/funky/undocumented, I'm not completely convinced adding more traffic will help much in cluster convergence. For small clusters (less than 20 nodes), there will be near zero impact, so I don't have much problem in that case - but then, they probably don't suffer from the problems we're trying to address here. However, for larger clusters (greater than 500 nodes), I think the extra messaging might be an issue. The problem I see is that when things slow down, and you have a very low number of seed nodes (i.e. less than 5), the gossip messages will back up on those nodes and we'll spend a lot of cycles just trying to broadcast the same redundant data over and over again. What's worse is that the operator won't really have any great insight to discover that gossip (our membership dissemination protocol) is contributing to things going weird; and, thus, the advice to "add more seeds" is neither obvious nor simple, in some cases. (I'm thinking of Netflix's Priam programmed to use up to two nodes per availability zone as seeds. It would require a non-trivial effort to change that core assumption, fwiw.) Further, in 3.0, we've now split the OTCP by message size, not function. Thus, all the excess gossip messages on the seeds could start interfering with the normal read/write traffic. Also, we will not create a spanning tree by increasing the number of nodes contacted during a gossip round. What that does is increase the fanout (the number of nodes contacted) from a fixed size of 1 to 2. We still have randomly selected peers at every step, and not a static or dynamic tree that covers all nodes from a given sender. 
Lastly, there is a minor error in the number of messages to be generated: in a cluster of 1000 nodes, we will start 1000 more gossip sessions to the seeds, and each gossip session comprises 3 messages. Thus, the message count is 3000. If you are actually running a cluster that large, and the network can't sustain that extra load, you're probably screwed anyway. While this might help in convergence (primarily for heartbeat dissemination), the trade-off is more (non-directed) traffic. All in all (and thinking while I'm typing), this patch is probably fine for the vast majority of use cases, and if anything, the clarity in the code that will come from it should be worthwhile. > Remove seed gossip probability > -- > > Key: CASSANDRA-9206 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9206 > Project: Cassandra > Issue Type: Improvement >Reporter: Brandon Williams >Assignee: Brandon Williams > Fix For: 2.1.5 > > Attachments: 9206.txt > > > Currently, we use probability to determine whether a node will gossip with a > seed: > {noformat} > double probability = seeds.size() / (double) > (liveEndpoints.size() + unreachableEndpoints.size()); > double randDbl = random.nextDouble(); > if (randDbl <= probability) > sendGossip(prod, seeds); > {noformat} > I propose that we remove this probability, and instead *always* gossip with a > seed. This of course means increased traffic and processing on the seed(s), > but even a 1000 node cluster with a single seed will only put ~1000 messages > per second on the seed, which is virtually nothing. Should it become a > problem, the solution is simple: add more seeds. Since seeds will also > always gossip with each other, this effectively gives us a poor man's > spanning tree, with the only cost being removing a few lines of code, and > should greatly improve our gossip convergence time, especially in large > clusters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
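The message-count correction in that comment is simple arithmetic, sketched below. The three-message breakdown (SYN, ACK, ACK2) follows the comment's "each gossip session is comprised of 3 messages"; the assumption of one gossip round per second per node comes from the ticket discussion:

```java
// Back-of-envelope from the comment above: 1000 extra gossip sessions per
// second toward the seeds, at 3 messages per session (GossipDigestSyn,
// GossipDigestAck, GossipDigestAck2), means ~3000 extra messages/second.
public class GossipLoadSketch {
    static final int MESSAGES_PER_SESSION = 3;

    // One extra seed-directed session per node per round, one round/second.
    static int extraMessagesPerSecond(int nodes) {
        return nodes * MESSAGES_PER_SESSION;
    }

    public static void main(String[] args) {
        System.out.println(extraMessagesPerSecond(1000)); // 3000
    }
}
```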
[jira] [Commented] (CASSANDRA-8072) Exception during startup: Unable to gossip with any seeds
[ https://issues.apache.org/jira/browse/CASSANDRA-8072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14500554#comment-14500554 ] Brandon Williams commented on CASSANDRA-8072: - The reason this only manifests with shadow gossip is that shadow is the only time we send precisely one round; under normal gossip conditions we fire once per second, but probably lose the first message as well. > Exception during startup: Unable to gossip with any seeds > - > > Key: CASSANDRA-8072 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8072 > Project: Cassandra > Issue Type: Bug >Reporter: Ryan Springer >Assignee: Brandon Williams > Fix For: 2.0.15, 2.1.5 > > Attachments: cas-dev-dt-01-uw1-cassandra-seed01_logs.tar.bz2, > cas-dev-dt-01-uw1-cassandra-seed02_logs.tar.bz2, > cas-dev-dt-01-uw1-cassandra02_logs.tar.bz2, > casandra-system-log-with-assert-patch.log, trace_logs.tar.bz2 > > > When Opscenter 4.1.4 or 5.0.1 tries to provision a 2-node DSC 2.0.10 cluster > in either ec2 or locally, an error occurs sometimes with one of the nodes > refusing to start C*. 
The error in the /var/log/cassandra/system.log is: > ERROR [main] 2014-10-06 15:54:52,292 CassandraDaemon.java (line 513) > Exception encountered during startup > java.lang.RuntimeException: Unable to gossip with any seeds > at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1200) > at > org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:444) > at > org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:655) > at > org.apache.cassandra.service.StorageService.initServer(StorageService.java:609) > at > org.apache.cassandra.service.StorageService.initServer(StorageService.java:502) > at > org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:378) > at > org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:496) > at > org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:585) > INFO [StorageServiceShutdownHook] 2014-10-06 15:54:52,326 Gossiper.java > (line 1279) Announcing shutdown > INFO [StorageServiceShutdownHook] 2014-10-06 15:54:54,326 > MessagingService.java (line 701) Waiting for messaging service to quiesce > INFO [ACCEPT-localhost/127.0.0.1] 2014-10-06 15:54:54,327 > MessagingService.java (line 941) MessagingService has terminated the accept() > thread > This errors does not always occur when provisioning a 2-node cluster, but > probably around half of the time on only one of the nodes. I haven't been > able to reproduce this error with DSC 2.0.9, and there have been no code or > definition file changes in Opscenter. > I can reproduce locally with the above steps. I'm happy to test any proposed > fixes since I'm the only person able to reproduce reliably so far. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7292) Can't seed new node into ring with (public) ip of an old node
[ https://issues.apache.org/jira/browse/CASSANDRA-7292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14500557#comment-14500557 ] John Alberts commented on CASSANDRA-7292: - [~brandon.williams] I was able to get this patch to fix my problem last night but tried again today and couldn't reproduce. I think the db had issues from multiple restarts, version switches, etc. I'm going to start from scratch with a new cluster, re-test, and I'll get back to you. > Can't seed new node into ring with (public) ip of an old node > - > > Key: CASSANDRA-7292 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7292 > Project: Cassandra > Issue Type: Bug > Components: Core > Environment: Cassandra 2.0.7, Ec2MultiRegionSnitch >Reporter: Juho Mäkinen >Assignee: Brandon Williams > Labels: bootstrap, gossip > Fix For: 2.0.15, 2.1.5 > > Attachments: 7292.txt, cassandra-replace-address.log > > > This bug prevents node to return with bootstrap into the cluster with its old > ip. > Scenario: five node ec2 cluster spread into three AZ, all in one region. I'm > using Ec2MultiRegionSnitch. Nodes are reported with their public ips (as > Ec2MultiRegionSnitch requires) > I simulated a loss of one node by terminating one instance. nodetool status > reported correctly that node was down. 
Then I launched new instance with the > old public ip (i'm using elastic ips) with > "Dcassandra.replace_address=IP_ADDRESS" but the new node can't join the > cluster: > INFO 07:20:43,424 Gathering node replacement information for /54.86.191.30 > INFO 07:20:43,428 Starting Messaging Service on port 9043 > INFO 07:20:43,489 Handshaking version with /54.86.171.10 > INFO 07:20:43,491 Handshaking version with /54.86.187.245 > (some delay) > ERROR 07:21:14,445 Exception encountered during startup > java.lang.RuntimeException: Unable to gossip with any seeds > at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1193) > at > org.apache.cassandra.service.StorageService.prepareReplacementInfo(StorageService.java:419) > at > org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:650) > at > org.apache.cassandra.service.StorageService.initServer(StorageService.java:612) > at > org.apache.cassandra.service.StorageService.initServer(StorageService.java:505) > at > org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:362) > at > org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:480) > at > org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:569) > It does not help if I remove the "Dcassandra.replace_address=IP_ADDRESS" > system property. > Also it does not help to remove the node with "nodetool removenode" with or > without the cassandra.replace_address property. 
> I think this is because the node information is preserved in the gossip info > as seen this output of "nodetool gossipinfo" > /54.86.191.30 > INTERNAL_IP:172.16.1.231 > DC:us-east > REMOVAL_COORDINATOR:REMOVER,d581309a-8610-40d4-ba30-cb250eda22a8 > STATUS:removed,19311925-46b5-4fe4-928a-321e8adb731d,1401089960664 > HOST_ID:19311925-46b5-4fe4-928a-321e8adb731d > RPC_ADDRESS:0.0.0.0 > NET_VERSION:7 > SCHEMA:226f9315-b4b2-32c1-bfe1-f4bb49fccfd5 > RACK:1b > LOAD:7.075290515E9 > SEVERITY:0.0 > RELEASE_VERSION:2.0.7 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8072) Exception during startup: Unable to gossip with any seeds
[ https://issues.apache.org/jira/browse/CASSANDRA-8072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14500545#comment-14500545 ] Brandon Williams commented on CASSANDRA-8072: - After deep packet inspection, I believe I've found the root non-reconnectable snitch part of this issue. When you decom a node, it never correctly tears down its ITC pools, which leaves the other side with a dead OTC pool: {noformat} tcp1 0 10.208.8.123:33441 10.208.8.63:7000CLOSE_WAIT 18401/java {noformat} Now when you try to bootstrap with the same IP, the shadow syn is correctly sent and the ack reply is built and queued, but MS tries to use the now default OTC pool and the message never makes it back to the node, since it just sends RSTs which finally kills the connection. But since the syn is only sent once, the seed has nothing else to send the node and never reestablishes the connection, leaving the bootstrapping node thinking it never talked to a seed and throwing this error. > Exception during startup: Unable to gossip with any seeds > - > > Key: CASSANDRA-8072 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8072 > Project: Cassandra > Issue Type: Bug >Reporter: Ryan Springer >Assignee: Brandon Williams > Fix For: 2.0.15, 2.1.5 > > Attachments: cas-dev-dt-01-uw1-cassandra-seed01_logs.tar.bz2, > cas-dev-dt-01-uw1-cassandra-seed02_logs.tar.bz2, > cas-dev-dt-01-uw1-cassandra02_logs.tar.bz2, > casandra-system-log-with-assert-patch.log, trace_logs.tar.bz2 > > > When Opscenter 4.1.4 or 5.0.1 tries to provision a 2-node DSC 2.0.10 cluster > in either ec2 or locally, an error occurs sometimes with one of the nodes > refusing to start C*. 
The error in the /var/log/cassandra/system.log is: > ERROR [main] 2014-10-06 15:54:52,292 CassandraDaemon.java (line 513) > Exception encountered during startup > java.lang.RuntimeException: Unable to gossip with any seeds > at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1200) > at > org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:444) > at > org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:655) > at > org.apache.cassandra.service.StorageService.initServer(StorageService.java:609) > at > org.apache.cassandra.service.StorageService.initServer(StorageService.java:502) > at > org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:378) > at > org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:496) > at > org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:585) > INFO [StorageServiceShutdownHook] 2014-10-06 15:54:52,326 Gossiper.java > (line 1279) Announcing shutdown > INFO [StorageServiceShutdownHook] 2014-10-06 15:54:54,326 > MessagingService.java (line 701) Waiting for messaging service to quiesce > INFO [ACCEPT-localhost/127.0.0.1] 2014-10-06 15:54:54,327 > MessagingService.java (line 941) MessagingService has terminated the accept() > thread > This error does not always occur when provisioning a 2-node cluster, but > probably around half of the time on only one of the nodes. I haven't been > able to reproduce this error with DSC 2.0.9, and there have been no code or > definition file changes in Opscenter. > I can reproduce locally with the above steps. I'm happy to test any proposed > fixes since I'm the only person able to reproduce reliably so far. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-6696) Partition sstables by token range
[ https://issues.apache.org/jira/browse/CASSANDRA-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14500512#comment-14500512 ] Marcus Eriksson commented on CASSANDRA-6696: bq. Specifically, what vnodes are assigned to what disk? What vnode is an sstable responsible for? It should be possible to get that information without running sstablemetadata against every sstable file. Yes, in theory we could create subdirectories per vnode, for example; then we would get the sstables very easily. But again, we can do this after we commit this; please create a new ticket that depends on this. > Partition sstables by token range > - > > Key: CASSANDRA-6696 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6696 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: sankalp kohli >Assignee: Marcus Eriksson > Labels: compaction, correctness, dense-storage, performance > Fix For: 3.0 > > > In JBOD, when someone gets a bad drive, the bad drive is replaced with a new > empty one and repair is run. > This can cause deleted data to come back in some cases. This is also true for > corrupt sstables, in which case we delete the corrupt sstable and run repair. > Here is an example: > Say we have 3 nodes A, B and C, RF=3 and GC grace=10 days. > row=sankalp col=sankalp was written 20 days back and successfully went to all > three nodes. > Then a delete/tombstone was written successfully for the same row column 15 > days back. > Since this tombstone is more than gc grace old, it got compacted away in nodes A and B > when it got compacted with the actual data. So there is no trace of this row > column in nodes A and B. > Now in node C, say the original data is in drive1 and the tombstone is in drive2. > Compaction has not yet reclaimed the data and tombstone. > Drive2 becomes corrupt and is replaced with a new empty drive. > Due to the replacement, the tombstone is now gone and row=sankalp col=sankalp > has come back to life. 
> Now after replacing the drive we run repair. This data will be propagated to > all nodes. > Note: This is still a problem even if we run repair every gc grace. > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-6696) Partition sstables by token range
[ https://issues.apache.org/jira/browse/CASSANDRA-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14500508#comment-14500508 ] Marcus Eriksson commented on CASSANDRA-6696: bq. It might be nice to have this be configurable. This one flush per drive still results in L0 having sstables that overlap with almost all of L1 on a per drive basis. If you flush to X ranges per drive, then you can get some of the benefits of more parallel L0->L1 promotion even if you only have one drive. It might, or you just set up multiple data directories for the same drive. We can improve this later, please create a ticket that depends on this. > Partition sstables by token range > - > > Key: CASSANDRA-6696 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6696 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: sankalp kohli >Assignee: Marcus Eriksson > Labels: compaction, correctness, dense-storage, performance > Fix For: 3.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-6696) Partition sstables by token range
[ https://issues.apache.org/jira/browse/CASSANDRA-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14500496#comment-14500496 ] Jeremiah Jordan commented on CASSANDRA-6696: bq. multi threaded flushing - one thread per disk, splits the owned token range evenly over the drives It might be nice to have this be configurable. This one flush per drive still results in L0 having sstables that overlap with almost all of L1 on a per drive basis. If you flush to X ranges per drive, then you can get some of the benefits of more parallel L0->L1 promotion even if you only have one drive. > Partition sstables by token range > - > > Key: CASSANDRA-6696 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6696 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: sankalp kohli >Assignee: Marcus Eriksson > Labels: compaction, correctness, dense-storage, performance > Fix For: 3.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
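The "splits the owned token range evenly over the drives" step being discussed can be sketched with simple arithmetic. The following standalone illustration is not Cassandra's actual code, and all names are invented:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of dividing an owned token span into one contiguous,
// roughly equal-width flush range per data directory (drive).
public class RangeSplitter {

    // Returns `drives` [lo, hi) pairs covering [start, end).
    static List<long[]> split(long start, long end, int drives) {
        List<long[]> ranges = new ArrayList<>();
        long width = (end - start) / drives;
        for (int i = 0; i < drives; i++) {
            long lo = start + i * width;
            long hi = (i == drives - 1) ? end : lo + width; // last range absorbs any remainder
            ranges.add(new long[]{lo, hi});
        }
        return ranges;
    }

    public static void main(String[] args) {
        for (long[] r : split(0, 100, 4))
            System.out.println(r[0] + " -> " + r[1]);
    }
}
```

Flushing to X ranges per drive, as suggested in the comment, would amount to calling this with `drives * X` subranges and assigning X of them to each directory.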
[jira] [Created] (CASSANDRA-9210) CQL Tests Should Operate Over All Types (etc)
Benedict created CASSANDRA-9210: --- Summary: CQL Tests Should Operate Over All Types (etc) Key: CASSANDRA-9210 URL: https://issues.apache.org/jira/browse/CASSANDRA-9210 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benedict With some refactoring our CQL tests could cover all possible types for a given operation, and potentially different positions in the clustering component of a primary key, as well as differing adjacent items of data in the table being queried. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9131) Defining correct behavior during leap second insertion
[ https://issues.apache.org/jira/browse/CASSANDRA-9131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14500443#comment-14500443 ] Andy Tolbert commented on CASSANDRA-9131: - [~benedict], thanks. I agree that a client timestamp implementation in the drivers that is monotonically increasing, while avoiding the possibility of operations sharing the same timestamp, would be ideal. I agree with [~slebresne] that a reference algorithm would be very helpful so all drivers implement this in a consistent way. I'll bring this topic up with the team. I've seen it raised in a number of issues with differing opinions; is the current perspective that, going forward, client-provided timestamps will be preferred over server timestamps? Depending on the answer, I can see it being more or less important for clients to implement this properly. The current behavior in all the drivers, to my knowledge, is that using client timestamps has to be explicitly enabled, and only the Python and Java drivers have a way to enable it for all queries. > Defining correct behavior during leap second insertion > -- > > Key: CASSANDRA-9131 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9131 > Project: Cassandra > Issue Type: Bug > Environment: Linux ip-172-31-0-5 3.2.0-57-virtual #87-Ubuntu SMP Tue > Nov 12 21:53:49 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux >Reporter: Jim Witschey >Assignee: Jim Witschey > > On Linux platforms, the insertion of a leap second breaks the monotonicity of > timestamps. This can make values appear to have been inserted into Cassandra > in a different order than they were. I want to know what behavior is expected > and desirable for inserts over this discontinuity. 
> From a timestamp perspective, an inserted leap second looks like a repeat of > the previous second: > {code} > $ while true ; do echo "`date +%s%N` `date -u`" ; sleep .5 ; done > 1435708798171327029 Tue Jun 30 23:59:58 UTC 2015 > 1435708798679392477 Tue Jun 30 23:59:58 UTC 2015 > 1435708799187550335 Tue Jun 30 23:59:59 UTC 2015 > 1435708799695670453 Tue Jun 30 23:59:59 UTC 2015 > 1435708799203902068 Tue Jun 30 23:59:59 UTC 2015 > 1435708799712168566 Tue Jun 30 23:59:59 UTC 2015 > 1435708800220473932 Wed Jul 1 00:00:00 UTC 2015 > 1435708800728908190 Wed Jul 1 00:00:00 UTC 2015 > 1435708801237611983 Wed Jul 1 00:00:01 UTC 2015 > 1435708801746251996 Wed Jul 1 00:00:01 UTC 2015 > {code} > Note that 23:59:59 repeats itself, and that the timestamps increase during > the first time through, then step back down to the beginning of the second > and increase again. > As a result, the timestamps on values inserted during these seconds will be > out of order. I set up a 4-node cluster running under Ubuntu 12.04.3 and > synced them to shortly before the leap second would be inserted. During the > insertion of the leap second, I ran a test with logic something like: > {code} > simple_insert = session.prepare( > 'INSERT INTO test (foo, bar) VALUES (?, ?);') > for i in itertools.count(): > # stop after midnight > now = datetime.utcnow() > last_midnight = now.replace(hour=0, minute=0, > second=0, microsecond=0) > seconds_since_midnight = (now - last_midnight).total_seconds() > if 5 <= seconds_since_midnight <= 15: > break > session.execute(simple_insert, [i, i]) > result = session.execute("SELECT bar, WRITETIME(bar) FROM test;") > {code} > EDIT: This behavior occurs with server-generated timestamps; in this > particular test, I set {{use_client_timestamp}} to {{False}}. > Under normal circumstances, the values and writetimes would increase > together, but when inserted over the leap second, they don't. 
These {{value, > writetime}} pairs are sorted by writetime: > {code} > (582, 1435708799285000) > (579, 1435708799339000) > (583, 1435708799593000) > (580, 1435708799643000) > (584, 1435708799897000) > (581, 1435708799958000) > {code} > The values were inserted in increasing order, but their writetimes are in a > different order because of the repeated second. During the first instance of > 23:59:59, the values 579, 580, and 581 were inserted at the beginning, > middle, and end of the second. During the leap second, which is also > 23:59:59, 582, 583, and 584 were inserted, also at the beginning, middle, and > end of the second. However, since the two seconds are the same second, they > appear interleaved with respect to timestamps, as shown above. > So, should I consider this behavior correct? If not, how should Cassandra > correctly handle the discontinuity introduced by the insertion of a leap > second? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
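One possible shape for the monotonically increasing client-timestamp generator discussed in this thread (strictly increasing microsecond timestamps from a single client, even across a repeated leap second) is sketched below. This is illustrative only, not any driver's actual implementation:

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative monotonic client-timestamp generator: microsecond timestamps
// that never repeat or move backwards for one client process, even if the
// wall clock steps back (e.g. across an inserted leap second).
// Not taken from any driver's source.
public class MonotonicTimestamps {
    private final AtomicLong last = new AtomicLong();

    public long next() {
        while (true) {
            long now = System.currentTimeMillis() * 1000; // wall clock, in micros
            long prev = last.get();
            // If the clock stalled or stepped back, bump by 1us instead.
            long candidate = now > prev ? now : prev + 1;
            if (last.compareAndSet(prev, candidate))
                return candidate;
        }
    }

    public static void main(String[] args) {
        MonotonicTimestamps gen = new MonotonicTimestamps();
        long prev = gen.next();
        for (int i = 0; i < 10_000; i++) {
            long t = gen.next();
            if (t <= prev) throw new AssertionError("not monotonic");
            prev = t;
        }
    }
}
```

The price of monotonicity is that under sustained bursts of more than one operation per microsecond, or after a backwards clock step, generated timestamps temporarily run ahead of the wall clock.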
[jira] [Commented] (CASSANDRA-6696) Partition sstables by token range
[ https://issues.apache.org/jira/browse/CASSANDRA-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14500329#comment-14500329 ] Nick Bailey commented on CASSANDRA-6696: I'd also like to mention that we should consider what the best way to expose this new information to operators is. Specifically, what vnodes are assigned to what disk? What vnode is an sstable responsible for? It should be possible to get that information without running sstablemetadata against every sstable file. > Partition sstables by token range > - > > Key: CASSANDRA-6696 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6696 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: sankalp kohli >Assignee: Marcus Eriksson > Labels: compaction, correctness, dense-storage, performance > Fix For: 3.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
cassandra git commit: rename
Repository: cassandra Updated Branches: refs/heads/trunk e983956c2 -> 11dfc0253 rename Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/11dfc025 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/11dfc025 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/11dfc025 Branch: refs/heads/trunk Commit: 11dfc025305113f5cfeac4151fb72cee2e6f83f9 Parents: e983956 Author: Jonathan Ellis Authored: Fri Apr 17 12:48:28 2015 -0500 Committer: Jonathan Ellis Committed: Fri Apr 17 12:48:28 2015 -0500 -- .../apache/cassandra/config/CFMetaDataTest.java | 150 - .../config/LegacySchemaTablesTest.java | 150 + .../org/apache/cassandra/schema/DefsTest.java | 568 +++ .../schema/LegacySchemaTablesTest.java | 568 --- 4 files changed, 718 insertions(+), 718 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/11dfc025/test/unit/org/apache/cassandra/config/CFMetaDataTest.java -- diff --git a/test/unit/org/apache/cassandra/config/CFMetaDataTest.java b/test/unit/org/apache/cassandra/config/CFMetaDataTest.java deleted file mode 100644 index 5fed5be..000 --- a/test/unit/org/apache/cassandra/config/CFMetaDataTest.java +++ /dev/null @@ -1,150 +0,0 @@ -/** - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - *http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the License is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - * KIND, either express or implied. 
See the License for the - * specific language governing permissions and limitations - * under the License. - */ -package org.apache.cassandra.config; - -import java.util.ArrayList; -import java.util.List; -import java.util.HashMap; -import java.util.HashSet; - -import org.apache.cassandra.SchemaLoader; -import org.apache.cassandra.db.*; -import org.apache.cassandra.db.marshal.AsciiType; -import org.apache.cassandra.db.marshal.UTF8Type; -import org.apache.cassandra.exceptions.ConfigurationException; -import org.apache.cassandra.io.compress.*; -import org.apache.cassandra.locator.SimpleStrategy; -import org.apache.cassandra.schema.LegacySchemaTables; -import org.apache.cassandra.service.StorageService; -import org.apache.cassandra.thrift.CfDef; -import org.apache.cassandra.thrift.ColumnDef; -import org.apache.cassandra.thrift.IndexType; -import org.apache.cassandra.thrift.ThriftConversion; -import org.apache.cassandra.utils.ByteBufferUtil; -import org.apache.cassandra.utils.FBUtilities; - -import org.junit.BeforeClass; -import org.junit.Test; - -import static org.junit.Assert.assertEquals; - -public class CFMetaDataTest -{ -private static final String KEYSPACE1 = "CFMetaDataTest1"; -private static final String CF_STANDARD1 = "Standard1"; - -private static List columnDefs = new ArrayList(); - -static -{ -columnDefs.add(new ColumnDef(ByteBufferUtil.bytes("col1"), AsciiType.class.getCanonicalName()) -.setIndex_name("col1Index") -.setIndex_type(IndexType.KEYS)); - -columnDefs.add(new ColumnDef(ByteBufferUtil.bytes("col2"), UTF8Type.class.getCanonicalName()) -.setIndex_name("col2Index") -.setIndex_type(IndexType.KEYS)); -} - -@BeforeClass -public static void defineSchema() throws ConfigurationException -{ -SchemaLoader.prepareServer(); -SchemaLoader.createKeyspace(KEYSPACE1, -SimpleStrategy.class, -KSMetaData.optsWithRF(1), -SchemaLoader.standardCFMD(KEYSPACE1, CF_STANDARD1)); -} - -@Test -public void testThriftConversion() throws Exception -{ -CfDef cfDef = new 
CfDef().setDefault_validation_class(AsciiType.class.getCanonicalName()) - .setComment("Test comment") - .setColumn_metadata(columnDefs) - .setKeyspace(KEYSPACE1) - .setName(CF_STANDARD1); - -// convert Thrift to CFMetaData -CFMetaData cfMetaData = ThriftConversion.from
[jira] [Commented] (CASSANDRA-6696) Partition sstables by token range
[ https://issues.apache.org/jira/browse/CASSANDRA-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14500283#comment-14500283 ] Nick Bailey commented on CASSANDRA-6696: So I just want to mention on here that the current approach here isn't going to help us much with CASSANDRA-4756. If you don't update your compaction strategy, sstables will contain data from many vnodes so things aren't much different than now. If you do use the new compaction strategy, things are slightly better in that levels 1 or higher are split per vnode and you could deduplicate that data, but level 0 won't be so you'll still be forced to overstream anything in level 0. We may want to revisit a new approach to CASSANDRA-4756, specifically one that isn't compaction strategy specific. > Partition sstables by token range > - > > Key: CASSANDRA-6696 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6696 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: sankalp kohli >Assignee: Marcus Eriksson > Labels: compaction, correctness, dense-storage, performance > Fix For: 3.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-9209) Add static analysis to report any AutoCloseable objects that are not encapsulated in a try/finally block
Benedict created CASSANDRA-9209: --- Summary: Add static analysis to report any AutoCloseable objects that are not encapsulated in a try/finally block Key: CASSANDRA-9209 URL: https://issues.apache.org/jira/browse/CASSANDRA-9209 Project: Cassandra Issue Type: Improvement Reporter: Benedict Fix For: 3.0 Shouldn't be too tricky, and would help us potentially avoid a number of bugs. A follow up would be to enable optional ref counting (or at least leak detection) at run time for AutoCloseable objects, possibly only for those we care about, but also possible via bytecode weaving so that we could capture all of them without question. (/cc [~tjake]) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
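The discipline the proposed static analysis would enforce is that every AutoCloseable is closed in a finally block; Java's try-with-resources statement compiles to exactly that pattern. A small illustrative sketch (class and method names are invented):

```java
import java.io.*;

// The shape CASSANDRA-9209's proposed check would accept: every
// AutoCloseable closed in a finally block, here via try-with-resources.
public class CloseableDiscipline {

    // Passes the check: reader.close() runs even if readLine() throws.
    static String firstLine(File f) {
        try (BufferedReader reader = new BufferedReader(new FileReader(f))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Helper for the demo: write a one-line temp file (also try-with-resources).
    static File writeTemp(String content) {
        try {
            File f = File.createTempFile("demo", ".txt");
            f.deleteOnExit();
            try (PrintWriter w = new PrintWriter(f)) {
                w.println(content);
            }
            return f;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        if (!"hello".equals(firstLine(writeTemp("hello"))))
            throw new AssertionError();
    }
}
```

A checker (or the run-time leak detection the ticket floats) would flag any `new FileReader(...)` whose close is not guaranteed on every exit path.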
[jira] [Commented] (CASSANDRA-9206) Remove seed gossip probability
[ https://issues.apache.org/jira/browse/CASSANDRA-9206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14500274#comment-14500274 ] Jonathan Ellis commented on CASSANDRA-9206: --- LGTM > Remove seed gossip probability > -- > > Key: CASSANDRA-9206 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9206 > Project: Cassandra > Issue Type: Improvement >Reporter: Brandon Williams >Assignee: Brandon Williams > Fix For: 2.1.5 > > Attachments: 9206.txt > > > Currently, we use probability to determine whether a node will gossip with a > seed: > {noformat} > double probability = seeds.size() / (double) > (liveEndpoints.size() + unreachableEndpoints.size()); > double randDbl = random.nextDouble(); > if (randDbl <= probability) > sendGossip(prod, seeds); > {noformat} > I propose that we remove this probability, and instead *always* gossip with a > seed. This of course means increased traffic and processing on the seed(s), > but even a 1000 node cluster with a single seed will only put ~1000 messages > per second on the seed, which is virtually nothing. Should it become a > problem, the solution is simple: add more seeds. Since seeds will also > always gossip with each other, this effectively gives us a poor man's > spanning tree, with the only cost being removing a few lines of code, and > should greatly improve our gossip convergence time, especially in large > clusters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
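The quoted snippet's probability can be worked through directly to see the scale the ticket describes; a standalone sketch (not Cassandra source):

```java
// Sketch of the seed-gossip probability quoted above (standalone, not Cassandra code).
public class SeedGossipProbability {

    // probability = seeds / (live + unreachable), as in the quoted snippet
    static double seedProbability(int seeds, int live, int unreachable) {
        return seeds / (double) (live + unreachable);
    }

    public static void main(String[] args) {
        // 1000-node cluster, single seed: each gossip round a node has only a
        // 1-in-1000 chance of talking to the seed, slowing convergence.
        double p = seedProbability(1, 1000, 0);
        // Always gossiping instead costs the seed roughly one message per node
        // per second (~1000 msg/s here), which the ticket argues is negligible.
        if (p != 0.001) throw new AssertionError();
    }
}
```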
[jira] [Updated] (CASSANDRA-8801) Decommissioned nodes are willing to rejoin the cluster if restarted
[ https://issues.apache.org/jira/browse/CASSANDRA-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-8801: -- Reviewer: Tyler Hobbs [~thobbs] to review > Decommissioned nodes are willing to rejoin the cluster if restarted > --- > > Key: CASSANDRA-8801 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8801 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Eric Stevens >Assignee: Brandon Williams > Fix For: 3.0 > > Attachments: 8801.txt > > > This issue comes from the Cassandra user group. > If a node which was successfully decommissioned gets restarted with its data > directory intact, it will rejoin the cluster immediately, going to {{UN}} and > beginning to serve client requests. > This is wrong - the node has consistency issues, having missed any writes > while it was offline because no hinted handoffs were being kept. And in the > best case scenario (it's spotted and remediated immediately), near-100% > overstreaming will still occur. > Also, whatever reasons the operator had for decommissioning the node would > presumably still be valid, so this action may threaten cluster stability if > the node is underpowered or suffering hardware issues. > But what elevates this to critical is that if the node had been offline > longer than gc_grace_seconds, it may cause permanent and unrecoverable > consistency issues due to data resurrection. > h3. Recommendation: > A node should remember that it was decommissioned and refuse to rejoin a > cluster without at least a -D flag forcing it to. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
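The recommendation above could take the shape of a persisted marker checked at startup. The sketch below is illustrative only; the system property and file handling are hypothetical, not what any patch actually does:

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;

// Sketch of the ticket's recommendation: persist a marker when a node is
// decommissioned and refuse to rejoin unless an explicit -D flag overrides it.
// The property name and file handling here are hypothetical.
public class DecommissionGuard {

    // Start is allowed only if no marker exists, or the operator forced it.
    static boolean mayStart(File marker, boolean override) {
        return !marker.exists() || override;
    }

    // Demo helper: create a marker file as decommission would.
    static File touchMarker() {
        try {
            File f = File.createTempFile("decommissioned", ".marker");
            f.deleteOnExit();
            return f;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        File marker = touchMarker();
        boolean override = Boolean.getBoolean("cassandra.override_decommission"); // hypothetical flag
        if (mayStart(marker, override) && !override)
            throw new AssertionError("must refuse to start without the override flag");
    }
}
```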
[jira] [Commented] (CASSANDRA-9131) Defining correct behavior during leap second insertion
[ https://issues.apache.org/jira/browse/CASSANDRA-9131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14500259#comment-14500259 ] Sylvain Lebresne commented on CASSANDRA-9131: - bq. if we should offer some sample code Yes, I suspect it would be appreciated if we were to provide some kind of example/reference algorithm for this (maybe just in form of some pseudo-code), probably in the protocol spec documentation. > Defining correct behavior during leap second insertion > -- > > Key: CASSANDRA-9131 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9131 > Project: Cassandra > Issue Type: Bug > Environment: Linux ip-172-31-0-5 3.2.0-57-virtual #87-Ubuntu SMP Tue > Nov 12 21:53:49 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux >Reporter: Jim Witschey >Assignee: Jim Witschey -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9206) Remove seed gossip probability
[ https://issues.apache.org/jira/browse/CASSANDRA-9206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14500227#comment-14500227 ] Brandon Williams commented on CASSANDRA-9206: - I did some digging as to why the probability code exists, and wasn't able to find much. It came over the wall with facebook, and there's no mention of it in the scuttlebutt paper (nor seeds at all, nor probabilistically gossiping with unreachable members) so I'm not sure what the original reasoning was for it. > Remove seed gossip probability > -- > > Key: CASSANDRA-9206 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9206 > Project: Cassandra > Issue Type: Improvement >Reporter: Brandon Williams >Assignee: Brandon Williams > Fix For: 2.1.5 > > Attachments: 9206.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8437) Track digest mismatch ratio
[ https://issues.apache.org/jira/browse/CASSANDRA-8437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Lerer updated CASSANDRA-8437: -- Attachment: CASSANDRA-8437-V3.txt Changed the patch as suggested and modified NodeTool for backward compatibility. > Track digest mismatch ratio > --- > > Key: CASSANDRA-8437 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8437 > Project: Cassandra > Issue Type: Improvement >Reporter: Sylvain Lebresne >Assignee: Benjamin Lerer >Priority: Minor > Fix For: 2.1.5 > > Attachments: CASSANDRA-8437-V2.txt, CASSANDRA-8437-V3.txt, > CASSANDRA-8437.txt > > > I don't believe we track how often a read results in a digest mismatch, but we > should, since that could directly impact read performance in practice. > Once we have that data, it might be that some workloads (write heavy most > likely) end up with enough mismatches that going to the data read is more > efficient in practice. What we do about it is step 2, however; getting the > data is easy enough. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-8043) Native Protocol V4
[ https://issues.apache.org/jira/browse/CASSANDRA-8043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14500222#comment-14500222 ] Michael Penick edited comment on CASSANDRA-8043 at 4/17/15 5:21 PM: [~slebresne] There's no type on the field for the new error bodies, "Read_failure" or "Write_failure". It doesn't seem to be there for the "Read_timeout" or "Write_timeout" either. I like the " is a\(n\) \[type\]" format of the other fields; this field is an outlier in that regard. was (Author: mpenick): [~slebresne] There's no type on the field for the new error bodies, "Read_failure" or "Write_failure". It doesn't seem to be there for the "Read_timeout" or "Write_timeout" either. I like the " is a(n) [type]" format of the other fields and is an outlier in that regard. > Native Protocol V4 > -- > > Key: CASSANDRA-8043 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8043 > Project: Cassandra > Issue Type: Task >Reporter: Sylvain Lebresne > Labels: client-impacting, protocolv4 > Fix For: 3.0 > > > We have a bunch of issues that will require a protocol v4; this ticket is > just a meta ticket to group them all. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8043) Native Protocol V4
[ https://issues.apache.org/jira/browse/CASSANDRA-8043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14500222#comment-14500222 ] Michael Penick commented on CASSANDRA-8043: --- [~slebresne] There's no type on the field for the new error bodies, "Read_failure" or "Write_failure". It doesn't seem to be there for the "Read_timeout" or "Write_timeout" either. I like the " is a(n) [type]" format of the other fields; this field is an outlier in that regard. > Native Protocol V4 > -- > > Key: CASSANDRA-8043 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8043 > Project: Cassandra > Issue Type: Task >Reporter: Sylvain Lebresne > Labels: client-impacting, protocolv4 > Fix For: 3.0 > > > We have a bunch of issues that will require a protocol v4; this ticket is > just a meta ticket to group them all. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Git Push Summary
Repository: cassandra Updated Tags: refs/tags/cassandra-2.1.5-tentative [created] b4fae8557
Git Push Summary
Repository: cassandra Updated Tags: refs/tags/cassandra-2.1.5-tentative [deleted] 3c17ac6e1
[jira] [Commented] (CASSANDRA-8609) Remove dependency of hadoop on internals (Cell/CellName)
[ https://issues.apache.org/jira/browse/CASSANDRA-8609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14500199#comment-14500199 ] Sylvain Lebresne commented on CASSANDRA-8609: - [~philipthompson] So is this just a duplicate of CASSANDRA-8358 or will we need something more for this? > Remove dependency of hadoop on internals (Cell/CellName) > - > > Key: CASSANDRA-8609 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8609 > Project: Cassandra > Issue Type: Bug >Reporter: Sylvain Lebresne >Assignee: Philip Thompson > Fix For: 3.0 > > Attachments: CASSANDRA-8609-3.0-branch.txt > > > For some reason most of the Hadoop code (ColumnFamilyRecordReader, > CqlStorage, ...) uses the {{Cell}} and {{CellName}} classes. That dependency > is entirely artificial: all this code is really client code that communicates > with Cassandra over thrift/native protocol and there is thus no reason for it > to use internal classes. And in fact, those classes are used in a very crude > way, as a {{Pair}} really. > But this dependency is really painful when we make changes to the internals. > Further, every time we do so, I believe we break some of those APIs due > to the change. This has been painful for CASSANDRA-5417 and is now > painful for CASSANDRA-8099. While I somewhat hacked around it in > CASSANDRA-5417, that was a mistake and we should have removed the dependency > back then. So let's do that now. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (CASSANDRA-9208) Setting rpc_interface in cassandra.yaml causes NPE during startup
[ https://issues.apache.org/jira/browse/CASSANDRA-9208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Williams resolved CASSANDRA-9208. - Resolution: Duplicate > Setting rpc_interface in cassandra.yaml causes NPE during startup > - > > Key: CASSANDRA-9208 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9208 > Project: Cassandra > Issue Type: Bug > Components: Config > Environment: Windows and RHEL >Reporter: Sandeep More > Attachments: SuggestedDataBaseDescriptor.diff > > > In the cassandra.yaml file, when the "rpc_interface" option is set it causes an NPE > (stack trace at the end). > Upon further investigation it turns out that there is a serious problem in > the way this logic is handled in DatabaseDescriptor.java (#374). > Following is the code snippet > else if (conf.rpc_interface != null) > { > listenAddress = getNetworkInterfaceAddress(conf.rpc_interface, > "rpc_interface"); > } > else > { > rpcAddress = FBUtilities.getLocalAddress(); > } > If you notice, > 1) The code above sets the "listenAddress" instead of "rpcAddress". > 2) The function getNetworkInterfaceAddress() blindly assumes that it is > called to set the "listenAddress" (see line 171). The "configName" variable > passed to the function is royally ignored and only used for printing out the > exception (which again is misleading). > I am also attaching a suggested patch (NOTE: the patch tries to address this > issue; the function getNetworkInterfaceAddress() needs revision). > INFO 15:36:56 Windows environment detected. 
DiskAccessMode set to standard, > indexAccessMode standard > INFO 15:36:56 Global memtable on-heap threshold is enabled at 503MB > INFO 15:36:56 Global memtable off-heap threshold is enabled at 503MB > ERROR 15:37:50 Fatal error during configuration loading > java.lang.NullPointerException: null > at > org.apache.cassandra.config.DatabaseDescriptor.applyConfig(DatabaseDescriptor.java:411) > ~[apache-cassandra-2.1.4.jar:2.1.4] > at > org.apache.cassandra.config.DatabaseDescriptor.(DatabaseDescriptor.java:133) > ~[apache-cassandra-2.1.4.jar:2.1.4] > at > org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:164) > [apache-cassandra-2.1.4.jar:2.1.4] > at > org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:533) > [apache-cassandra-2.1.4.jar:2.1.4] > at > org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:622) > [apache-cassandra-2.1.4.jar:2.1.4] > null > Fatal error during configuration loading; unable to start. See log for > stacktrace. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
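The shape of the fix the reporter suggests can be sketched in isolation (a toy model with Strings standing in for InetAddress; the helper below is a hypothetical stand-in for the real getNetworkInterfaceAddress(), which resolves an address from a named network interface, and this is not the attached patch):

```java
public class RpcInterfaceSketch
{
    // Hypothetical stand-in for getNetworkInterfaceAddress(). Per the
    // report, the real method should use configName in its error
    // messages instead of assuming it was called for listen_address.
    static String getNetworkInterfaceAddress(String interfaceName, String configName)
    {
        return "addr(" + interfaceName + ")";
    }

    // Corrected branch: when rpc_interface is set, assign the RPC
    // address, not the listen address as the buggy code did.
    static String resolveRpcAddress(String rpcInterface)
    {
        if (rpcInterface != null)
            return getNetworkInterfaceAddress(rpcInterface, "rpc_interface");
        return "localhost"; // stands in for FBUtilities.getLocalAddress()
    }
}
```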
[jira] [Updated] (CASSANDRA-9208) Setting rpc_interface in cassandra.yaml causes NPE during startup
[ https://issues.apache.org/jira/browse/CASSANDRA-9208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Williams updated CASSANDRA-9208: Labels: (was: easyfix patch) > Setting rpc_interface in cassandra.yaml causes NPE during startup > - > > Key: CASSANDRA-9208 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9208 > Project: Cassandra > Issue Type: Bug > Components: Config > Environment: Windows and RHEL >Reporter: Sandeep More > Attachments: SuggestedDataBaseDescriptor.diff > > > In the cassandra.yaml file, when the "rpc_interface" option is set it causes an NPE > (stack trace at the end). > Upon further investigation it turns out that there is a serious problem in > the way this logic is handled in DatabaseDescriptor.java (#374). > Following is the code snippet > else if (conf.rpc_interface != null) > { > listenAddress = getNetworkInterfaceAddress(conf.rpc_interface, > "rpc_interface"); > } > else > { > rpcAddress = FBUtilities.getLocalAddress(); > } > If you notice, > 1) The code above sets the "listenAddress" instead of "rpcAddress". > 2) The function getNetworkInterfaceAddress() blindly assumes that it is > called to set the "listenAddress" (see line 171). The "configName" variable > passed to the function is royally ignored and only used for printing out the > exception (which again is misleading). > I am also attaching a suggested patch (NOTE: the patch tries to address this > issue; the function getNetworkInterfaceAddress() needs revision). > INFO 15:36:56 Windows environment detected. 
DiskAccessMode set to standard, > indexAccessMode standard > INFO 15:36:56 Global memtable on-heap threshold is enabled at 503MB > INFO 15:36:56 Global memtable off-heap threshold is enabled at 503MB > ERROR 15:37:50 Fatal error during configuration loading > java.lang.NullPointerException: null > at > org.apache.cassandra.config.DatabaseDescriptor.applyConfig(DatabaseDescriptor.java:411) > ~[apache-cassandra-2.1.4.jar:2.1.4] > at > org.apache.cassandra.config.DatabaseDescriptor.(DatabaseDescriptor.java:133) > ~[apache-cassandra-2.1.4.jar:2.1.4] > at > org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:164) > [apache-cassandra-2.1.4.jar:2.1.4] > at > org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:533) > [apache-cassandra-2.1.4.jar:2.1.4] > at > org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:622) > [apache-cassandra-2.1.4.jar:2.1.4] > null > Fatal error during configuration loading; unable to start. See log for > stacktrace. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9131) Defining correct behavior during leap second insertion
[ https://issues.apache.org/jira/browse/CASSANDRA-9131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14500186#comment-14500186 ] Benedict commented on CASSANDRA-9131: - [~andrew.tolbert]: if you feel like taking a look at CASSANDRA-6106, this could be a useful approach for the java-driver (and be adapted to other drivers). Whether or not it uses the microsecond time is kind of irrelevant to the point at hand (although potentially also helpful in itself), but the approach to staggering the time corrections is very applicable. This would prevent the "only have fewer than 1000 inserts in a leap second" problem, because the 1s shift backwards in time would be spread over the following minute, with each second taking around 20ms longer to elapse than it otherwise would. Either my or Sylvain's method of updating the clock time would suffice, and be a tremendous improvement to behaviour here in the drivers. > Defining correct behavior during leap second insertion > -- > > Key: CASSANDRA-9131 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9131 > Project: Cassandra > Issue Type: Bug > Environment: Linux ip-172-31-0-5 3.2.0-57-virtual #87-Ubuntu SMP Tue > Nov 12 21:53:49 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux >Reporter: Jim Witschey >Assignee: Jim Witschey > > On Linux platforms, the insertion of a leap second breaks the monotonicity of > timestamps. This can make values appear to have been inserted into Cassandra > in a different order than they were. I want to know what behavior is expected > and desirable for inserts over this discontinuity. 
> From a timestamp perspective, an inserted leap second looks like a repeat of > the previous second: > {code} > $ while true ; do echo "`date +%s%N` `date -u`" ; sleep .5 ; done > 1435708798171327029 Tue Jun 30 23:59:58 UTC 2015 > 1435708798679392477 Tue Jun 30 23:59:58 UTC 2015 > 1435708799187550335 Tue Jun 30 23:59:59 UTC 2015 > 1435708799695670453 Tue Jun 30 23:59:59 UTC 2015 > 1435708799203902068 Tue Jun 30 23:59:59 UTC 2015 > 1435708799712168566 Tue Jun 30 23:59:59 UTC 2015 > 1435708800220473932 Wed Jul 1 00:00:00 UTC 2015 > 1435708800728908190 Wed Jul 1 00:00:00 UTC 2015 > 1435708801237611983 Wed Jul 1 00:00:01 UTC 2015 > 1435708801746251996 Wed Jul 1 00:00:01 UTC 2015 > {code} > Note that 23:59:59 repeats itself, and that the timestamps increase during > the first time through, then step back down to the beginning of the second > and increase again. > As a result, the timestamps on values inserted during these seconds will be > out of order. I set up a 4-node cluster running under Ubuntu 12.04.3 and > synced them to shortly before the leap second would be inserted. During the > insertion of the leap second, I ran a test with logic something like: > {code} > simple_insert = session.prepare( > 'INSERT INTO test (foo, bar) VALUES (?, ?);') > for i in itertools.count(): > # stop after midnight > now = datetime.utcnow() > last_midnight = now.replace(hour=0, minute=0, > second=0, microsecond=0) > seconds_since_midnight = (now - last_midnight).total_seconds() > if 5 <= seconds_since_midnight <= 15: > break > session.execute(simple_insert, [i, i]) > result = session.execute("SELECT bar, WRITETIME(bar) FROM test;") > {code} > EDIT: This behavior occurs with server-generated timestamps; in this > particular test, I set {{use_client_timestamp}} to {{False}}. > Under normal circumstances, the values and writetimes would increase > together, but when inserted over the leap second, they don't. 
These {{value, > writetime}} pairs are sorted by writetime: > {code} > (582, 1435708799285000) > (579, 1435708799339000) > (583, 1435708799593000) > (580, 1435708799643000) > (584, 1435708799897000) > (581, 1435708799958000) > {code} > The values were inserted in increasing order, but their writetimes are in a > different order because of the repeated second. During the first instance of > 23:59:59, the values 579, 580, and 581 were inserted at the beginning, > middle, and end of the second. During the leap second, which is also > 23:59:59, 582, 583, and 584 were inserted, also at the beginning, middle, and > end of the second. However, since the two seconds are the same second, they > appear interleaved with respect to timestamps, as shown above. > So, should I consider this behavior correct? If not, how should Cassandra > correctly handle the discontinuity introduced by the insertion of a leap > second? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
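The interleaving the ticket describes is reproducible from the reported writetimes alone: sorting the six (value, writetime) pairs by writetime alternates between the two passes through 23:59:59. A standalone reconstruction from the ticket's data (not Cassandra code):

```java
import java.util.Arrays;
import java.util.Comparator;

public class LeapSecondInterleaving
{
    // Returns the values ordered by their writetime, mirroring the
    // WRITETIME(bar)-sorted output shown in the ticket.
    static long[] valuesByWritetime(long[][] valueAndWritetime)
    {
        long[][] sorted = valueAndWritetime.clone();
        Arrays.sort(sorted, Comparator.comparingLong((long[] p) -> p[1]));
        long[] values = new long[sorted.length];
        for (int i = 0; i < sorted.length; i++)
            values[i] = sorted[i][0];
        return values;
    }

    public static void main(String[] args)
    {
        // Writetimes from the ticket: 579-581 were inserted during the
        // first 23:59:59, 582-584 during the leap second (also 23:59:59).
        long[][] writes = {
            { 579, 1435708799339000L }, { 580, 1435708799643000L }, { 581, 1435708799958000L },
            { 582, 1435708799285000L }, { 583, 1435708799593000L }, { 584, 1435708799897000L },
        };
        // Prints [582, 579, 583, 580, 584, 581]: the two seconds interleave.
        System.out.println(Arrays.toString(valuesByWritetime(writes)));
    }
}
```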
cassandra git commit: Fix sigar message about swap
Repository: cassandra Updated Branches: refs/heads/trunk ae3edb2ab -> e983956c2 Fix sigar message about swap Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/e983956c Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/e983956c Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/e983956c Branch: refs/heads/trunk Commit: e983956c2883c8b7c85f7f6700c19ffd9a4a7e54 Parents: ae3edb2 Author: Brandon Williams Authored: Fri Apr 17 11:55:38 2015 -0500 Committer: Brandon Williams Committed: Fri Apr 17 11:55:46 2015 -0500 -- src/java/org/apache/cassandra/utils/SigarLibrary.java | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/e983956c/src/java/org/apache/cassandra/utils/SigarLibrary.java -- diff --git a/src/java/org/apache/cassandra/utils/SigarLibrary.java b/src/java/org/apache/cassandra/utils/SigarLibrary.java index be85977..7cf4d71 100644 --- a/src/java/org/apache/cassandra/utils/SigarLibrary.java +++ b/src/java/org/apache/cassandra/utils/SigarLibrary.java @@ -140,11 +140,11 @@ public class SigarLibrary long swapSize = swap.getTotal(); if (swapSize > 0) { -return false; +return true; } else { -return true; +return false; } } catch (SigarException sigarException)
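The diff swaps the two return values so the method reports true exactly when the total swap size is positive. The corrected branch collapses to a one-liner; this is a sketch of that logic only, not the committed method (which obtains the size via Sigar and also handles SigarException):

```java
public class SwapCheckSketch
{
    // True when the OS reports a positive total swap size, matching the
    // corrected behaviour of the patched method in SigarLibrary.
    static boolean hasSwap(long totalSwapBytes)
    {
        return totalSwapBytes > 0;
    }
}
```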
[jira] [Resolved] (CASSANDRA-8766) SSTableRewriter opens all sstables as EARLY before completing the compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-8766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benedict resolved CASSANDRA-8766. - Resolution: Duplicate > SSTableRewriter opens all sstables as EARLY before completing the compaction > > > Key: CASSANDRA-8766 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8766 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Benedict >Assignee: Joshua McKenzie >Priority: Minor > Fix For: 2.1.5 > > > In CASSANDRA-8320, we made the rewriter call switchWriter() inside of > finish(); in CASSANDRA-8124, switchWriter() was made to open its data as EARLY. > This combination means we no longer honour disabling of early opening, which > is potentially a problem on Windows for the deletion of the contents (which > is why we disable early opening on Windows). > I've commented on CASSANDRA-8124, as I suspect I'm missing something about > this. Although I have no doubt the old behaviour of opening as a TMP file > reduced the window for problems, and opening as TMPLINK now does the same, > it's not entirely clear to me it's the right fix (though it may be), since we > shouldn't be susceptible to this window anyway. Either way, we perhaps need > to come up with something else, because this could potentially break Windows > support. Perhaps if we simply did not swap in the TMPLINK file, so that it > never actually gets mapped, it would be enough. [~JoshuaMcKenzie], > WDYT? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9206) Remove seed gossip probability
[ https://issues.apache.org/jira/browse/CASSANDRA-9206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Williams updated CASSANDRA-9206: Attachment: (was: 9206.txt) > Remove seed gossip probability > -- > > Key: CASSANDRA-9206 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9206 > Project: Cassandra > Issue Type: Improvement >Reporter: Brandon Williams >Assignee: Brandon Williams > Fix For: 2.1.5 > > Attachments: 9206.txt > > > Currently, we use probability to determine whether a node will gossip with a > seed: > {noformat} > double probability = seeds.size() / (double) > (liveEndpoints.size() + unreachableEndpoints.size()); > double randDbl = random.nextDouble(); > if (randDbl <= probability) > sendGossip(prod, seeds); > {noformat} > I propose that we remove this probability, and instead *always* gossip with a > seed. This of course means increased traffic and processing on the seed(s), > but even a 1000 node cluster with a single seed will only put ~1000 messages > per second on the seed, which is virtually nothing. Should it become a > problem, the solution is simple: add more seeds. Since seeds will also > always gossip with each other, this effectively gives us a poor man's > spanning tree, with the only cost being removing a few lines of code, and > should greatly improve our gossip convergence time, especially in large > clusters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9206) Remove seed gossip probability
[ https://issues.apache.org/jira/browse/CASSANDRA-9206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Williams updated CASSANDRA-9206: Attachment: 9206.txt > Remove seed gossip probability > -- > > Key: CASSANDRA-9206 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9206 > Project: Cassandra > Issue Type: Improvement >Reporter: Brandon Williams >Assignee: Brandon Williams > Fix For: 2.1.5 > > Attachments: 9206.txt > > > Currently, we use probability to determine whether a node will gossip with a > seed: > {noformat} > double probability = seeds.size() / (double) > (liveEndpoints.size() + unreachableEndpoints.size()); > double randDbl = random.nextDouble(); > if (randDbl <= probability) > sendGossip(prod, seeds); > {noformat} > I propose that we remove this probability, and instead *always* gossip with a > seed. This of course means increased traffic and processing on the seed(s), > but even a 1000 node cluster with a single seed will only put ~1000 messages > per second on the seed, which is virtually nothing. Should it become a > problem, the solution is simple: add more seeds. Since seeds will also > always gossip with each other, this effectively gives us a poor man's > spanning tree, with the only cost being removing a few lines of code, and > should greatly improve our gossip convergence time, especially in large > clusters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-9131) Defining correct behavior during leap second insertion
[ https://issues.apache.org/jira/browse/CASSANDRA-9131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14500127#comment-14500127 ] Andy Tolbert edited comment on CASSANDRA-9131 at 4/17/15 4:36 PM: -- {quote} if we should update the client protocol spec/docs to make clear that this problem exists, and that clients are expected to work around it if we should offer some sample code, and work with the Java Driver team to ensure this problem doesn't affect it {quote} This is something I'm actively looking at on the drivers side for each driver. As of now, only the python-driver and java-driver have a mechanism to automatically set client timestamps. The other drivers that support protocol v3 have a mechanism for the client to specify the timestamp (they could also use 'USING TIMESTAMP', I suppose), so it will be up to the user / time implementation. Most of the drivers should have an active means of setting the client timestamp through the API by June 30th 2015. * python-driver: [Session use_client_timestamp|http://datastax.github.io/python-driver/api/cassandra/cluster.html?highlight=timestamp#cassandra.cluster.Session.use_client_timestamp] not monotonic, uses time.time(). * java-driver: [TimestampGenerator|http://www.datastax.com/drivers/java/2.1/com/datastax/driver/core/TimestampGenerator.html]. All client implementations are monotonic, but clients can provide their own. If > 999 entries for the same millisecond, it will reuse the same timestamp, so some timestamps may not be distinct. 
Possible problem if > 1000 entries during leap second ([code|https://github.com/datastax/java-driver/blob/2.1/driver-core/src/main/java/com/datastax/driver/core/AbstractTimestampGenerator.java]) was (Author: andrew.tolbert): {quote} if we should update the client protocol spec/docs to make clear that this problem exists, and that clients are expected to work around it if we should offer some sample code, and work with the Java Driver team to ensure this problem doesn't affect it {quote} This is something I'm actively looking at on the drivers side for each driver. As of now only the python-driver and java-driver have a mechanism to enable automatically set client timestamps. The other drivers that support protocol v3 have a mechanism on the client specifying the timestamp (they could also use 'USING TIMESTAMP' i suppose), so it will be up to user / time implementation. Most of the drivers should have an active means of setting client timestamp through API by June 30th 2015. * python-driver: [Session use_client_timestamp|http://datastax.github.io/python-driver/api/cassandra/cluster.html?highlight=timestamp#cassandra.cluster.Session.use_client_timestamp] not monotonic, uses time.time(). * java-driver: [TimestampGenerator|http://www.datastax.com/drivers/java/2.1/com/datastax/driver/core/TimestampGenerator.html]. All client implementations are monotonic, but client can provide their own. If > 999 entries for same millisecond, will reloop over the same millisecond, so its not completely monotonic. 
Possible problem if > 1000 entries during leap second ([code|https://github.com/datastax/java-driver/blob/2.1/driver-core/src/main/java/com/datastax/driver/core/AbstractTimestampGenerator.java]) > Defining correct behavior during leap second insertion > -- > > Key: CASSANDRA-9131 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9131 > Project: Cassandra > Issue Type: Bug > Environment: Linux ip-172-31-0-5 3.2.0-57-virtual #87-Ubuntu SMP Tue > Nov 12 21:53:49 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux >Reporter: Jim Witschey >Assignee: Jim Witschey > > On Linux platforms, the insertion of a leap second breaks the monotonicity of > timestamps. This can make values appear to have been inserted into Cassandra > in a different order than they were. I want to know what behavior is expected > and desirable for inserts over this discontinuity. > From a timestamp perspective, an inserted leap second looks like a repeat of > the previous second: > {code} > $ while true ; do echo "`date +%s%N` `date -u`" ; sleep .5 ; done > 1435708798171327029 Tue Jun 30 23:59:58 UTC 2015 > 1435708798679392477 Tue Jun 30 23:59:58 UTC 2015 > 1435708799187550335 Tue Jun 30 23:59:59 UTC 2015 > 1435708799695670453 Tue Jun 30 23:59:59 UTC 2015 > 1435708799203902068 Tue Jun 30 23:59:59 UTC 2015 > 1435708799712168566 Tue Jun 30 23:59:59 UTC 2015 > 1435708800220473932 Wed Jul 1 00:00:00 UTC 2015 > 1435708800728908190 Wed Jul 1 00:00:00 UTC 2015 > 1435708801237611983 Wed Jul 1 00:00:01 UTC 2015 > 1435708801746251996 Wed Jul 1 00:00:01 UTC 2015 > {code} > Note that 23:59:59 repeats itself, and that the timestamps increase during > the first time through, then step back down to the b
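One simple way to get the monotonic client-side timestamps discussed here is a compare-and-set loop over the last issued value. This is a sketch of the general strategy only, not the java-driver's actual AbstractTimestampGenerator, and it differs slightly from the behaviour described above (it keeps timestamps strictly increasing rather than reusing a timestamp after 999 entries in one millisecond):

```java
import java.util.concurrent.atomic.AtomicLong;

public class MonotonicTimestamps
{
    private final AtomicLong last = new AtomicLong(Long.MIN_VALUE);

    // Next microsecond-precision timestamp; never goes backwards even if
    // the wall clock does (e.g. when a leap second repeats 23:59:59).
    // In real use, wallClockMillis would come from System.currentTimeMillis().
    long next(long wallClockMillis)
    {
        while (true)
        {
            long prev = last.get();
            long candidate = Math.max(wallClockMillis * 1000, prev + 1);
            if (last.compareAndSet(prev, candidate))
                return candidate;
        }
    }
}
```

The CAS loop makes the generator safe to share across threads: a racing thread simply retries against the newer "last" value.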
[jira] [Comment Edited] (CASSANDRA-9131) Defining correct behavior during leap second insertion
[ https://issues.apache.org/jira/browse/CASSANDRA-9131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14500127#comment-14500127 ] Andy Tolbert edited comment on CASSANDRA-9131 at 4/17/15 4:28 PM: -- {quote} if we should update the client protocol spec/docs to make clear that this problem exists, and that clients are expected to work around it if we should offer some sample code, and work with the Java Driver team to ensure this problem doesn't affect it {quote} This is something I'm actively looking at on the drivers side for each driver. As of now only the python-driver and java-driver have a mechanism to enable automatically set client timestamps. The other drivers that support protocol v3 have a mechanism on the client specifying the timestamp (they could also use 'USING TIMESTAMP' i suppose), so it will be up to user / time implementation. Most of the drivers should have an active means of setting client timestamp through API by June 30th 2015. * python-driver: [Session use_client_timestamp|http://datastax.github.io/python-driver/api/cassandra/cluster.html?highlight=timestamp#cassandra.cluster.Session.use_client_timestamp] not monotonic, uses time.time(). * java-driver: [TimestampGenerator|http://www.datastax.com/drivers/java/2.1/com/datastax/driver/core/TimestampGenerator.html]. All client implementations are monotonic, but client can provide their own. If > 999 entries for same millisecond, will reloop over the same millisecond, so its not completely monotonic. 
Possible problem if > 1000 entries during leap second ([code|https://github.com/datastax/java-driver/blob/2.1/driver-core/src/main/java/com/datastax/driver/core/AbstractTimestampGenerator.java]) was (Author: andrew.tolbert): {quote} if we should update the client protocol spec/docs to make clear that this problem exists, and that clients are expected to work around it if we should offer some sample code, and work with the Java Driver team to ensure this problem doesn't affect it {quote} This is something I'm actively looking at on the drivers side for each driver. As of now only the python-driver and java-driver have a mechanism to enable automatically set client timestamps. The other drivers that support protocol v3 have a mechanism on the client specifying the timestamp (they could also use 'USING TIMESTAMP' i suppose), so it will be up to user / time implementation. Most of the drivers will have an active means of setting client timestamp through API by June 30th 2015. * python-driver: [Session use_client_timestamp|http://datastax.github.io/python-driver/api/cassandra/cluster.html?highlight=timestamp#cassandra.cluster.Session.use_client_timestamp] not monotonic, uses time.time(). * java-driver: [TimestampGenerator|http://www.datastax.com/drivers/java/2.1/com/datastax/driver/core/TimestampGenerator.html]. All client implementations are monotonic, but client can provide their own. If > 999 entries for same millisecond, will reloop over the same millisecond, so its not completely monotonic. 
Possible problem if > 1000 entries during leap second ([code|https://github.com/datastax/java-driver/blob/2.1/driver-core/src/main/java/com/datastax/driver/core/AbstractTimestampGenerator.java]) > Defining correct behavior during leap second insertion > -- > > Key: CASSANDRA-9131 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9131 > Project: Cassandra > Issue Type: Bug > Environment: Linux ip-172-31-0-5 3.2.0-57-virtual #87-Ubuntu SMP Tue > Nov 12 21:53:49 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux >Reporter: Jim Witschey >Assignee: Jim Witschey > > On Linux platforms, the insertion of a leap second breaks the monotonicity of > timestamps. This can make values appear to have been inserted into Cassandra > in a different order than they were. I want to know what behavior is expected > and desirable for inserts over this discontinuity. > From a timestamp perspective, an inserted leap second looks like a repeat of > the previous second: > {code} > $ while true ; do echo "`date +%s%N` `date -u`" ; sleep .5 ; done > 1435708798171327029 Tue Jun 30 23:59:58 UTC 2015 > 1435708798679392477 Tue Jun 30 23:59:58 UTC 2015 > 1435708799187550335 Tue Jun 30 23:59:59 UTC 2015 > 1435708799695670453 Tue Jun 30 23:59:59 UTC 2015 > 1435708799203902068 Tue Jun 30 23:59:59 UTC 2015 > 1435708799712168566 Tue Jun 30 23:59:59 UTC 2015 > 1435708800220473932 Wed Jul 1 00:00:00 UTC 2015 > 1435708800728908190 Wed Jul 1 00:00:00 UTC 2015 > 1435708801237611983 Wed Jul 1 00:00:01 UTC 2015 > 1435708801746251996 Wed Jul 1 00:00:01 UTC 2015 > {code} > Note that 23:59:59 repeats itself, and that the timestamps increase during > the first time through, then step back down to the be
[jira] [Comment Edited] (CASSANDRA-9131) Defining correct behavior during leap second insertion
[ https://issues.apache.org/jira/browse/CASSANDRA-9131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14500127#comment-14500127 ] Andy Tolbert edited comment on CASSANDRA-9131 at 4/17/15 4:27 PM: -- {quote} if we should update the client protocol spec/docs to make clear that this problem exists, and that clients are expected to work around it if we should offer some sample code, and work with the Java Driver team to ensure this problem doesn't affect it {quote} This is something I'm actively looking at on the drivers side for each driver. As of now only the python-driver and java-driver have a mechanism to enable automatically set client timestamps. The other drivers that support protocol v3 have a mechanism on the client specifying the timestamp (they could also use 'USING TIMESTAMP' i suppose), so it will be up to user / time implementation. Most of the drivers will have an active means of setting client timestamp through API by June 30th 2015. * python-driver: [Session use_client_timestamp|http://datastax.github.io/python-driver/api/cassandra/cluster.html?highlight=timestamp#cassandra.cluster.Session.use_client_timestamp] not monotonic, uses time.time(). * java-driver: [TimestampGenerator|http://www.datastax.com/drivers/java/2.1/com/datastax/driver/core/TimestampGenerator.html]. All client implementations are monotonic, but client can provide their own. If > 999 entries for same millisecond, will reloop over the same millisecond, so its not completely monotonic. 
Possible problem if > 1000 entries during leap second ([code|https://github.com/datastax/java-driver/blob/2.1/driver-core/src/main/java/com/datastax/driver/core/AbstractTimestampGenerator.java]) was (Author: andrew.tolbert): {quote} if we should update the client protocol spec/docs to make clear that this problem exists, and that clients are expected to work around it if we should offer some sample code, and work with the Java Driver team to ensure this problem doesn't affect it {quote} This is something I'm actively looking at on the drivers side for each driver. As of now only the python-driver and java-driver have a mechanism to enable automatically set client timestamps. The other drivers that support 2.1 have a mechanism of the client specifying the timestamp (they could also use 'USING TIMESTAMP' i suppose), so it will be up to user / time implementation. Most of the drivers will have an active means of setting client timestamp through API by June 30th 2015. * python-driver: [Session use_client_timestamp|http://datastax.github.io/python-driver/api/cassandra/cluster.html?highlight=timestamp#cassandra.cluster.Session.use_client_timestamp] not monotonic, uses time.time(). * java-driver: [TimestampGenerator|http://www.datastax.com/drivers/java/2.1/com/datastax/driver/core/TimestampGenerator.html]. All client implementations are monotonic, but client can provide their own. If > 999 entries for same millisecond, will reloop over the same millisecond, so its not completely monotonic. 
Possible problem if > 1000 entries during leap second ([code|https://github.com/datastax/java-driver/blob/2.1/driver-core/src/main/java/com/datastax/driver/core/AbstractTimestampGenerator.java]) > Defining correct behavior during leap second insertion > -- > > Key: CASSANDRA-9131 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9131 > Project: Cassandra > Issue Type: Bug > Environment: Linux ip-172-31-0-5 3.2.0-57-virtual #87-Ubuntu SMP Tue > Nov 12 21:53:49 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux >Reporter: Jim Witschey >Assignee: Jim Witschey > > On Linux platforms, the insertion of a leap second breaks the monotonicity of > timestamps. This can make values appear to have been inserted into Cassandra > in a different order than they were. I want to know what behavior is expected > and desirable for inserts over this discontinuity. > From a timestamp perspective, an inserted leap second looks like a repeat of > the previous second: > {code} > $ while true ; do echo "`date +%s%N` `date -u`" ; sleep .5 ; done > 1435708798171327029 Tue Jun 30 23:59:58 UTC 2015 > 1435708798679392477 Tue Jun 30 23:59:58 UTC 2015 > 1435708799187550335 Tue Jun 30 23:59:59 UTC 2015 > 1435708799695670453 Tue Jun 30 23:59:59 UTC 2015 > 1435708799203902068 Tue Jun 30 23:59:59 UTC 2015 > 1435708799712168566 Tue Jun 30 23:59:59 UTC 2015 > 1435708800220473932 Wed Jul 1 00:00:00 UTC 2015 > 1435708800728908190 Wed Jul 1 00:00:00 UTC 2015 > 1435708801237611983 Wed Jul 1 00:00:01 UTC 2015 > 1435708801746251996 Wed Jul 1 00:00:01 UTC 2015 > {code} > Note that 23:59:59 repeats itself, and that the timestamps increase during > the first time through, then step back down to the beginning of
[6/6] cassandra git commit: Merge branch 'cassandra-2.1' into trunk
Merge branch 'cassandra-2.1' into trunk Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/ae3edb2a Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/ae3edb2a Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/ae3edb2a Branch: refs/heads/trunk Commit: ae3edb2abee8c847b4b76349ba5dece54450ebac Parents: 0f72f79 b4fae85 Author: Jonathan Ellis Authored: Fri Apr 17 11:24:27 2015 -0500 Committer: Jonathan Ellis Committed: Fri Apr 17 11:24:27 2015 -0500 -- src/java/org/apache/cassandra/utils/MurmurHash.java | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/ae3edb2a/src/java/org/apache/cassandra/utils/MurmurHash.java --
[jira] [Commented] (CASSANDRA-9131) Defining correct behavior during leap second insertion
[ https://issues.apache.org/jira/browse/CASSANDRA-9131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14500127#comment-14500127 ] Andy Tolbert commented on CASSANDRA-9131: - {quote} if we should update the client protocol spec/docs to make clear that this problem exists, and that clients are expected to work around it if we should offer some sample code, and work with the Java Driver team to ensure this problem doesn't affect it {quote} This is something I'm actively looking at on the drivers side for each driver. As of now, only the python-driver and java-driver have a mechanism to automatically set client timestamps. The other drivers that support 2.1 have a mechanism for the client to specify the timestamp (they could also use 'USING TIMESTAMP', I suppose), so it will be up to the user's time implementation. Most of the drivers will have an active means of setting client timestamps through their API by June 30th 2015. * python-driver: [Session use_client_timestamp|http://datastax.github.io/python-driver/api/cassandra/cluster.html?highlight=timestamp#cassandra.cluster.Session.use_client_timestamp] not monotonic, uses time.time(). * java-driver: [TimestampGenerator|http://www.datastax.com/drivers/java/2.1/com/datastax/driver/core/TimestampGenerator.html]. All client implementations are monotonic, but clients can provide their own. If there are > 999 entries for the same millisecond, it will reloop over the same millisecond, so it's not completely monotonic. 
Possible problem if > 1000 entries during leap second ([code|https://github.com/datastax/java-driver/blob/2.1/driver-core/src/main/java/com/datastax/driver/core/AbstractTimestampGenerator.java]) > Defining correct behavior during leap second insertion > -- > > Key: CASSANDRA-9131 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9131 > Project: Cassandra > Issue Type: Bug > Environment: Linux ip-172-31-0-5 3.2.0-57-virtual #87-Ubuntu SMP Tue > Nov 12 21:53:49 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux >Reporter: Jim Witschey >Assignee: Jim Witschey > > On Linux platforms, the insertion of a leap second breaks the monotonicity of > timestamps. This can make values appear to have been inserted into Cassandra > in a different order than they were. I want to know what behavior is expected > and desirable for inserts over this discontinuity. > From a timestamp perspective, an inserted leap second looks like a repeat of > the previous second: > {code} > $ while true ; do echo "`date +%s%N` `date -u`" ; sleep .5 ; done > 1435708798171327029 Tue Jun 30 23:59:58 UTC 2015 > 1435708798679392477 Tue Jun 30 23:59:58 UTC 2015 > 1435708799187550335 Tue Jun 30 23:59:59 UTC 2015 > 1435708799695670453 Tue Jun 30 23:59:59 UTC 2015 > 1435708799203902068 Tue Jun 30 23:59:59 UTC 2015 > 1435708799712168566 Tue Jun 30 23:59:59 UTC 2015 > 1435708800220473932 Wed Jul 1 00:00:00 UTC 2015 > 1435708800728908190 Wed Jul 1 00:00:00 UTC 2015 > 1435708801237611983 Wed Jul 1 00:00:01 UTC 2015 > 1435708801746251996 Wed Jul 1 00:00:01 UTC 2015 > {code} > Note that 23:59:59 repeats itself, and that the timestamps increase during > the first time through, then step back down to the beginning of the second > and increase again. > As a result, the timestamps on values inserted during these seconds will be > out of order. I set up a 4-node cluster running under Ubuntu 12.04.3 and > synced them to shortly before the leap second would be inserted. 
During the > insertion of the leap second, I ran a test with logic something like: > {code} > simple_insert = session.prepare( > 'INSERT INTO test (foo, bar) VALUES (?, ?);') > for i in itertools.count(): > # stop after midnight > now = datetime.utcnow() > last_midnight = now.replace(hour=0, minute=0, > second=0, microsecond=0) > seconds_since_midnight = (now - last_midnight).total_seconds() > if 5 <= seconds_since_midnight <= 15: > break > session.execute(simple_insert, [i, i]) > result = session.execute("SELECT bar, WRITETIME(bar) FROM test;") > {code} > EDIT: This behavior occurs with server-generated timestamps; in this > particular test, I set {{use_client_timestamp}} to {{False}}. > Under normal circumstances, the values and writetimes would increase > together, but when inserted over the leap second, they don't. These {{value, > writetime}} pairs are sorted by writetime: > {code} > (582, 1435708799285000) > (579, 1435708799339000) > (583, 1435708799593000) > (580, 1435708799643000) > (584, 1435708799897000) > (581, 1435708799958000) > {code} > The values were inserted in increasing order, but their writetimes are in a > different order because of the repeated second. During the first instance of > 23:59:59, the values 579, 580, and 581
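The driver-side mitigation discussed above (remember the last timestamp handed out and never go backwards, counting upward within the repeated interval) can be sketched in Python. This is an illustrative sketch, not the actual driver API; unlike the real java-driver generator it simply counts past the current microsecond rather than relooping within a millisecond.

```python
import threading
import time

class MonotonicTimestampGenerator:
    """Hand out strictly increasing microsecond timestamps, even if the
    wall clock repeats itself (e.g. across an inserted leap second)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._last = 0

    def next_timestamp(self):
        with self._lock:
            now = int(time.time() * 1_000_000)  # wall clock, microseconds
            # If the clock repeated or stepped backwards, keep counting
            # upward from the last value issued instead of going back.
            self._last = max(now, self._last + 1)
            return self._last
```

A generator like this trades accuracy for ordering: during the repeated second it issues timestamps slightly ahead of the wall clock, but writes can no longer appear reordered.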
[4/6] cassandra git commit: Merge branch 'cassandra-2.0' into cassandra-2.1
Merge branch 'cassandra-2.0' into cassandra-2.1 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/b4fae855 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/b4fae855 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/b4fae855 Branch: refs/heads/trunk Commit: b4fae85578b1bd31d162be9cb58b03c0be9f853f Parents: 5d88ff4 724384a Author: Jonathan Ellis Authored: Fri Apr 17 11:24:19 2015 -0500 Committer: Jonathan Ellis Committed: Fri Apr 17 11:24:19 2015 -0500 -- src/java/org/apache/cassandra/utils/MurmurHash.java | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/b4fae855/src/java/org/apache/cassandra/utils/MurmurHash.java --
[3/6] cassandra git commit: comment Murmur incompatibility
comment Murmur incompatibility Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/724384ab Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/724384ab Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/724384ab Branch: refs/heads/trunk Commit: 724384ab05e4a6bf3cacb1732641968d37e8c391 Parents: 9bbcbf5 Author: Jonathan Ellis Authored: Fri Apr 17 11:23:46 2015 -0500 Committer: Jonathan Ellis Committed: Fri Apr 17 11:24:07 2015 -0500 -- src/java/org/apache/cassandra/utils/MurmurHash.java | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/724384ab/src/java/org/apache/cassandra/utils/MurmurHash.java -- diff --git a/src/java/org/apache/cassandra/utils/MurmurHash.java b/src/java/org/apache/cassandra/utils/MurmurHash.java index 9dcde6d..c02fdcc 100644 --- a/src/java/org/apache/cassandra/utils/MurmurHash.java +++ b/src/java/org/apache/cassandra/utils/MurmurHash.java @@ -24,8 +24,10 @@ import java.nio.ByteBuffer; * lookup. See http://murmurhash.googlepages.com/ for more details. * * hash32() and hash64() are MurmurHash 2.0. - * hash3_x64_128() is MurmurHash 3.0. * + * hash3_x64_128() is *almost* MurmurHash 3.0. It was supposed to match, but we didn't catch a sign bug with + * the result that it doesn't. Unfortunately, we can't change it now without breaking Murmur3Partitioner. * + * * * The C version of MurmurHash 2.0 found at that site was ported to Java by * Andrzej Bialecki (ab at getopt org).
[2/6] cassandra git commit: comment Murmur incompatibility
comment Murmur incompatibility Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/724384ab Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/724384ab Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/724384ab Branch: refs/heads/cassandra-2.1 Commit: 724384ab05e4a6bf3cacb1732641968d37e8c391 Parents: 9bbcbf5 Author: Jonathan Ellis Authored: Fri Apr 17 11:23:46 2015 -0500 Committer: Jonathan Ellis Committed: Fri Apr 17 11:24:07 2015 -0500 -- src/java/org/apache/cassandra/utils/MurmurHash.java | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/724384ab/src/java/org/apache/cassandra/utils/MurmurHash.java -- diff --git a/src/java/org/apache/cassandra/utils/MurmurHash.java b/src/java/org/apache/cassandra/utils/MurmurHash.java index 9dcde6d..c02fdcc 100644 --- a/src/java/org/apache/cassandra/utils/MurmurHash.java +++ b/src/java/org/apache/cassandra/utils/MurmurHash.java @@ -24,8 +24,10 @@ import java.nio.ByteBuffer; * lookup. See http://murmurhash.googlepages.com/ for more details. * * hash32() and hash64() are MurmurHash 2.0. - * hash3_x64_128() is MurmurHash 3.0. * + * hash3_x64_128() is *almost* MurmurHash 3.0. It was supposed to match, but we didn't catch a sign bug with + * the result that it doesn't. Unfortunately, we can't change it now without breaking Murmur3Partitioner. * + * * * The C version of MurmurHash 2.0 found at that site was ported to Java by * Andrzej Bialecki (ab at getopt org).
[5/6] cassandra git commit: Merge branch 'cassandra-2.0' into cassandra-2.1
Merge branch 'cassandra-2.0' into cassandra-2.1 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/b4fae855 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/b4fae855 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/b4fae855 Branch: refs/heads/cassandra-2.1 Commit: b4fae85578b1bd31d162be9cb58b03c0be9f853f Parents: 5d88ff4 724384a Author: Jonathan Ellis Authored: Fri Apr 17 11:24:19 2015 -0500 Committer: Jonathan Ellis Committed: Fri Apr 17 11:24:19 2015 -0500 -- src/java/org/apache/cassandra/utils/MurmurHash.java | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/b4fae855/src/java/org/apache/cassandra/utils/MurmurHash.java --
[1/6] cassandra git commit: comment Murmur incompatibility
Repository: cassandra Updated Branches: refs/heads/cassandra-2.0 9bbcbf505 -> 724384ab0 refs/heads/cassandra-2.1 5d88ff4e4 -> b4fae8557 refs/heads/trunk 0f72f79d5 -> ae3edb2ab comment Murmur incompatibility Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/724384ab Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/724384ab Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/724384ab Branch: refs/heads/cassandra-2.0 Commit: 724384ab05e4a6bf3cacb1732641968d37e8c391 Parents: 9bbcbf5 Author: Jonathan Ellis Authored: Fri Apr 17 11:23:46 2015 -0500 Committer: Jonathan Ellis Committed: Fri Apr 17 11:24:07 2015 -0500 -- src/java/org/apache/cassandra/utils/MurmurHash.java | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/724384ab/src/java/org/apache/cassandra/utils/MurmurHash.java -- diff --git a/src/java/org/apache/cassandra/utils/MurmurHash.java b/src/java/org/apache/cassandra/utils/MurmurHash.java index 9dcde6d..c02fdcc 100644 --- a/src/java/org/apache/cassandra/utils/MurmurHash.java +++ b/src/java/org/apache/cassandra/utils/MurmurHash.java @@ -24,8 +24,10 @@ import java.nio.ByteBuffer; * lookup. See http://murmurhash.googlepages.com/ for more details. * * hash32() and hash64() are MurmurHash 2.0. - * hash3_x64_128() is MurmurHash 3.0. * + * hash3_x64_128() is *almost* MurmurHash 3.0. It was supposed to match, but we didn't catch a sign bug with + * the result that it doesn't. Unfortunately, we can't change it now without breaking Murmur3Partitioner. * + * * * The C version of MurmurHash 2.0 found at that site was ported to Java by * Andrzej Bialecki (ab at getopt org).
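The kind of sign bug mentioned in the commit comment — a Java port accidentally sign-extending bytes while packing them into a 64-bit block — can be demonstrated generically. This is not the actual Cassandra MurmurHash code, just a minimal illustration of why signed versus unsigned byte packing diverges once a byte's high bit is set:

```python
def to_signed_byte(b):
    # Java bytes are signed: values >= 0x80 read back as negatives.
    return b - 256 if b >= 0x80 else b

def block_unsigned(buf):
    # Correct packing: mask each byte to 8 unsigned bits first.
    v = 0
    for i, b in enumerate(buf):
        v |= (b & 0xFF) << (8 * i)
    return v

def block_signed(buf):
    # Buggy packing: a sign-extended byte ORs 1-bits into every higher
    # position, silently corrupting the assembled 64-bit block.
    v = 0
    for i, b in enumerate(buf):
        v |= to_signed_byte(b) << (8 * i)
    return v & 0xFFFFFFFFFFFFFFFF
```

The two variants agree whenever no byte has its high bit set, which is how such a bug can survive testing; and once the divergent hashes have been used to place data (as with Murmur3Partitioner tokens), the buggy behavior must be kept for compatibility, exactly as the comment above explains.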
[jira] [Commented] (CASSANDRA-8718) nodetool cleanup causes segfault
[ https://issues.apache.org/jira/browse/CASSANDRA-8718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14500122#comment-14500122 ] Benedict commented on CASSANDRA-8718: - This is most likely a resource cleanup issue, with accessing offheap memory that has been freed as you suggest (the error printed in the stdout is more helpful, since it's clearly in the middle of the offheap binarySearch). Which means most likely either a double decrement of refcounts, or not taking a reference somewhere. Since this is being thrown in Compaction, the latter is actually always true (ie we never take a separate reference), so I would suspect what is happening is cleanup releases references even if a compaction is operating on the sstables, or perhaps doesn't properly mark the sstable compacting first, or something along those lines. > nodetool cleanup causes segfault > > > Key: CASSANDRA-8718 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8718 > Project: Cassandra > Issue Type: Bug >Reporter: Maxim Ivanov >Assignee: Joshua McKenzie >Priority: Minor > Fix For: 2.0.15 > > Attachments: java_hs_err.log > > > When doing cleanup on C* 2.0.12 following error crashes the java process: > {code} > INFO 17:59:02,800 Cleaning up > SSTableReader(path='/data/sdd/cassandra_prod/vdna/analytics/vdna-analytics-jb-21670-Data.db') > # > # A fatal error has been detected by the Java Runtime Environment: > # > # SIGSEGV (0xb) at pc=0x7f750890268e, pid=28039, tid=140130222446336 > # > # JRE version: Java(TM) SE Runtime Environment (7.0_71-b14) (build > 1.7.0_71-b14) > # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.71-b01 mixed mode > linux-amd64 compressed oops) > # Problematic frame: > # J 2655 C2 > org.apache.cassandra.io.sstable.IndexSummary.binarySearch(Lorg/apache/cassandra/db/RowPosition;)I > (88 bytes) @ 0x7f750890268e [0x7f7508902580+0x10e] > # > # Failed to write core dump. Core dumps have been disabled. 
To enable core > dumping, try "ulimit -c unlimited" before starting Java again > # > # An error report file with more information is saved as: > # /var/lib/cassandra_prod/hs_err_pid28039.log > Compiled method (c2) 913167265 4849 > org.apache.cassandra.dht.Token::maxKeyBound (24 bytes) > total in heap [0x7f7508572450,0x7f7508573318] = 3784 > relocation [0x7f7508572570,0x7f7508572618] = 168 > main code [0x7f7508572620,0x7f7508572cc0] = 1696 > stub code [0x7f7508572cc0,0x7f7508572cf8] = 56 > oops [0x7f7508572cf8,0x7f7508572d90] = 152 > scopes data[0x7f7508572d90,0x7f7508573118] = 904 > scopes pcs [0x7f7508573118,0x7f7508573268] = 336 > dependencies [0x7f7508573268,0x7f7508573280] = 24 > handler table [0x7f7508573280,0x7f75085732e0] = 96 > nul chk table [0x7f75085732e0,0x7f7508573318] = 56 > # > # If you would like to submit a bug report, please visit: > # http://bugreport.sun.com/bugreport/crash.jsp > # > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
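The failure mode hypothesized above — a reader touching off-heap memory after its last reference has been released, whether by a double decrement or a missing acquire — can be sketched with a toy reference count. All names here are illustrative; real off-heap management would back this with native memory rather than a bytearray:

```python
class OffHeapRegion:
    """Toy refcounted resource: memory is 'freed' when refs hit zero."""

    def __init__(self, size):
        self._data = bytearray(size)
        self._refs = 1  # creator holds the initial reference

    def ref(self):
        # Taking a new reference on freed memory is itself a bug.
        assert self._data is not None, "resurrecting a freed region"
        self._refs += 1

    def unref(self):
        assert self._refs > 0, "double unref"
        self._refs -= 1
        if self._refs == 0:
            self._data = None  # free the backing memory

    def read(self, i):
        # Reading after free is the SIGSEGV scenario from the report;
        # here it fails loudly instead of crashing the process.
        if self._data is None:
            raise RuntimeError("use after free")
        return self._data[i]
```

In the scenario Benedict describes, cleanup would play the role of the extra `unref()`: if it releases references while a compaction is still reading the sstable, the compaction's next read lands on freed memory.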
[jira] [Commented] (CASSANDRA-8766) SSTableRewriter opens all sstables as early before completing the compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-8766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14500096#comment-14500096 ] Joshua McKenzie commented on CASSANDRA-8766: I believe we resolved this as part of CASSANDRA-8535: {code:title=switchWriter} // If early re-open is disabled, simply finalize the writer and store it if (preemptiveOpenInterval == Long.MAX_VALUE) { SSTableReader reader = writer.finish(SSTableWriter.FinishType.NORMAL, maxAge, -1); finishedReaders.add(reader); } {code} > SSTableRewriter opens all sstables as early before completing the compaction > > > Key: CASSANDRA-8766 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8766 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Benedict >Assignee: Joshua McKenzie >Priority: Minor > Fix For: 2.1.5 > > > In CASSANDRA-8320, we made the rewriter call switchWriter() inside of > finish(); in CASSANDRA-8124 was made switchWriter() open its data as EARLY. > This combination means we no longer honour disabling of early opening, which > is potentially a problem on windows for the deletion of the contents (which > is why we disable early opening on Windows). > I've commented on CASSANDRA-8124, as I suspect I'm missing something about > this. Although I have no doubt the old behaviour of opening as TMP file > reduced the window for problems, and opening as TMPLINK now does the same, > it's not entirely clear to me its the right fix (though it may be) since we > shouldn't be susceptible to this window anyway? Either way, we perhaps need > to come up with something else, because this could potentially break windows > support. Perhaps if we simply did not swap in the TMPLINK file so that it > never actually get mapped, it would perhaps be enough. [~JoshuaMcKenzie], > WDYT? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
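The quoted switchWriter snippet boils down to one decision: when early re-open is disabled (the interval left at Long.MAX_VALUE), finalize the writer normally instead of opening the in-progress sstable EARLY. A minimal sketch of that decision, with a stand-in writer (names are illustrative, not Cassandra's API):

```python
LONG_MAX = 2**63 - 1  # Long.MAX_VALUE: sentinel for "early open disabled"

def switch_writer(writer, preemptive_open_interval):
    # Mirrors the quoted switchWriter() logic: with early re-open
    # disabled the writer is finished normally; otherwise the partially
    # written sstable is opened EARLY so readers can migrate onto it.
    if preemptive_open_interval == LONG_MAX:
        return writer.finish()
    return writer.open_early()

class FakeWriter:
    """Stand-in for SSTableWriter, returning which path was taken."""
    def finish(self):
        return "NORMAL"
    def open_early(self):
        return "EARLY"
```

This is why the CASSANDRA-8535 change resolves the issue: platforms that must disable early opening (e.g. Windows, because of file deletion semantics) never reach the EARLY path.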
[jira] [Assigned] (CASSANDRA-8718) nodetool cleanup causes segfault
[ https://issues.apache.org/jira/browse/CASSANDRA-8718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joshua McKenzie reassigned CASSANDRA-8718: -- Assignee: Joshua McKenzie > nodetool cleanup causes segfault > > > Key: CASSANDRA-8718 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8718 > Project: Cassandra > Issue Type: Bug >Reporter: Maxim Ivanov >Assignee: Joshua McKenzie >Priority: Minor > Fix For: 2.0.15 > > Attachments: java_hs_err.log > > > When doing cleanup on C* 2.0.12 following error crashes the java process: > {code} > INFO 17:59:02,800 Cleaning up > SSTableReader(path='/data/sdd/cassandra_prod/vdna/analytics/vdna-analytics-jb-21670-Data.db') > # > # A fatal error has been detected by the Java Runtime Environment: > # > # SIGSEGV (0xb) at pc=0x7f750890268e, pid=28039, tid=140130222446336 > # > # JRE version: Java(TM) SE Runtime Environment (7.0_71-b14) (build > 1.7.0_71-b14) > # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.71-b01 mixed mode > linux-amd64 compressed oops) > # Problematic frame: > # J 2655 C2 > org.apache.cassandra.io.sstable.IndexSummary.binarySearch(Lorg/apache/cassandra/db/RowPosition;)I > (88 bytes) @ 0x7f750890268e [0x7f7508902580+0x10e] > # > # Failed to write core dump. Core dumps have been disabled. 
To enable core > dumping, try "ulimit -c unlimited" before starting Java again > # > # An error report file with more information is saved as: > # /var/lib/cassandra_prod/hs_err_pid28039.log > Compiled method (c2) 913167265 4849 > org.apache.cassandra.dht.Token::maxKeyBound (24 bytes) > total in heap [0x7f7508572450,0x7f7508573318] = 3784 > relocation [0x7f7508572570,0x7f7508572618] = 168 > main code [0x7f7508572620,0x7f7508572cc0] = 1696 > stub code [0x7f7508572cc0,0x7f7508572cf8] = 56 > oops [0x7f7508572cf8,0x7f7508572d90] = 152 > scopes data[0x7f7508572d90,0x7f7508573118] = 904 > scopes pcs [0x7f7508573118,0x7f7508573268] = 336 > dependencies [0x7f7508573268,0x7f7508573280] = 24 > handler table [0x7f7508573280,0x7f75085732e0] = 96 > nul chk table [0x7f75085732e0,0x7f7508573318] = 56 > # > # If you would like to submit a bug report, please visit: > # http://bugreport.sun.com/bugreport/crash.jsp > # > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8718) nodetool cleanup causes segfault
[ https://issues.apache.org/jira/browse/CASSANDRA-8718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14500089#comment-14500089 ] Joshua McKenzie commented on CASSANDRA-8718: [~philipthompson] Looks like the crash occurs while getting information for our index scan position, likely during access of off-heap memory since the rest of getIndexScanPosition is pretty innocuous. I believe a full memory dump would be necessary to get more visibility into what's gone wrong, though JDK crash dumps aren't my forte ([~benedict] - care to sanity check?) [~srspnda] [~rossmohax]: Were either of you able to get more information about this error? Updates to JDK / C* have any impact on this error presenting? > nodetool cleanup causes segfault > > > Key: CASSANDRA-8718 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8718 > Project: Cassandra > Issue Type: Bug >Reporter: Maxim Ivanov >Priority: Minor > Fix For: 2.0.15 > > Attachments: java_hs_err.log > > > When doing cleanup on C* 2.0.12 following error crashes the java process: > {code} > INFO 17:59:02,800 Cleaning up > SSTableReader(path='/data/sdd/cassandra_prod/vdna/analytics/vdna-analytics-jb-21670-Data.db') > # > # A fatal error has been detected by the Java Runtime Environment: > # > # SIGSEGV (0xb) at pc=0x7f750890268e, pid=28039, tid=140130222446336 > # > # JRE version: Java(TM) SE Runtime Environment (7.0_71-b14) (build > 1.7.0_71-b14) > # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.71-b01 mixed mode > linux-amd64 compressed oops) > # Problematic frame: > # J 2655 C2 > org.apache.cassandra.io.sstable.IndexSummary.binarySearch(Lorg/apache/cassandra/db/RowPosition;)I > (88 bytes) @ 0x7f750890268e [0x7f7508902580+0x10e] > # > # Failed to write core dump. Core dumps have been disabled. 
To enable core > dumping, try "ulimit -c unlimited" before starting Java again > # > # An error report file with more information is saved as: > # /var/lib/cassandra_prod/hs_err_pid28039.log > Compiled method (c2) 913167265 4849 > org.apache.cassandra.dht.Token::maxKeyBound (24 bytes) > total in heap [0x7f7508572450,0x7f7508573318] = 3784 > relocation [0x7f7508572570,0x7f7508572618] = 168 > main code [0x7f7508572620,0x7f7508572cc0] = 1696 > stub code [0x7f7508572cc0,0x7f7508572cf8] = 56 > oops [0x7f7508572cf8,0x7f7508572d90] = 152 > scopes data[0x7f7508572d90,0x7f7508573118] = 904 > scopes pcs [0x7f7508573118,0x7f7508573268] = 336 > dependencies [0x7f7508573268,0x7f7508573280] = 24 > handler table [0x7f7508573280,0x7f75085732e0] = 96 > nul chk table [0x7f75085732e0,0x7f7508573318] = 56 > # > # If you would like to submit a bug report, please visit: > # http://bugreport.sun.com/bugreport/crash.jsp > # > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9208) Setting rpc_interface in cassandra.yaml causes NPE during startup
[ https://issues.apache.org/jira/browse/CASSANDRA-9208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandeep More updated CASSANDRA-9208: Description: In the cassandra.yaml file when "rpc_interface" option is set it causes a NPE (stack trace at the end). Upon further investigation it turns out that there is a serious problem is in the way this logic is handled in the code DatabaseDescriptor.java (#374). Following is the code snippet else if (conf.rpc_interface != null) { listenAddress = getNetworkInterfaceAddress(conf.rpc_interface, "rpc_interface"); } else { rpcAddress = FBUtilities.getLocalAddress(); } If you notice, 1) The code above sets the "listenAddress" instead of "rpcAddress". 2) The function getNetworkInterfaceAddress() blindly assumes that this is called to set the "listenAddress" (see line 171). The "configName" variable passed to the function is royally ignored and only used for printing out exception (which again is misleading) I am also attaching a suggested patch (NOTE: the patch tries to address this issue, the function getNetworkInterfaceAddress() needs revision ). INFO 15:36:56 Windows environment detected. 
DiskAccessMode set to standard, indexAccessMode standard INFO 15:36:56 Global memtable on-heap threshold is enabled at 503MB INFO 15:36:56 Global memtable off-heap threshold is enabled at 503MB ERROR 15:37:50 Fatal error during configuration loading java.lang.NullPointerException: null at org.apache.cassandra.config.DatabaseDescriptor.applyConfig(DatabaseDescriptor.java:411) ~[apache-cassandra-2.1.4.jar:2.1.4] at org.apache.cassandra.config.DatabaseDescriptor.(DatabaseDescriptor.java:133) ~[apache-cassandra-2.1.4.jar:2.1.4] at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:164) [apache-cassandra-2.1.4.jar:2.1.4] at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:533) [apache-cassandra-2.1.4.jar:2.1.4] at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:622) [apache-cassandra-2.1.4.jar:2.1.4] null Fatal error during configuration loading; unable to start. See log for stacktrace. was: In the cassandra.yaml file when "rpc_interface" option is set it causes a NPE. Upon further investigation it turns out that there is a serious problem is in the way this logic is handled in the code DatabaseDescriptor.java (#374). Following is the code snippet else if (conf.rpc_interface != null) { listenAddress = getNetworkInterfaceAddress(conf.rpc_interface, "rpc_interface"); } else { rpcAddress = FBUtilities.getLocalAddress(); } If you notice, 1) The code above sets the "listenAddress" instead of "rpcAddress". 2) The function getNetworkInterfaceAddress() blindly assumes that this is called to set the "listenAddress" (see line 171). The "configName" variable passed to the function is royally ignored and only used for printing out exception (which again is misleading) I am also attaching a suggested patch (NOTE: the patch tries to address this issue, the function getNetworkInterfaceAddress() needs revision ). 
> Setting rpc_interface in cassandra.yaml causes NPE during startup > - > > Key: CASSANDRA-9208 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9208 > Project: Cassandra > Issue Type: Bug > Components: Config > Environment: Windows and RHEL >Reporter: Sandeep More > Labels: easyfix, patch > Attachments: SuggestedDataBaseDescriptor.diff > > > In the cassandra.yaml file when "rpc_interface" option is set it causes a NPE > (stack trace at the end). > Upon further investigation it turns out that there is a serious problem is in > the way this logic is handled in the code DatabaseDescriptor.java (#374). > Following is the code snippet > else if (conf.rpc_interface != null) > { > listenAddress = getNetworkInterfaceAddress(conf.rpc_interface, > "rpc_interface"); > } > else > { > rpcAddress = FBUtilities.getLocalAddress(); > } > If you notice, > 1) The code above sets the "listenAddress" instead of "rpcAddress". > 2) The function getNetworkInterfaceAddress() blindly assumes that this is > called to set the "listenAddress" (see line 171). The "configName" variable > passed to the function is royally ignored and only used for printing out > exception (which again is misleading) > I am also attaching a suggested patch (NOTE: the patch tries to address this > issue, the function getNetworkInterfaceAddress() needs revision ). > INFO 15:36:56 Windows environment detected. DiskAccessMode set to standard,
[jira] [Created] (CASSANDRA-9208) Setting rpc_interface in cassandra.yaml causes NPE during startup
Sandeep More created CASSANDRA-9208: --- Summary: Setting rpc_interface in cassandra.yaml causes NPE during startup Key: CASSANDRA-9208 URL: https://issues.apache.org/jira/browse/CASSANDRA-9208 Project: Cassandra Issue Type: Bug Components: Config Environment: Windows and RHEL Reporter: Sandeep More Attachments: SuggestedDataBaseDescriptor.diff In the cassandra.yaml file, when the "rpc_interface" option is set it causes an NPE. Upon further investigation it turns out that there is a serious problem in the way this logic is handled in DatabaseDescriptor.java (#374). Following is the code snippet: else if (conf.rpc_interface != null) { listenAddress = getNetworkInterfaceAddress(conf.rpc_interface, "rpc_interface"); } else { rpcAddress = FBUtilities.getLocalAddress(); } If you notice: 1) The code above sets "listenAddress" instead of "rpcAddress". 2) The function getNetworkInterfaceAddress() blindly assumes that it is called to set the "listenAddress" (see line 171). The "configName" variable passed to the function is ignored and only used when printing out the exception (which, again, is misleading). I am also attaching a suggested patch (NOTE: the patch addresses this issue, but the function getNetworkInterfaceAddress() needs revision). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
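The fix the reporter suggests amounts to routing the interface lookup to the RPC address rather than the listen address. A minimal sketch of the corrected branch, with all names paraphrased (this is not the actual Cassandra code; the lookup functions are injected here for illustration):

```python
def apply_rpc_address(conf, get_interface_address, get_local_address):
    """Corrected branch from the report: the rpc_interface lookup feeds
    the RPC address, not listenAddress as in the buggy snippet."""
    if conf.get("rpc_interface") is not None:
        # Pass the config key name so error messages point at the
        # setting that was actually being resolved.
        return get_interface_address(conf["rpc_interface"], "rpc_interface")
    return get_local_address()
```

With the buggy version, rpcAddress stays null on the rpc_interface path, which is consistent with the NPE later in applyConfig shown in the stack trace.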
[jira] [Assigned] (CASSANDRA-9204) AssertionError in CompactionExecutor thread
[ https://issues.apache.org/jira/browse/CASSANDRA-9204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benedict reassigned CASSANDRA-9204: --- Assignee: Benedict (was: Marcus Eriksson) > AssertionError in CompactionExecutor thread > --- > > Key: CASSANDRA-9204 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9204 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Philip Thompson >Assignee: Benedict > Fix For: 3.0 > > Attachments: node1.log > > > While running the dtest > {{upgrade_through_versions_test.py:TestRandomPartitionerUpgrade.upgrade_test}}, > the test is failing due to a large number of exceptions in the logs related > to compaction. Here is a snippet of one. The full log is attached. These > exceptions occurred after upgrading to trunk. The cluster had already > upgraded 1.2 -> 2.0 -> 2.1 successfully. > {code} > ERROR [CompactionExecutor:2] 2015-04-16 12:05:11,747 CassandraDaemon.java: Exception in thread Thread[CompactionExecutor:2,1,main] > java.lang.AssertionError: null > at org.apache.cassandra.io.sstable.format.SSTableReader.setReplacedBy(SSTableReader.java:905) ~[main/:na] > at org.apache.cassandra.io.sstable.SSTableRewriter.finishAndMaybeThrow(SSTableRewriter.java:461) ~[main/:na] > at org.apache.cassandra.io.sstable.SSTableRewriter.finish(SSTableRewriter.java:418) ~[main/:na] > at org.apache.cassandra.io.sstable.SSTableRewriter.finish(SSTableRewriter.java:398) ~[main/:na] > at org.apache.cassandra.db.compaction.writers.DefaultCompactionWriter.finish(DefaultCompactionWriter.java:77) ~[main/:na] > at org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:202) ~[main/:na] > at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[main/:na] > at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:73) ~[main/:na] > at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:58) ~[main/:na] > at org.apache.cassandra.db.compaction.CompactionManager$5.execute(CompactionManager.java:371) ~[main/:na] > at org.apache.cassandra.db.compaction.CompactionManager$2.call(CompactionManager.java:280) ~[main/:na] > at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_75] > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_75] > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_75] > at java.lang.Thread.run(Thread.java:745) [na:1.7.0_75]{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9204) AssertionError in CompactionExecutor thread
[ https://issues.apache.org/jira/browse/CASSANDRA-9204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14500072#comment-14500072 ] Benedict commented on CASSANDRA-9204: - I'll take the ticket, but defer it until we can get CASSANDRA-8948 and CASSANDRA-8568 in since, as you say, they change behaviour pretty substantially and very likely fix it. > AssertionError in CompactionExecutor thread > --- > > Key: CASSANDRA-9204 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9204 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Philip Thompson >Assignee: Benedict > Fix For: 3.0 > > Attachments: node1.log > > > While running the dtest > {{upgrade_through_versions_test.py:TestRandomPartitionerUpgrade.upgrade_test}}, > the test is failing due to a large number of exceptions in the logs related > to compaction. Here is a snippet of one. The full log is attached. These > exceptions occurred after upgrading to trunk. The cluster had already > upgraded 1.2 -> 2.0 -> 2.1 successfully. 
> {code} > ERROR [CompactionExecutor:2] 2015-04-16 12:05:11,747 CassandraDaemon.java: Exception in thread Thread[CompactionExecutor:2,1,main] > java.lang.AssertionError: null > at org.apache.cassandra.io.sstable.format.SSTableReader.setReplacedBy(SSTableReader.java:905) ~[main/:na] > at org.apache.cassandra.io.sstable.SSTableRewriter.finishAndMaybeThrow(SSTableRewriter.java:461) ~[main/:na] > at org.apache.cassandra.io.sstable.SSTableRewriter.finish(SSTableRewriter.java:418) ~[main/:na] > at org.apache.cassandra.io.sstable.SSTableRewriter.finish(SSTableRewriter.java:398) ~[main/:na] > at org.apache.cassandra.db.compaction.writers.DefaultCompactionWriter.finish(DefaultCompactionWriter.java:77) ~[main/:na] > at org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:202) ~[main/:na] > at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[main/:na] > at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:73) ~[main/:na] > at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:58) ~[main/:na] > at org.apache.cassandra.db.compaction.CompactionManager$5.execute(CompactionManager.java:371) ~[main/:na] > at org.apache.cassandra.db.compaction.CompactionManager$2.call(CompactionManager.java:280) ~[main/:na] > at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_75] > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_75] > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_75] > at java.lang.Thread.run(Thread.java:745) [na:1.7.0_75]{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-9204) AssertionError in CompactionExecutor thread
[ https://issues.apache.org/jira/browse/CASSANDRA-9204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14500072#comment-14500072 ] Benedict edited comment on CASSANDRA-9204 at 4/17/15 3:46 PM: -- I'll take the ticket, but defer it until we can get CASSANDRA-8984 and CASSANDRA-8568 in since, as you say, they change behaviour pretty substantially and very likely fix it. was (Author: benedict): I'll take the ticket, but defer it until we can get CASSANDRA-8948 and CASSANDRA-8568 in since, as you say, they change behaviour pretty substantially and very likely fix it. > AssertionError in CompactionExecutor thread > --- > > Key: CASSANDRA-9204 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9204 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Philip Thompson >Assignee: Benedict > Fix For: 3.0 > > Attachments: node1.log > > > While running the dtest > {{upgrade_through_versions_test.py:TestRandomPartitionerUpgrade.upgrade_test}}, > the test is failing due to a large number of exceptions in the logs related > to compaction. Here is a snippet of one. The full log is attached. These > exceptions occurred after upgrading to trunk. The cluster had already > upgraded 1.2 -> 2.0 -> 2.1 successfully. 
> {code} > ERROR [CompactionExecutor:2] 2015-04-16 12:05:11,747 CassandraDaemon.java: Exception in thread Thread[CompactionExecutor:2,1,main] > java.lang.AssertionError: null > at org.apache.cassandra.io.sstable.format.SSTableReader.setReplacedBy(SSTableReader.java:905) ~[main/:na] > at org.apache.cassandra.io.sstable.SSTableRewriter.finishAndMaybeThrow(SSTableRewriter.java:461) ~[main/:na] > at org.apache.cassandra.io.sstable.SSTableRewriter.finish(SSTableRewriter.java:418) ~[main/:na] > at org.apache.cassandra.io.sstable.SSTableRewriter.finish(SSTableRewriter.java:398) ~[main/:na] > at org.apache.cassandra.db.compaction.writers.DefaultCompactionWriter.finish(DefaultCompactionWriter.java:77) ~[main/:na] > at org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:202) ~[main/:na] > at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[main/:na] > at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:73) ~[main/:na] > at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:58) ~[main/:na] > at org.apache.cassandra.db.compaction.CompactionManager$5.execute(CompactionManager.java:371) ~[main/:na] > at org.apache.cassandra.db.compaction.CompactionManager$2.call(CompactionManager.java:280) ~[main/:na] > at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_75] > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_75] > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_75] > at java.lang.Thread.run(Thread.java:745) [na:1.7.0_75]{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9206) Remove seed gossip probability
[ https://issues.apache.org/jira/browse/CASSANDRA-9206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14500067#comment-14500067 ] Brandon Williams commented on CASSANDRA-9206: - For testing convergence time, I recommend starting a large-ish cluster except for one node, then starting that node with join_ring=false. Once everything is settled, call nodetool join on the node and then examine the deltas on the logs between when the node joined and all the nodes saw it. > Remove seed gossip probability > -- > > Key: CASSANDRA-9206 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9206 > Project: Cassandra > Issue Type: Improvement >Reporter: Brandon Williams >Assignee: Brandon Williams > Fix For: 2.1.5 > > Attachments: 9206.txt > > > Currently, we use probability to determine whether a node will gossip with a > seed: > {noformat} > double probability = seeds.size() / (double) > (liveEndpoints.size() + unreachableEndpoints.size()); > double randDbl = random.nextDouble(); > if (randDbl <= probability) > sendGossip(prod, seeds); > {noformat} > I propose that we remove this probability, and instead *always* gossip with a > seed. This of course means increased traffic and processing on the seed(s), > but even a 1000 node cluster with a single seed will only put ~1000 messages > per second on the seed, which is virtually nothing. Should it become a > problem, the solution is simple: add more seeds. Since seeds will also > always gossip with each other, this effectively gives us a poor man's > spanning tree, with the only cost being removing a few lines of code, and > should greatly improve our gossip convergence time, especially in large > clusters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9206) Remove seed gossip probability
[ https://issues.apache.org/jira/browse/CASSANDRA-9206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Williams updated CASSANDRA-9206: Tester: Philip Thompson Fix Version/s: 2.1.5 > Remove seed gossip probability > -- > > Key: CASSANDRA-9206 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9206 > Project: Cassandra > Issue Type: Improvement >Reporter: Brandon Williams >Assignee: Brandon Williams > Fix For: 2.1.5 > > Attachments: 9206.txt > > > Currently, we use probability to determine whether a node will gossip with a > seed: > {noformat} > double probability = seeds.size() / (double) > (liveEndpoints.size() + unreachableEndpoints.size()); > double randDbl = random.nextDouble(); > if (randDbl <= probability) > sendGossip(prod, seeds); > {noformat} > I propose that we remove this probability, and instead *always* gossip with a > seed. This of course means increased traffic and processing on the seed(s), > but even a 1000 node cluster with a single seed will only put ~1000 messages > per second on the seed, which is virtually nothing. Should it become a > problem, the solution is simple: add more seeds. Since seeds will also > always gossip with each other, this effectively gives us a poor man's > spanning tree, with the only cost being removing a few lines of code, and > should greatly improve our gossip convergence time, especially in large > clusters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
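A back-of-the-envelope check of the scale claim in CASSANDRA-9206 can be done in plain Python. This is an illustrative simulation, not Cassandra code; the round count and RNG seed are arbitrary choices. With one seed among 1000 live nodes, the quoted rule gives each node only about a 1-in-1000 chance per gossip round of contacting the seed, which is why convergence through seeds is slow:

```python
import random

def rounds_gossiping_with_seed(seeds, live, unreachable, rounds, rng):
    # Mirrors the quoted rule: in each gossip round, contact a seed with
    # probability seeds / (live + unreachable).
    probability = seeds / float(live + unreachable)
    return sum(1 for _ in range(rounds) if rng.random() <= probability)

# One seed, 1000 live nodes, 100,000 simulated rounds: expect roughly
# 100 seed contacts, i.e. only ~0.1% of rounds reach the seed.
hits = rounds_gossiping_with_seed(1, 1000, 0, 100_000, random.Random(42))
print(hits)
```

Removing the probability (always gossiping with a seed) turns that ~0.1% into 100%, i.e. about one message per node per gossip round arriving at the seed, which is the ~1000 messages per second figure cited in the ticket.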
[jira] [Commented] (CASSANDRA-9131) Defining correct behavior during leap second insertion
[ https://issues.apache.org/jira/browse/CASSANDRA-9131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14500058#comment-14500058 ] Robert Stupp commented on CASSANDRA-9131: - bq. problem with server-side timestamps is unsafe retry by clients in the event of a failure. got it > Defining correct behavior during leap second insertion > -- > > Key: CASSANDRA-9131 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9131 > Project: Cassandra > Issue Type: Bug > Environment: Linux ip-172-31-0-5 3.2.0-57-virtual #87-Ubuntu SMP Tue > Nov 12 21:53:49 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux >Reporter: Jim Witschey >Assignee: Jim Witschey > > On Linux platforms, the insertion of a leap second breaks the monotonicity of > timestamps. This can make values appear to have been inserted into Cassandra > in a different order than they were. I want to know what behavior is expected > and desirable for inserts over this discontinuity. > From a timestamp perspective, an inserted leap second looks like a repeat of > the previous second: > {code} > $ while true ; do echo "`date +%s%N` `date -u`" ; sleep .5 ; done > 1435708798171327029 Tue Jun 30 23:59:58 UTC 2015 > 1435708798679392477 Tue Jun 30 23:59:58 UTC 2015 > 1435708799187550335 Tue Jun 30 23:59:59 UTC 2015 > 1435708799695670453 Tue Jun 30 23:59:59 UTC 2015 > 1435708799203902068 Tue Jun 30 23:59:59 UTC 2015 > 1435708799712168566 Tue Jun 30 23:59:59 UTC 2015 > 1435708800220473932 Wed Jul 1 00:00:00 UTC 2015 > 1435708800728908190 Wed Jul 1 00:00:00 UTC 2015 > 1435708801237611983 Wed Jul 1 00:00:01 UTC 2015 > 1435708801746251996 Wed Jul 1 00:00:01 UTC 2015 > {code} > Note that 23:59:59 repeats itself, and that the timestamps increase during > the first time through, then step back down to the beginning of the second > and increase again. > As a result, the timestamps on values inserted during these seconds will be > out of order. 
I set up a 4-node cluster running under Ubuntu 12.04.3 and > synced them to shortly before the leap second would be inserted. During the > insertion of the leap second, I ran a test with logic something like: > {code} > simple_insert = session.prepare( > 'INSERT INTO test (foo, bar) VALUES (?, ?);') > for i in itertools.count(): > # stop after midnight > now = datetime.utcnow() > last_midnight = now.replace(hour=0, minute=0, > second=0, microsecond=0) > seconds_since_midnight = (now - last_midnight).total_seconds() > if 5 <= seconds_since_midnight <= 15: > break > session.execute(simple_insert, [i, i]) > result = session.execute("SELECT bar, WRITETIME(bar) FROM test;") > {code} > EDIT: This behavior occurs with server-generated timestamps; in this > particular test, I set {{use_client_timestamp}} to {{False}}. > Under normal circumstances, the values and writetimes would increase > together, but when inserted over the leap second, they don't. These {{value, > writetime}} pairs are sorted by writetime: > {code} > (582, 1435708799285000) > (579, 1435708799339000) > (583, 1435708799593000) > (580, 1435708799643000) > (584, 1435708799897000) > (581, 1435708799958000) > {code} > The values were inserted in increasing order, but their writetimes are in a > different order because of the repeated second. During the first instance of > 23:59:59, the values 579, 580, and 581 were inserted at the beginning, > middle, and end of the second. During the leap second, which is also > 23:59:59, 582, 583, and 584 were inserted, also at the beginning, middle, and > end of the second. However, since the two seconds are the same second, they > appear interleaved with respect to timestamps, as shown above. > So, should I consider this behavior correct? If not, how should Cassandra > correctly handle the discontinuity introduced by the insertion of a leap > second? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
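The interleaving reported above can be reproduced without a cluster. In this small Python sketch the sub-second offsets are invented to match the shape of the report, not the measured values: because the leap second replays 23:59:59, server-side timestamps from the two passes through that second sort into each other.

```python
# (value, offset-within-23:59:59) pairs; offsets are illustrative.
first_pass = [(579, 0.30), (580, 0.60), (581, 0.95)]  # first 23:59:59
leap_pass  = [(582, 0.28), (583, 0.59), (584, 0.89)]  # repeated 23:59:59
# Server-side timestamps cannot distinguish the two passes through the
# repeated second, so sorting by writetime interleaves the two batches:
by_writetime = [v for v, t in sorted(first_pass + leap_pass, key=lambda vw: vw[1])]
print(by_writetime)  # -> [582, 579, 583, 580, 584, 581], though insertion order was 579..584
```

This matches the {{value, writetime}} ordering shown in the ticket: the leap-second batch appears interleaved with, and partly ahead of, the batch written a full second earlier.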
[jira] [Commented] (CASSANDRA-7409) Allow multiple overlapping sstables in L1
[ https://issues.apache.org/jira/browse/CASSANDRA-7409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14500053#comment-14500053 ] Alan Boudreault commented on CASSANDRA-7409: All scenarios with basic patterns have been run. Same URL as above. > Allow multiple overlapping sstables in L1 > - > > Key: CASSANDRA-7409 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7409 > Project: Cassandra > Issue Type: Improvement >Reporter: Carl Yeksigian >Assignee: Carl Yeksigian > Labels: compaction > Fix For: 3.0 > > > Currently, when a normal L0 compaction takes place (not STCS), we take up to > MAX_COMPACTING_L0 L0 sstables and all of the overlapping L1 sstables and > compact them together. If we didn't have to deal with the overlapping L1 > tables, we could compact a higher number of L0 sstables together into a set > of non-overlapping L1 sstables. > This could be done by delaying the invariant that L1 has no overlapping > sstables. Going from L1 to L2, we would be compacting fewer sstables together > which overlap. > When reading, we will not have the same one sstable per level (except L0) > guarantee, but this can be bounded (once we have too many sets of sstables, > either compact them back into the same level, or compact them up to the next > level). > This could be generalized to allow any level to be the maximum for this > overlapping strategy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9131) Defining correct behavior during leap second insertion
[ https://issues.apache.org/jira/browse/CASSANDRA-9131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14500011#comment-14500011 ] Benedict commented on CASSANDRA-9131: - The problem with server-side timestamps is unsafe retry by clients in the event of a failure. CASSANDRA-6106 is a reference point, in that this provided both a wrapper around microsecond resolution as well as a staggered application of shifts in system clock time (also ensuring it never went backwards). > Defining correct behavior during leap second insertion > -- > > Key: CASSANDRA-9131 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9131 > Project: Cassandra > Issue Type: Bug > Environment: Linux ip-172-31-0-5 3.2.0-57-virtual #87-Ubuntu SMP Tue > Nov 12 21:53:49 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux >Reporter: Jim Witschey >Assignee: Jim Witschey > > On Linux platforms, the insertion of a leap second breaks the monotonicity of > timestamps. This can make values appear to have been inserted into Cassandra > in a different order than they were. I want to know what behavior is expected > and desirable for inserts over this discontinuity. 
> From a timestamp perspective, an inserted leap second looks like a repeat of > the previous second: > {code} > $ while true ; do echo "`date +%s%N` `date -u`" ; sleep .5 ; done > 1435708798171327029 Tue Jun 30 23:59:58 UTC 2015 > 1435708798679392477 Tue Jun 30 23:59:58 UTC 2015 > 1435708799187550335 Tue Jun 30 23:59:59 UTC 2015 > 1435708799695670453 Tue Jun 30 23:59:59 UTC 2015 > 1435708799203902068 Tue Jun 30 23:59:59 UTC 2015 > 1435708799712168566 Tue Jun 30 23:59:59 UTC 2015 > 1435708800220473932 Wed Jul 1 00:00:00 UTC 2015 > 1435708800728908190 Wed Jul 1 00:00:00 UTC 2015 > 1435708801237611983 Wed Jul 1 00:00:01 UTC 2015 > 1435708801746251996 Wed Jul 1 00:00:01 UTC 2015 > {code} > Note that 23:59:59 repeats itself, and that the timestamps increase during > the first time through, then step back down to the beginning of the second > and increase again. > As a result, the timestamps on values inserted during these seconds will be > out of order. I set up a 4-node cluster running under Ubuntu 12.04.3 and > synced them to shortly before the leap second would be inserted. During the > insertion of the leap second, I ran a test with logic something like: > {code} > simple_insert = session.prepare( > 'INSERT INTO test (foo, bar) VALUES (?, ?);') > for i in itertools.count(): > # stop after midnight > now = datetime.utcnow() > last_midnight = now.replace(hour=0, minute=0, > second=0, microsecond=0) > seconds_since_midnight = (now - last_midnight).total_seconds() > if 5 <= seconds_since_midnight <= 15: > break > session.execute(simple_insert, [i, i]) > result = session.execute("SELECT bar, WRITETIME(bar) FROM test;") > {code} > EDIT: This behavior occurs with server-generated timestamps; in this > particular test, I set {{use_client_timestamp}} to {{False}}. > Under normal circumstances, the values and writetimes would increase > together, but when inserted over the leap second, they don't. 
These {{value, > writetime}} pairs are sorted by writetime: > {code} > (582, 1435708799285000) > (579, 1435708799339000) > (583, 1435708799593000) > (580, 1435708799643000) > (584, 1435708799897000) > (581, 1435708799958000) > {code} > The values were inserted in increasing order, but their writetimes are in a > different order because of the repeated second. During the first instance of > 23:59:59, the values 579, 580, and 581 were inserted at the beginning, > middle, and end of the second. During the leap second, which is also > 23:59:59, 582, 583, and 584 were inserted, also at the beginning, middle, and > end of the second. However, since the two seconds are the same second, they > appear interleaved with respect to timestamps, as shown above. > So, should I consider this behavior correct? If not, how should Cassandra > correctly handle the discontinuity introduced by the insertion of a leap > second? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8967) Allow RolesCache to be invalidated
[ https://issues.apache.org/jira/browse/CASSANDRA-8967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Williams updated CASSANDRA-8967: Attachment: 8967.txt Patch to add JMX methods for this to RolesCache. > Allow RolesCache to be invalidated > -- > > Key: CASSANDRA-8967 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8967 > Project: Cassandra > Issue Type: New Feature >Reporter: Brandon Williams >Assignee: Brandon Williams > Fix For: 3.0 > > Attachments: 8967.txt > > > Much like CASSANDRA-8722, we should add this to RolesCache as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9131) Defining correct behavior during leap second insertion
[ https://issues.apache.org/jira/browse/CASSANDRA-9131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1453#comment-1453 ] Robert Stupp commented on CASSANDRA-9131: - Hm. I'm not convinced by client-side timestamps. IMO maintaining timestamps on the clients (whether these are "fat clients" or other servers) just moves the problem to an environment that might not be careful about a correct system wall clock (e.g. via NTP). I've seen operations teams handle Windows and Linux environments completely separately - with both worlds showing a constant time drift of several minutes (not funny). I'm not completely against client-provided timestamps - but I would prefer to make that an optional feature (i.e. move {{TIMESTAMP xx}} to the protocol). TL;DR, just want to throw in an idea: we could encapsulate {{System.currentTimeMillis()}} - if we detect that the clock went backwards, we slow down "our system clock", and vice versa if the clock moves forward. > Defining correct behavior during leap second insertion > -- > > Key: CASSANDRA-9131 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9131 > Project: Cassandra > Issue Type: Bug > Environment: Linux ip-172-31-0-5 3.2.0-57-virtual #87-Ubuntu SMP Tue > Nov 12 21:53:49 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux >Reporter: Jim Witschey >Assignee: Jim Witschey > > On Linux platforms, the insertion of a leap second breaks the monotonicity of > timestamps. This can make values appear to have been inserted into Cassandra > in a different order than they were. I want to know what behavior is expected > and desirable for inserts over this discontinuity. 
> From a timestamp perspective, an inserted leap second looks like a repeat of > the previous second: > {code} > $ while true ; do echo "`date +%s%N` `date -u`" ; sleep .5 ; done > 1435708798171327029 Tue Jun 30 23:59:58 UTC 2015 > 1435708798679392477 Tue Jun 30 23:59:58 UTC 2015 > 1435708799187550335 Tue Jun 30 23:59:59 UTC 2015 > 1435708799695670453 Tue Jun 30 23:59:59 UTC 2015 > 1435708799203902068 Tue Jun 30 23:59:59 UTC 2015 > 1435708799712168566 Tue Jun 30 23:59:59 UTC 2015 > 1435708800220473932 Wed Jul 1 00:00:00 UTC 2015 > 1435708800728908190 Wed Jul 1 00:00:00 UTC 2015 > 1435708801237611983 Wed Jul 1 00:00:01 UTC 2015 > 1435708801746251996 Wed Jul 1 00:00:01 UTC 2015 > {code} > Note that 23:59:59 repeats itself, and that the timestamps increase during > the first time through, then step back down to the beginning of the second > and increase again. > As a result, the timestamps on values inserted during these seconds will be > out of order. I set up a 4-node cluster running under Ubuntu 12.04.3 and > synced them to shortly before the leap second would be inserted. During the > insertion of the leap second, I ran a test with logic something like: > {code} > simple_insert = session.prepare( > 'INSERT INTO test (foo, bar) VALUES (?, ?);') > for i in itertools.count(): > # stop after midnight > now = datetime.utcnow() > last_midnight = now.replace(hour=0, minute=0, > second=0, microsecond=0) > seconds_since_midnight = (now - last_midnight).total_seconds() > if 5 <= seconds_since_midnight <= 15: > break > session.execute(simple_insert, [i, i]) > result = session.execute("SELECT bar, WRITETIME(bar) FROM test;") > {code} > EDIT: This behavior occurs with server-generated timestamps; in this > particular test, I set {{use_client_timestamp}} to {{False}}. > Under normal circumstances, the values and writetimes would increase > together, but when inserted over the leap second, they don't. 
These {{value, > writetime}} pairs are sorted by writetime: > {code} > (582, 1435708799285000) > (579, 1435708799339000) > (583, 1435708799593000) > (580, 1435708799643000) > (584, 1435708799897000) > (581, 1435708799958000) > {code} > The values were inserted in increasing order, but their writetimes are in a > different order because of the repeated second. During the first instance of > 23:59:59, the values 579, 580, and 581 were inserted at the beginning, > middle, and end of the second. During the leap second, which is also > 23:59:59, 582, 583, and 584 were inserted, also at the beginning, middle, and > end of the second. However, since the two seconds are the same second, they > appear interleaved with respect to timestamps, as shown above. > So, should I consider this behavior correct? If not, how should Cassandra > correctly handle the discontinuity introduced by the insertion of a leap > second? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
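The "clock that never goes backwards" idea floated in the comments can be sketched with a wrapper that clamps the wall clock. This is a hypothetical illustration - the MonotonicMicros name and API are invented here, it is not Cassandra's implementation, and it only guarantees monotonicity within one process, not across replicas:

```python
import time

class MonotonicMicros:
    """Issue strictly increasing microsecond timestamps even if the wall
    clock repeats a second (leap second) or steps backwards (NTP)."""

    def __init__(self, clock=time.time):
        self._clock = clock  # injectable for testing
        self._last = 0

    def next_timestamp(self):
        now = int(self._clock() * 1_000_000)
        if now <= self._last:
            # Clock repeated or went backwards: advance by one microsecond
            # instead of reusing or regressing a timestamp.
            now = self._last + 1
        self._last = now
        return now

# Simulate a leap second: the wall clock replays part of 23:59:59.
fake_ticks = iter([59.2, 59.6, 59.9, 59.1, 59.5, 60.4])
src = MonotonicMicros(clock=lambda: next(fake_ticks))
stamps = [src.next_timestamp() for _ in range(6)]
print(stamps)
```

Bumping by one microsecond on regression keeps timestamps unique and ordered on a single server, but as the discussion notes, it cannot repair ordering for a sequence of inserts spread across multiple servers whose clocks each replay the leap second independently.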
[jira] [Commented] (CASSANDRA-8766) SSTableRewriter opens all sstables as early before completing the compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-8766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1454#comment-1454 ] Benedict commented on CASSANDRA-8766: - I think this might be superseded by CASSANDRA-7066. So we should perhaps defer looking at this until that's in. > SSTableRewriter opens all sstables as early before completing the compaction > > > Key: CASSANDRA-8766 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8766 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Benedict >Assignee: Joshua McKenzie >Priority: Minor > Fix For: 2.1.5 > > > In CASSANDRA-8320, we made the rewriter call switchWriter() inside of > finish(); CASSANDRA-8124 made switchWriter() open its data as EARLY. > This combination means we no longer honour disabling of early opening, which > is potentially a problem on windows for the deletion of the contents (which > is why we disable early opening on Windows). > I've commented on CASSANDRA-8124, as I suspect I'm missing something about > this. Although I have no doubt the old behaviour of opening as TMP file > reduced the window for problems, and opening as TMPLINK now does the same, > it's not entirely clear to me it's the right fix (though it may be) since we > shouldn't be susceptible to this window anyway? Either way, we perhaps need > to come up with something else, because this could potentially break windows > support. Perhaps if we simply did not swap in the TMPLINK file so that it > never actually get mapped, it would perhaps be enough. [~JoshuaMcKenzie], > WDYT? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8766) SSTableRewriter opens all sstables as early before completing the compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-8766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Thompson updated CASSANDRA-8766: --- Assignee: Joshua McKenzie > SSTableRewriter opens all sstables as early before completing the compaction > > > Key: CASSANDRA-8766 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8766 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Benedict >Assignee: Joshua McKenzie >Priority: Minor > Fix For: 2.1.5 > > > In CASSANDRA-8320, we made the rewriter call switchWriter() inside of > finish(); in CASSANDRA-8124 was made switchWriter() open its data as EARLY. > This combination means we no longer honour disabling of early opening, which > is potentially a problem on windows for the deletion of the contents (which > is why we disable early opening on Windows). > I've commented on CASSANDRA-8124, as I suspect I'm missing something about > this. Although I have no doubt the old behaviour of opening as TMP file > reduced the window for problems, and opening as TMPLINK now does the same, > it's not entirely clear to me its the right fix (though it may be) since we > shouldn't be susceptible to this window anyway? Either way, we perhaps need > to come up with something else, because this could potentially break windows > support. Perhaps if we simply did not swap in the TMPLINK file so that it > never actually get mapped, it would perhaps be enough. [~JoshuaMcKenzie], > WDYT? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9204) AssertionError in CompactionExecutor thread
[ https://issues.apache.org/jira/browse/CASSANDRA-9204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14499968#comment-14499968 ] Marcus Eriksson commented on CASSANDRA-9204: [~benedict] could you have a look? Guessing this will be fixed by CASSANDRA-8568 > AssertionError in CompactionExecutor thread > --- > > Key: CASSANDRA-9204 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9204 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Philip Thompson >Assignee: Marcus Eriksson > Fix For: 3.0 > > Attachments: node1.log > > > While running the dtest > {{upgrade_through_versions_test.py:TestRandomPartitionerUpgrade.upgrade_test}}, > the test is failing due to a large number of exceptions in the logs related > to compaction. Here is a snippet of one. The full log is attached. These > exceptions occurred after upgrading to trunk. The cluster had already > upgraded 1.2 -> 2.0 -> 2.1 successfully. > {code} > ERROR [CompactionExecutor:2] 2015-04-16 12:05:11,747 CassandraDaemon.java: Exception in thread Thread[CompactionExecutor:2,1,main] > java.lang.AssertionError: null > at org.apache.cassandra.io.sstable.format.SSTableReader.setReplacedBy(SSTableReader.java:905) ~[main/:na] > at org.apache.cassandra.io.sstable.SSTableRewriter.finishAndMaybeThrow(SSTableRewriter.java:461) ~[main/:na] > at org.apache.cassandra.io.sstable.SSTableRewriter.finish(SSTableRewriter.java:418) ~[main/:na] > at org.apache.cassandra.io.sstable.SSTableRewriter.finish(SSTableRewriter.java:398) ~[main/:na] > at org.apache.cassandra.db.compaction.writers.DefaultCompactionWriter.finish(DefaultCompactionWriter.java:77) ~[main/:na] > at org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:202) ~[main/:na] > at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[main/:na] > at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:73) ~[main/:na] > at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:58) ~[main/:na] > at org.apache.cassandra.db.compaction.CompactionManager$5.execute(CompactionManager.java:371) ~[main/:na] > at org.apache.cassandra.db.compaction.CompactionManager$2.call(CompactionManager.java:280) ~[main/:na] > at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_75] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_75] > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_75] > at java.lang.Thread.run(Thread.java:745) [na:1.7.0_75]{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9131) Defining correct behavior during leap second insertion
[ https://issues.apache.org/jira/browse/CASSANDRA-9131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14499963#comment-14499963 ] Benedict commented on CASSANDRA-9131: - I can't recall entirely, but I don't think that was quite the way the conversation resolved. It's not exactly "expected" behaviour on either side, but obviously there's nothing we can do about clients that are subjected to this bug, and we intend to deprecate server-side timestamps (if perhaps never eliminate them entirely). So the question is # if we should fix server side timestamps by making the clock universally monotonically increasing (as opposed to only per-client connection) which would at least somewhat mitigate this problem (but not eliminate it entirely, for any sequence of inserts hitting multiple servers) # if we should update the client protocol spec/docs to make clear that this problem exists, and that clients are expected to work around it ## if we should offer some sample code, and work with the Java Driver team to ensure this problem doesn't affect it > Defining correct behavior during leap second insertion > -- > > Key: CASSANDRA-9131 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9131 > Project: Cassandra > Issue Type: Bug > Environment: Linux ip-172-31-0-5 3.2.0-57-virtual #87-Ubuntu SMP Tue > Nov 12 21:53:49 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux >Reporter: Jim Witschey >Assignee: Jim Witschey > > On Linux platforms, the insertion of a leap second breaks the monotonicity of > timestamps. This can make values appear to have been inserted into Cassandra > in a different order than they were. I want to know what behavior is expected > and desirable for inserts over this discontinuity. 
> From a timestamp perspective, an inserted leap second looks like a repeat of > the previous second: > {code} > $ while true ; do echo "`date +%s%N` `date -u`" ; sleep .5 ; done > 1435708798171327029 Tue Jun 30 23:59:58 UTC 2015 > 1435708798679392477 Tue Jun 30 23:59:58 UTC 2015 > 1435708799187550335 Tue Jun 30 23:59:59 UTC 2015 > 1435708799695670453 Tue Jun 30 23:59:59 UTC 2015 > 1435708799203902068 Tue Jun 30 23:59:59 UTC 2015 > 1435708799712168566 Tue Jun 30 23:59:59 UTC 2015 > 1435708800220473932 Wed Jul 1 00:00:00 UTC 2015 > 1435708800728908190 Wed Jul 1 00:00:00 UTC 2015 > 1435708801237611983 Wed Jul 1 00:00:01 UTC 2015 > 1435708801746251996 Wed Jul 1 00:00:01 UTC 2015 > {code} > Note that 23:59:59 repeats itself, and that the timestamps increase during > the first time through, then step back down to the beginning of the second > and increase again. > As a result, the timestamps on values inserted during these seconds will be > out of order. I set up a 4-node cluster running under Ubuntu 12.04.3 and > synced them to shortly before the leap second would be inserted. During the > insertion of the leap second, I ran a test with logic something like: > {code} > simple_insert = session.prepare( > 'INSERT INTO test (foo, bar) VALUES (?, ?);') > for i in itertools.count(): > # stop after midnight > now = datetime.utcnow() > last_midnight = now.replace(hour=0, minute=0, > second=0, microsecond=0) > seconds_since_midnight = (now - last_midnight).total_seconds() > if 5 <= seconds_since_midnight <= 15: > break > session.execute(simple_insert, [i, i]) > result = session.execute("SELECT bar, WRITETIME(bar) FROM test;") > {code} > EDIT: This behavior occurs with server-generated timestamps; in this > particular test, I set {{use_client_timestamp}} to {{False}}. > Under normal circumstances, the values and writetimes would increase > together, but when inserted over the leap second, they don't. 
These {{value, > writetime}} pairs are sorted by writetime: > {code} > (582, 1435708799285000) > (579, 1435708799339000) > (583, 1435708799593000) > (580, 1435708799643000) > (584, 1435708799897000) > (581, 1435708799958000) > {code} > The values were inserted in increasing order, but their writetimes are in a > different order because of the repeated second. During the first instance of > 23:59:59, the values 579, 580, and 581 were inserted at the beginning, > middle, and end of the second. During the leap second, which is also > 23:59:59, 582, 583, and 584 were inserted, also at the beginning, middle, and > end of the second. However, since the two seconds are the same second, they > appear interleaved with respect to timestamps, as shown above. > So, should I consider this behavior correct? If not, how should Cassandra > correctly handle the discontinuity introduced by the insertion of a leap > second? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
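The server-side mitigation raised as option 1 in the comment above, a clock that never moves backwards process-wide, can be sketched roughly as follows. This is a minimal illustration with invented names, not Cassandra's actual implementation:

```java
import java.util.concurrent.atomic.AtomicLong;

// Minimal sketch (invented names, not Cassandra code) of a process-wide
// monotonic timestamp source. If the wall clock steps backwards -- e.g.
// across an inserted leap second -- the generator returns last + 1
// instead of the raw, lower reading, preserving insertion order.
final class MonotonicClock
{
    private static final AtomicLong lastMicros = new AtomicLong();

    static long nextTimestampMicros()
    {
        while (true)
        {
            long now = System.currentTimeMillis() * 1000;
            long last = lastMicros.get();
            long next = now > last ? now : last + 1;
            if (lastMicros.compareAndSet(last, next))
                return next;
        }
    }
}
```

The trade-off: during the repeated second, timestamps advance by one microsecond per call rather than tracking real time, which restores ordering at the cost of accuracy. This also only helps per server, not across a sequence of inserts hitting multiple servers.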
[jira] [Commented] (CASSANDRA-8723) Cassandra 2.1.2 Memory issue - java process memory usage continuously increases until process is killed by OOM killer
[ https://issues.apache.org/jira/browse/CASSANDRA-8723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14499960#comment-14499960 ] Ariel Weisberg commented on CASSANDRA-8723: --- [~jeffl] If this is something that you are reproducing regularly in a test environment it would help to get a heap dump some time before the process dies. Maybe run a script in the background that checks whether CassandraDaemon is running and dumps the heap every few minutes. We can look at what native allocations exist via the heap dump since we wrap them all with POJOs. > Cassandra 2.1.2 Memory issue - java process memory usage continuously > increases until process is killed by OOM killer > - > > Key: CASSANDRA-8723 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8723 > Project: Cassandra > Issue Type: Bug >Reporter: Jeff Liu > Fix For: 2.1.5 > > Attachments: cassandra.yaml > > > Issue: > We have an on-going issue with cassandra nodes running with continuously > increasing memory until killed by OOM. > {noformat} > Jan 29 10:15:41 cass-chisel19 kernel: [24533109.783481] Out of memory: Kill > process 13919 (java) score 911 or sacrifice child > Jan 29 10:15:41 cass-chisel19 kernel: [24533109.783557] Killed process 13919 > (java) total-vm:18366340kB, anon-rss:6461472kB, file-rss:6684kB > {noformat} > System Profile: > cassandra version 2.1.2 > system: aws c1.xlarge instance with 8 cores, 7.1G memory. 
> cassandra jvm: > -Xms1792M -Xmx1792M -Xmn400M -Xss256k > {noformat} > java -ea -javaagent:/usr/share/cassandra/lib/jamm-0.2.8.jar > -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms1792M -Xmx1792M > -Xmn400M -XX:+HeapDumpOnOutOfMemoryError -Xss256k -XX:StringTableSize=103 > -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled > -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 > -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly > -XX:+UseTLAB -XX:+CMSClassUnloadingEnabled -XX:+UseCondCardMark > -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC > -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime > -XX:+PrintPromotionFailure -Xloggc:/var/log/cassandra/gc-1421511249.log > -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=48M > -Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=7199 > -Dcom.sun.management.jmxremote.ssl=false > -Dcom.sun.management.jmxremote.authenticate=false > -javaagent:/usr/share/java/graphite-reporter-agent-1.0-SNAPSHOT.jar=graphiteServer=metrics-a.hq.nest.com;graphitePort=2003;graphitePollInt=60 > -Dlogback.configurationFile=logback.xml > -Dcassandra.logdir=/var/log/cassandra -Dcassandra.storagedir= > -Dcassandra-pidfile=/var/run/cassandra/cassandra.pid -cp > 
/etc/cassandra:/usr/share/cassandra/lib/airline-0.6.jar:/usr/share/cassandra/lib/antlr-runtime-3.5.2.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.2.jar:/usr/share/cassandra/lib/commons-lang3-3.1.jar:/usr/share/cassandra/lib/commons-math3-3.2.jar:/usr/share/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.4.jar:/usr/share/cassandra/lib/disruptor-3.0.1.jar:/usr/share/cassandra/lib/guava-16.0.jar:/usr/share/cassandra/lib/high-scale-lib-1.0.6.jar:/usr/share/cassandra/lib/jackson-core-asl-1.9.2.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.9.2.jar:/usr/share/cassandra/lib/jamm-0.2.8.jar:/usr/share/cassandra/lib/javax.inject.jar:/usr/share/cassandra/lib/jbcrypt-0.3m.jar:/usr/share/cassandra/lib/jline-1.0.jar:/usr/share/cassandra/lib/jna-4.0.0.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/libthrift-0.9.1.jar:/usr/share/cassandra/lib/logback-classic-1.1.2.jar:/usr/share/cassandra/lib/logback-core-1.1.2.jar:/usr/share/cassandra/lib/lz4-1.2.0.jar:/usr/share/cassandra/lib/metrics-core-2.2.0.jar:/usr/share/cassandra/lib/metrics-graphite-2.2.0.jar:/usr/share/cassandra/lib/mx4j-tools.jar:/usr/share/cassandra/lib/netty-all-4.0.23.Final.jar:/usr/share/cassandra/lib/reporter-config-2.1.0.jar:/usr/share/cassandra/lib/slf4j-api-1.7.2.jar:/usr/share/cassandra/lib/snakeyaml-1.11.jar:/usr/share/cassandra/lib/snappy-java-1.0.5.2.jar:/usr/share/cassandra/lib/stream-2.5.2.jar:/usr/share/cassandra/lib/stringtemplate-4.0.2.jar:/usr/share/cassandra/lib/super-csv-2.1.0.jar:/usr/share/cassandra/lib/thrift-server-0.3.7.jar:/usr/share/cassandra/apache-cassandra-2.1.2.jar:/usr/share/cassandra/apache-cassandra-thrift-2.1.2.jar:/usr/share/cassandra/apache-cassandra.jar:/usr/share/cassandra/cassandra-driver-core-2.0.5.jar:/usr/share/cassandra/netty-3.9.0.Final.jar:/usr/share/cassandra/stress.jar: > -XX:HeapDumpPath=/var/lib/cassandra/java_1421511
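The periodic heap-dump idea suggested above can also be driven through the HotSpot diagnostic MBean rather than an external tool. A hedged sketch (the class name and the idea of invoking it on a timer are illustrative, not Cassandra code; the API is HotSpot-only):

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import com.sun.management.HotSpotDiagnosticMXBean;

// Hedged sketch: trigger a heap dump of the running JVM via the HotSpot
// diagnostic MBean. A watcher would call this (or run the equivalent
// `jmap -dump:live,file=...` against the Cassandra pid) every few minutes
// while the daemon is still alive.
final class HeapDumper
{
    static void dumpHeap(String path) throws Exception
    {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        HotSpotDiagnosticMXBean bean = ManagementFactory.newPlatformMXBeanProxy(
                server, "com.sun.management:type=HotSpotDiagnostic", HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(path, true); // true = dump only live objects
    }
}
```

Note that recent JDKs require the output path to end in .hprof, and the target file must not already exist.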
cassandra git commit: Re-add cold_reads_to_omit param for backwards compatibility
Repository: cassandra Updated Branches: refs/heads/cassandra-2.1 df014036b -> 5d88ff4e4 Re-add cold_reads_to_omit param for backwards compatibility Patch by Tommy Stendahl; reviewed by marcuse for CASSANDRA-9203 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/5d88ff4e Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/5d88ff4e Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/5d88ff4e Branch: refs/heads/cassandra-2.1 Commit: 5d88ff4e41210f95d0e3e53ded779765b0136c2a Parents: df01403 Author: Tommy Stendahl Authored: Fri Apr 17 16:47:33 2015 +0200 Committer: Marcus Eriksson Committed: Fri Apr 17 16:47:33 2015 +0200 -- CHANGES.txt | 1 + .../db/compaction/SizeTieredCompactionStrategyOptions.java| 3 +++ 2 files changed, 4 insertions(+) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/5d88ff4e/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index 74ec921..80ab11c 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -1,4 +1,5 @@ 2.1.5 + * Re-add deprecated cold_reads_to_omit param for backwards compat (CASSANDRA-9203) * Make anticompaction visible in compactionstats (CASSANDRA-9098) * Improve nodetool getendpoints documentation about the partition key parameter (CASSANDRA-6458) http://git-wip-us.apache.org/repos/asf/cassandra/blob/5d88ff4e/src/java/org/apache/cassandra/db/compaction/SizeTieredCompactionStrategyOptions.java -- diff --git a/src/java/org/apache/cassandra/db/compaction/SizeTieredCompactionStrategyOptions.java b/src/java/org/apache/cassandra/db/compaction/SizeTieredCompactionStrategyOptions.java index 911bb9f..9a840e1 100644 --- a/src/java/org/apache/cassandra/db/compaction/SizeTieredCompactionStrategyOptions.java +++ b/src/java/org/apache/cassandra/db/compaction/SizeTieredCompactionStrategyOptions.java @@ -29,6 +29,8 @@ public final class SizeTieredCompactionStrategyOptions protected static final String MIN_SSTABLE_SIZE_KEY = "min_sstable_size"; protected 
static final String BUCKET_LOW_KEY = "bucket_low"; protected static final String BUCKET_HIGH_KEY = "bucket_high"; +@Deprecated +protected static final String COLD_READS_TO_OMIT_KEY = "cold_reads_to_omit"; protected long minSSTableSize; protected double bucketLow; @@ -91,6 +93,7 @@ public final class SizeTieredCompactionStrategyOptions uncheckedOptions.remove(MIN_SSTABLE_SIZE_KEY); uncheckedOptions.remove(BUCKET_LOW_KEY); uncheckedOptions.remove(BUCKET_HIGH_KEY); +uncheckedOptions.remove(COLD_READS_TO_OMIT_KEY); return uncheckedOptions; }
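For context on why re-adding the key matters: the validation pattern the patch touches strips every recognized option key and treats anything left over as unknown, so a deprecated-but-present cold_reads_to_omit must still be stripped for old schemas to validate. A rough illustration of that pattern (not the actual SizeTieredCompactionStrategyOptions code):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of the option-validation pattern: every recognized
// key is removed from a working copy, and whatever remains is reported as
// an unknown/invalid option. Stripping the deprecated cold_reads_to_omit
// key is what keeps pre-existing schemas that still set it from failing.
final class OptionsCheck
{
    static Map<String, String> unknownOptions(Map<String, String> options)
    {
        Map<String, String> unchecked = new HashMap<>(options);
        unchecked.remove("min_sstable_size");
        unchecked.remove("bucket_low");
        unchecked.remove("bucket_high");
        unchecked.remove("cold_reads_to_omit"); // deprecated: ignored but tolerated
        return unchecked; // anything left here would be rejected
    }
}
```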
[2/2] cassandra git commit: Merge branch 'cassandra-2.1' into trunk
Merge branch 'cassandra-2.1' into trunk Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/0f72f79d Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/0f72f79d Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/0f72f79d Branch: refs/heads/trunk Commit: 0f72f79d5f9ed54cb9b9e33d371f4f13eae21dca Parents: 4adf29d 5d88ff4 Author: Marcus Eriksson Authored: Fri Apr 17 16:49:07 2015 +0200 Committer: Marcus Eriksson Committed: Fri Apr 17 16:49:07 2015 +0200 -- --
[1/2] cassandra git commit: Re-add cold_reads_to_omit param for backwards compatibility
Repository: cassandra Updated Branches: refs/heads/trunk 4adf29d4e -> 0f72f79d5 Re-add cold_reads_to_omit param for backwards compatibility Patch by Tommy Stendahl; reviewed by marcuse for CASSANDRA-9203 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/5d88ff4e Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/5d88ff4e Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/5d88ff4e Branch: refs/heads/trunk Commit: 5d88ff4e41210f95d0e3e53ded779765b0136c2a Parents: df01403 Author: Tommy Stendahl Authored: Fri Apr 17 16:47:33 2015 +0200 Committer: Marcus Eriksson Committed: Fri Apr 17 16:47:33 2015 +0200 -- CHANGES.txt | 1 + .../db/compaction/SizeTieredCompactionStrategyOptions.java| 3 +++ 2 files changed, 4 insertions(+) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/5d88ff4e/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index 74ec921..80ab11c 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -1,4 +1,5 @@ 2.1.5 + * Re-add deprecated cold_reads_to_omit param for backwards compat (CASSANDRA-9203) * Make anticompaction visible in compactionstats (CASSANDRA-9098) * Improve nodetool getendpoints documentation about the partition key parameter (CASSANDRA-6458) http://git-wip-us.apache.org/repos/asf/cassandra/blob/5d88ff4e/src/java/org/apache/cassandra/db/compaction/SizeTieredCompactionStrategyOptions.java -- diff --git a/src/java/org/apache/cassandra/db/compaction/SizeTieredCompactionStrategyOptions.java b/src/java/org/apache/cassandra/db/compaction/SizeTieredCompactionStrategyOptions.java index 911bb9f..9a840e1 100644 --- a/src/java/org/apache/cassandra/db/compaction/SizeTieredCompactionStrategyOptions.java +++ b/src/java/org/apache/cassandra/db/compaction/SizeTieredCompactionStrategyOptions.java @@ -29,6 +29,8 @@ public final class SizeTieredCompactionStrategyOptions protected static final String MIN_SSTABLE_SIZE_KEY = "min_sstable_size"; protected static final 
String BUCKET_LOW_KEY = "bucket_low"; protected static final String BUCKET_HIGH_KEY = "bucket_high"; +@Deprecated +protected static final String COLD_READS_TO_OMIT_KEY = "cold_reads_to_omit"; protected long minSSTableSize; protected double bucketLow; @@ -91,6 +93,7 @@ public final class SizeTieredCompactionStrategyOptions uncheckedOptions.remove(MIN_SSTABLE_SIZE_KEY); uncheckedOptions.remove(BUCKET_LOW_KEY); uncheckedOptions.remove(BUCKET_HIGH_KEY); +uncheckedOptions.remove(COLD_READS_TO_OMIT_KEY); return uncheckedOptions; }
[jira] [Commented] (CASSANDRA-8718) nodetool cleanup causes segfault
[ https://issues.apache.org/jira/browse/CASSANDRA-8718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14499932#comment-14499932 ] Philip Thompson commented on CASSANDRA-8718: We've tested cleanup on both of those JDKs, and have not encountered a similar issue. [~JoshuaMcKenzie], does the attached log give you any info that may help us reproduce the issue? > nodetool cleanup causes segfault > > > Key: CASSANDRA-8718 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8718 > Project: Cassandra > Issue Type: Bug >Reporter: Maxim Ivanov >Priority: Minor > Fix For: 2.0.15 > > Attachments: java_hs_err.log > > > When doing cleanup on C* 2.0.12 following error crashes the java process: > {code} > INFO 17:59:02,800 Cleaning up > SSTableReader(path='/data/sdd/cassandra_prod/vdna/analytics/vdna-analytics-jb-21670-Data.db') > # > # A fatal error has been detected by the Java Runtime Environment: > # > # SIGSEGV (0xb) at pc=0x7f750890268e, pid=28039, tid=140130222446336 > # > # JRE version: Java(TM) SE Runtime Environment (7.0_71-b14) (build > 1.7.0_71-b14) > # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.71-b01 mixed mode > linux-amd64 compressed oops) > # Problematic frame: > # J 2655 C2 > org.apache.cassandra.io.sstable.IndexSummary.binarySearch(Lorg/apache/cassandra/db/RowPosition;)I > (88 bytes) @ 0x7f750890268e [0x7f7508902580+0x10e] > # > # Failed to write core dump. Core dumps have been disabled. 
To enable core > dumping, try "ulimit -c unlimited" before starting Java again > # > # An error report file with more information is saved as: > # /var/lib/cassandra_prod/hs_err_pid28039.log > Compiled method (c2) 913167265 4849 > org.apache.cassandra.dht.Token::maxKeyBound (24 bytes) > total in heap [0x7f7508572450,0x7f7508573318] = 3784 > relocation [0x7f7508572570,0x7f7508572618] = 168 > main code [0x7f7508572620,0x7f7508572cc0] = 1696 > stub code [0x7f7508572cc0,0x7f7508572cf8] = 56 > oops [0x7f7508572cf8,0x7f7508572d90] = 152 > scopes data[0x7f7508572d90,0x7f7508573118] = 904 > scopes pcs [0x7f7508573118,0x7f7508573268] = 336 > dependencies [0x7f7508573268,0x7f7508573280] = 24 > handler table [0x7f7508573280,0x7f75085732e0] = 96 > nul chk table [0x7f75085732e0,0x7f7508573318] = 56 > # > # If you would like to submit a bug report, please visit: > # http://bugreport.sun.com/bugreport/crash.jsp > # > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8795) Cassandra (possibly under load) occasionally throws an exception during CQL create table
[ https://issues.apache.org/jira/browse/CASSANDRA-8795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14499928#comment-14499928 ] Benedict commented on CASSANDRA-8795: - [~philipthompson] good question. [~iamaleksey]? You're the closest to a "domain expert" on schema stuff that I can think of. Do these things fall under the purview of "CQL" [~slebresne]? [~driftx] maybe? This stuff all predates me, so I'm just throwing up the bat signal to the old timers really. > Cassandra (possibly under load) occasionally throws an exception during CQL > create table > > > Key: CASSANDRA-8795 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8795 > Project: Cassandra > Issue Type: Bug >Reporter: Darren Warner > Fix For: 2.1.5 > > > CQLSH will return the following: > {code} > { name: 'ResponseError', > message: 'java.lang.RuntimeException: > java.util.concurrent.ExecutionException: java.lang.NullPointerException', > info: 'Represents an error message from the server', > code: 0, > query: 'CREATE TABLE IF NOT EXISTS roles_by_users( userid TIMEUUID, role > INT, entityid TIMEUUID, entity_type TEXT, enabled BOOLEAN, PRIMARY KEY > (userid, role, entityid, entity_type) );' } > {code} > Cassandra system.log shows: > {code} > ERROR [MigrationStage:1] 2015-02-11 14:38:48,610 CassandraDaemon.java:153 - > Exception in thread Thread[MigrationStage:1,5,main] > java.lang.NullPointerException: null > at > org.apache.cassandra.db.DefsTables.addColumnFamily(DefsTables.java:371) > ~[apache-cassandra-2.1.2.jar:2.1.2] > at > org.apache.cassandra.db.DefsTables.mergeColumnFamilies(DefsTables.java:293) > ~[apache-cassandra-2.1.2.jar:2.1.2] > at > org.apache.cassandra.db.DefsTables.mergeSchemaInternal(DefsTables.java:194) > ~[apache-cassandra-2.1.2.jar:2.1.2] > at > org.apache.cassandra.db.DefsTables.mergeSchema(DefsTables.java:166) > ~[apache-cassandra-2.1.2.jar:2.1.2] > at > 
org.apache.cassandra.service.MigrationManager$2.runMayThrow(MigrationManager.java:393) > ~[apache-cassandra-2.1.2.jar:2.1.2] > at > org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) > ~[apache-cassandra-2.1.2.jar:2.1.2] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > ~[na:1.8.0_31] > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > ~[na:1.8.0_31] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > ~[na:1.8.0_31] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > [na:1.8.0_31] > at java.lang.Thread.run(Thread.java:745) [na:1.8.0_31] > ERROR [SharedPool-Worker-2] 2015-02-11 14:38:48,620 QueryMessage.java:132 - > Unexpected error during query > java.lang.RuntimeException: java.util.concurrent.ExecutionException: > java.lang.NullPointerException > at > org.apache.cassandra.utils.FBUtilities.waitOnFuture(FBUtilities.java:398) > ~[apache-cassandra-2.1.2.jar:2.1.2] > at > org.apache.cassandra.service.MigrationManager.announce(MigrationManager.java:374) > ~[apache-cassandra-2.1.2.jar:2.1.2] > at > org.apache.cassandra.service.MigrationManager.announceNewColumnFamily(MigrationManager.java:249) > ~[apache-cassandra-2.1.2.jar:2.1.2] > at > org.apache.cassandra.cql3.statements.CreateTableStatement.announceMigration(CreateTableStatement.java:113) > ~[apache-cassandra-2.1.2.jar:2.1.2] > at > org.apache.cassandra.cql3.statements.SchemaAlteringStatement.execute(SchemaAlteringStatement.java:80) > ~[apache-cassandra-2.1.2.jar:2.1.2] > at > org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:226) > ~[apache-cassandra-2.1.2.jar:2.1.2] > at > org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:248) > ~[apache-cassandra-2.1.2.jar:2.1.2] > at > org.apache.cassandra.transport.messages.QueryMessage.execute(QueryMessage.java:119) > ~[apache-cassandra-2.1.2.jar:2.1.2] > at > 
org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:439) > [apache-cassandra-2.1.2.jar:2.1.2] > at > org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:335) > [apache-cassandra-2.1.2.jar:2.1.2] > at > io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105) > [netty-all-4.0.23.Final.jar:4.0.23.Final] > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333) > [netty-all-4.0.23.Final.jar:4.0.23.Final] > at > io.netty.channel.AbstractChannelHandlerContext.access$700(AbstractChannelHandlerContext.java:32) > [
[jira] [Commented] (CASSANDRA-8723) Cassandra 2.1.2 Memory issue - java process memory usage continuously increases until process is killed by OOM killer
[ https://issues.apache.org/jira/browse/CASSANDRA-8723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14499929#comment-14499929 ] Philip Thompson commented on CASSANDRA-8723: [~jeffl], have you had the opportunity to try 2.1.3 yet? What was 2.1.4 will now be 2.1.5, which has a tentative release tag and should be out in approximately a week. If upgrading fixes your issue, please let us know. > Cassandra 2.1.2 Memory issue - java process memory usage continuously > increases until process is killed by OOM killer > - > > Key: CASSANDRA-8723 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8723 > Project: Cassandra > Issue Type: Bug >Reporter: Jeff Liu > Fix For: 2.1.5 > > Attachments: cassandra.yaml > > > Issue: > We have an on-going issue with cassandra nodes running with continuously > increasing memory until killed by OOM. > {noformat} > Jan 29 10:15:41 cass-chisel19 kernel: [24533109.783481] Out of memory: Kill > process 13919 (java) score 911 or sacrifice child > Jan 29 10:15:41 cass-chisel19 kernel: [24533109.783557] Killed process 13919 > (java) total-vm:18366340kB, anon-rss:6461472kB, file-rss:6684kB > {noformat} > System Profile: > cassandra version 2.1.2 > system: aws c1.xlarge instance with 8 cores, 7.1G memory. 
> cassandra jvm: > -Xms1792M -Xmx1792M -Xmn400M -Xss256k > {noformat} > java -ea -javaagent:/usr/share/cassandra/lib/jamm-0.2.8.jar > -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms1792M -Xmx1792M > -Xmn400M -XX:+HeapDumpOnOutOfMemoryError -Xss256k -XX:StringTableSize=103 > -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled > -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 > -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly > -XX:+UseTLAB -XX:+CMSClassUnloadingEnabled -XX:+UseCondCardMark > -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC > -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime > -XX:+PrintPromotionFailure -Xloggc:/var/log/cassandra/gc-1421511249.log > -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=48M > -Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=7199 > -Dcom.sun.management.jmxremote.ssl=false > -Dcom.sun.management.jmxremote.authenticate=false > -javaagent:/usr/share/java/graphite-reporter-agent-1.0-SNAPSHOT.jar=graphiteServer=metrics-a.hq.nest.com;graphitePort=2003;graphitePollInt=60 > -Dlogback.configurationFile=logback.xml > -Dcassandra.logdir=/var/log/cassandra -Dcassandra.storagedir= > -Dcassandra-pidfile=/var/run/cassandra/cassandra.pid -cp > 
/etc/cassandra:/usr/share/cassandra/lib/airline-0.6.jar:/usr/share/cassandra/lib/antlr-runtime-3.5.2.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.2.jar:/usr/share/cassandra/lib/commons-lang3-3.1.jar:/usr/share/cassandra/lib/commons-math3-3.2.jar:/usr/share/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.4.jar:/usr/share/cassandra/lib/disruptor-3.0.1.jar:/usr/share/cassandra/lib/guava-16.0.jar:/usr/share/cassandra/lib/high-scale-lib-1.0.6.jar:/usr/share/cassandra/lib/jackson-core-asl-1.9.2.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.9.2.jar:/usr/share/cassandra/lib/jamm-0.2.8.jar:/usr/share/cassandra/lib/javax.inject.jar:/usr/share/cassandra/lib/jbcrypt-0.3m.jar:/usr/share/cassandra/lib/jline-1.0.jar:/usr/share/cassandra/lib/jna-4.0.0.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/libthrift-0.9.1.jar:/usr/share/cassandra/lib/logback-classic-1.1.2.jar:/usr/share/cassandra/lib/logback-core-1.1.2.jar:/usr/share/cassandra/lib/lz4-1.2.0.jar:/usr/share/cassandra/lib/metrics-core-2.2.0.jar:/usr/share/cassandra/lib/metrics-graphite-2.2.0.jar:/usr/share/cassandra/lib/mx4j-tools.jar:/usr/share/cassandra/lib/netty-all-4.0.23.Final.jar:/usr/share/cassandra/lib/reporter-config-2.1.0.jar:/usr/share/cassandra/lib/slf4j-api-1.7.2.jar:/usr/share/cassandra/lib/snakeyaml-1.11.jar:/usr/share/cassandra/lib/snappy-java-1.0.5.2.jar:/usr/share/cassandra/lib/stream-2.5.2.jar:/usr/share/cassandra/lib/stringtemplate-4.0.2.jar:/usr/share/cassandra/lib/super-csv-2.1.0.jar:/usr/share/cassandra/lib/thrift-server-0.3.7.jar:/usr/share/cassandra/apache-cassandra-2.1.2.jar:/usr/share/cassandra/apache-cassandra-thrift-2.1.2.jar:/usr/share/cassandra/apache-cassandra.jar:/usr/share/cassandra/cassandra-driver-core-2.0.5.jar:/usr/share/cassandra/netty-3.9.0.Final.jar:/usr/share/cassandra/stress.jar: > -XX:HeapDumpPath=/var/lib/cassandra/java_1421511248.hprof > 
-XX:ErrorFile=/var/lib/cassandra/hs_err_1421511248.log > org.apache.cassandra.service.CassandraDaemon > {noformat} -- This message was sent
[jira] [Updated] (CASSANDRA-8741) Running a drain before a decommission apparently the wrong thing to do
[ https://issues.apache.org/jira/browse/CASSANDRA-8741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Thompson updated CASSANDRA-8741: --- Fix Version/s: 2.1.5 2.0.15 > Running a drain before a decommission apparently the wrong thing to do > -- > > Key: CASSANDRA-8741 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8741 > Project: Cassandra > Issue Type: Bug > Components: Core > Environment: Ubuntu 14.04; Cassandra 2.0.11.82 (Datastax Enterprise > 4.5.3) >Reporter: Casey Marshall >Priority: Trivial > Labels: lhf > Fix For: 2.0.15, 2.1.5 > > > This might simply be a documentation issue. It appears that running "nodetool > drain" is a very wrong thing to do before running a "nodetool decommission". > The idea was that I was going to safely shut off writes and flush everything > to disk before beginning the decommission. What happens is the "decommission" > call appears to fail very early on after starting, and afterwards, the node > in question is stuck in state LEAVING, but all other nodes in the ring see > that node as NORMAL, but down. No streams are ever sent from the node being > decommissioned to other nodes. > The drain command does indeed shut down the "BatchlogTasks" executor > (org/apache/cassandra/service/StorageService.java, line 3445 in git tag > "cassandra-2.0.11") but the decommission process tries using that executor > when calling the "startBatchlogReplay" function > (org/apache/cassandra/db/BatchlogManager.java, line 123) called through > org.apache.cassandra.service.StorageService.unbootstrap (see the stack trace > pasted below). > This also failed in a similar way on Cassandra 1.2.13-ish (DSE 3.2.4). > So, either something is wrong with the drain/decommission commands, or it's > very wrong to run a drain before a decommission. What's worse, there seems to > be no way to recover this node once it is in this state; you need to shut it > down and run "removenode". 
> My terminal output: > {code} > ubuntu@x:~$ nodetool drain > ubuntu@x:~$ tail /var/log/^C > ubuntu@x:~$ nodetool decommission > Exception in thread "main" java.util.concurrent.RejectedExecutionException: > Task > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@3008fa33 > rejected from > org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor@1d6242e8[Terminated, > pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 52] > at > java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048) > at > java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821) > at > java.util.concurrent.ScheduledThreadPoolExecutor.delayedExecute(ScheduledThreadPoolExecutor.java:325) > at > java.util.concurrent.ScheduledThreadPoolExecutor.schedule(ScheduledThreadPoolExecutor.java:530) > at > java.util.concurrent.ScheduledThreadPoolExecutor.submit(ScheduledThreadPoolExecutor.java:629) > at > org.apache.cassandra.db.BatchlogManager.startBatchlogReplay(BatchlogManager.java:123) > at > org.apache.cassandra.service.StorageService.unbootstrap(StorageService.java:2966) > at > org.apache.cassandra.service.StorageService.decommission(StorageService.java:2934) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75) > at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:279) > at > com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112) > at > 
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46) > at > com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237) > at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138) > at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252) > at > com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819) > at > com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801) > at > javax.management.remote.rmi.RMIConnectionImpl.doO
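The failure mode in the stack trace above is easy to reproduce outside Cassandra: once a ScheduledThreadPoolExecutor has been shut down, any further submission is rejected, which is exactly what decommission hits after drain has terminated the batchlog executor. A minimal standalone illustration (not Cassandra code):

```java
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Minimal illustration: scheduling a task on an executor that has already
// been shut down throws RejectedExecutionException (the default
// AbortPolicy), the same exception decommission raises after drain has
// shut down the BatchlogTasks executor.
final class DrainDemo
{
    static boolean submitAfterShutdownRejected()
    {
        ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(1);
        executor.shutdown();
        try
        {
            executor.schedule(() -> {}, 0, TimeUnit.MILLISECONDS);
            return false; // would mean the submission was accepted
        }
        catch (RejectedExecutionException expected)
        {
            return true;
        }
    }
}
```

A fix would presumably either avoid terminating that executor during drain or guard the batchlog replay call against a drained state; either way, the sketch shows why the drain-then-decommission sequence fails as reported.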
[jira] [Updated] (CASSANDRA-8741) Running a drain before a decommission apparently the wrong thing to do
[ https://issues.apache.org/jira/browse/CASSANDRA-8741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Thompson updated CASSANDRA-8741: --- Description: This might simply be a documentation issue. It appears that running "nodetool drain" is a very wrong thing to do before running a "nodetool decommission". The idea was that I was going to safely shut off writes and flush everything to disk before beginning the decommission. What happens is the "decommission" call appears to fail very early on after starting, and afterwards, the node in question is stuck in state LEAVING, but all other nodes in the ring see that node as NORMAL, but down. No streams are ever sent from the node being decommissioned to other nodes. The drain command does indeed shut down the "BatchlogTasks" executor (org/apache/cassandra/service/StorageService.java, line 3445 in git tag "cassandra-2.0.11") but the decommission process tries using that executor when calling the "startBatchlogReplay" function (org/apache/cassandra/db/BatchlogManager.java, line 123) called through org.apache.cassandra.service.StorageService.unbootstrap (see the stack trace pasted below). This also failed in a similar way on Cassandra 1.2.13-ish (DSE 3.2.4). So, either something is wrong with the drain/decommission commands, or it's very wrong to run a drain before a decommission. What's worse, there seems to be no way to recover this node once it is in this state; you need to shut it down and run "removenode". 
My terminal output: {code} ubuntu@x:~$ nodetool drain ubuntu@x:~$ tail /var/log/^C ubuntu@x:~$ nodetool decommission Exception in thread "main" java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@3008fa33 rejected from org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor@1d6242e8[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 52] at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048) at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821) at java.util.concurrent.ScheduledThreadPoolExecutor.delayedExecute(ScheduledThreadPoolExecutor.java:325) at java.util.concurrent.ScheduledThreadPoolExecutor.schedule(ScheduledThreadPoolExecutor.java:530) at java.util.concurrent.ScheduledThreadPoolExecutor.submit(ScheduledThreadPoolExecutor.java:629) at org.apache.cassandra.db.BatchlogManager.startBatchlogReplay(BatchlogManager.java:123) at org.apache.cassandra.service.StorageService.unbootstrap(StorageService.java:2966) at org.apache.cassandra.service.StorageService.decommission(StorageService.java:2934) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75) at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:279) at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112) at 
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46) at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237) at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138) at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252) at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819) at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801) at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1487) at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:97) at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1328) at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1420) at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:848) at sun.reflect.GeneratedMethodAccessor59.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java
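The failure mode in the trace above, a task submitted to an executor that drain already terminated, can be reproduced in miniature with any executor framework. A hedged Python analogue (the Java original throws RejectedExecutionException; Python's ThreadPoolExecutor raises RuntimeError instead):

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for the BatchlogTasks executor that "nodetool drain" shuts down.
pool = ThreadPoolExecutor(max_workers=1)
pool.shutdown()  # analogous to drain terminating the executor

# Stand-in for startBatchlogReplay submitting a replay task afterwards.
try:
    pool.submit(lambda: None)
    rejected = False
except RuntimeError:  # Java raises RejectedExecutionException here
    rejected = True

print(rejected)  # True: a terminated pool rejects all new tasks
```

This is why the order matters: once the pool is terminated there is no way to re-enable it short of restarting the process, which matches the "shut it down and run removenode" recovery described above.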
[jira] [Commented] (CASSANDRA-8795) Cassandra (possibly under load) occasionally throws an exception during CQL create table
[ https://issues.apache.org/jira/browse/CASSANDRA-8795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14499923#comment-14499923 ] Philip Thompson commented on CASSANDRA-8795: [~benedict], who should handle fixing the problem in MigrationManager? > Cassandra (possibly under load) occasionally throws an exception during CQL > create table > > > Key: CASSANDRA-8795 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8795 > Project: Cassandra > Issue Type: Bug >Reporter: Darren Warner > Fix For: 2.1.5 > > > CQLSH will return the following: > {code} > { name: 'ResponseError', > message: 'java.lang.RuntimeException: > java.util.concurrent.ExecutionException: java.lang.NullPointerException', > info: 'Represents an error message from the server', > code: 0, > query: 'CREATE TABLE IF NOT EXISTS roles_by_users( userid TIMEUUID, role > INT, entityid TIMEUUID, entity_type TEXT, enabled BOOLEAN, PRIMARY KEY > (userid, role, entityid, entity_type) );' } > {code} > Cassandra system.log shows: > {code} > ERROR [MigrationStage:1] 2015-02-11 14:38:48,610 CassandraDaemon.java:153 - > Exception in thread Thread[MigrationStage:1,5,main] > java.lang.NullPointerException: null > at > org.apache.cassandra.db.DefsTables.addColumnFamily(DefsTables.java:371) > ~[apache-cassandra-2.1.2.jar:2.1.2] > at > org.apache.cassandra.db.DefsTables.mergeColumnFamilies(DefsTables.java:293) > ~[apache-cassandra-2.1.2.jar:2.1.2] > at > org.apache.cassandra.db.DefsTables.mergeSchemaInternal(DefsTables.java:194) > ~[apache-cassandra-2.1.2.jar:2.1.2] > at > org.apache.cassandra.db.DefsTables.mergeSchema(DefsTables.java:166) > ~[apache-cassandra-2.1.2.jar:2.1.2] > at > org.apache.cassandra.service.MigrationManager$2.runMayThrow(MigrationManager.java:393) > ~[apache-cassandra-2.1.2.jar:2.1.2] > at > org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) > ~[apache-cassandra-2.1.2.jar:2.1.2] > at > 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > ~[na:1.8.0_31] > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > ~[na:1.8.0_31] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > ~[na:1.8.0_31] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > [na:1.8.0_31] > at java.lang.Thread.run(Thread.java:745) [na:1.8.0_31] > ERROR [SharedPool-Worker-2] 2015-02-11 14:38:48,620 QueryMessage.java:132 - > Unexpected error during query > java.lang.RuntimeException: java.util.concurrent.ExecutionException: > java.lang.NullPointerException > at > org.apache.cassandra.utils.FBUtilities.waitOnFuture(FBUtilities.java:398) > ~[apache-cassandra-2.1.2.jar:2.1.2] > at > org.apache.cassandra.service.MigrationManager.announce(MigrationManager.java:374) > ~[apache-cassandra-2.1.2.jar:2.1.2] > at > org.apache.cassandra.service.MigrationManager.announceNewColumnFamily(MigrationManager.java:249) > ~[apache-cassandra-2.1.2.jar:2.1.2] > at > org.apache.cassandra.cql3.statements.CreateTableStatement.announceMigration(CreateTableStatement.java:113) > ~[apache-cassandra-2.1.2.jar:2.1.2] > at > org.apache.cassandra.cql3.statements.SchemaAlteringStatement.execute(SchemaAlteringStatement.java:80) > ~[apache-cassandra-2.1.2.jar:2.1.2] > at > org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:226) > ~[apache-cassandra-2.1.2.jar:2.1.2] > at > org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:248) > ~[apache-cassandra-2.1.2.jar:2.1.2] > at > org.apache.cassandra.transport.messages.QueryMessage.execute(QueryMessage.java:119) > ~[apache-cassandra-2.1.2.jar:2.1.2] > at > org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:439) > [apache-cassandra-2.1.2.jar:2.1.2] > at > org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:335) > [apache-cassandra-2.1.2.jar:2.1.2] > at > 
io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105) > [netty-all-4.0.23.Final.jar:4.0.23.Final] > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333) > [netty-all-4.0.23.Final.jar:4.0.23.Final] > at > io.netty.channel.AbstractChannelHandlerContext.access$700(AbstractChannelHandlerContext.java:32) > [netty-all-4.0.23.Final.jar:4.0.23.Final] > at > io.netty.channel.AbstractChannelHandlerContext$8.run(AbstractChannelHandlerContext.java:324) > [netty-all-4.0.23.Final.jar:4.0.23.Final] > at > java
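The client-visible "RuntimeException: ExecutionException: NullPointerException" in the trace above is the shape a failed background task takes when it is awaited: the NPE thrown on the MigrationStage is wrapped on its way to the caller in FBUtilities.waitOnFuture. A minimal Python analogue of awaiting a failed schema-migration task (note that Python's Future.result() re-raises the task's original exception directly, whereas Java's Future.get() wraps it in ExecutionException):

```python
from concurrent.futures import ThreadPoolExecutor

def migration_task():
    # Stand-in for the NullPointerException thrown in DefsTables.addColumnFamily.
    raise ValueError("null CFMetaData")

with ThreadPoolExecutor(max_workers=1) as pool:
    fut = pool.submit(migration_task)
    try:
        # Analogous to FBUtilities.waitOnFuture blocking on the migration.
        fut.result()
        failed = None
    except ValueError as e:
        failed = str(e)

print(failed)  # "null CFMetaData": the task's failure surfaces at the await point
```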
[jira] [Commented] (CASSANDRA-8798) don't throw TombstoneOverwhelmingException during bootstrap
[ https://issues.apache.org/jira/browse/CASSANDRA-8798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14499921#comment-14499921 ] Philip Thompson commented on CASSANDRA-8798: [~aweisberg], do you want to look over Jeff's proposed patch? > don't throw TombstoneOverwhelmingException during bootstrap > --- > > Key: CASSANDRA-8798 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8798 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: mck >Assignee: Jeff Jirsa > Fix For: 2.0.15 > > Attachments: 8798.txt > > > During bootstrap, honouring tombstone_failure_threshold seems > counter-productive: the node is not serving requests, so the threshold is not > protecting anything. > Instead, what happens is that bootstrap fails, and a cluster that obviously > needs an extra node isn't getting it... > **History** > When adding a new node, the bootstrap process looks complete in that streaming > is finished, compactions are finished, and all disk and CPU activity is calm. > But the node is still stuck in "joining" status. > The last stage in the bootstrapping process is the rebuilding of secondary > indexes; grepping the logs confirmed it failed during this stage. 
> {code}grep SecondaryIndexManager cassandra/logs/*{code} > To see what secondary index rebuilding was initiated > {code} > grep "index build of " cassandra/logs/* | awk -F" for data in " '{print $1}' > INFO 13:18:11,252 Submitting index build of addresses.unobfuscatedIndex > INFO 13:18:11,352 Submitting index build of Inbox.FINNBOXID_INDEX > INFO 23:03:54,758 Submitting index build of [events.collected_tbIndex, > events.real_tbIndex] > {code} > To get an idea of successful secondary index rebuilding > {code}grep "Index build of "cassandra/logs/* > INFO 13:18:11,263 Index build of addresses.unobfuscatedIndex complete > INFO 13:18:11,355 Index build of Inbox.FINNBOXID_INDEX complete > {code} > Looking closer at {{[events.collected_tbIndex, events.real_tbIndex]}} showed > the following stacktrace > {code} > ERROR [StreamReceiveTask:121] 2015-02-12 05:54:47,768 CassandraDaemon.java > (line 199) Exception in thread Thread[StreamReceiveTask:121,5,main] > java.lang.RuntimeException: java.util.concurrent.ExecutionException: > java.lang.RuntimeException: > org.apache.cassandra.db.filter.TombstoneOverwhelmingException > at > org.apache.cassandra.utils.FBUtilities.waitOnFuture(FBUtilities.java:413) > at > org.apache.cassandra.db.index.SecondaryIndexManager.maybeBuildSecondaryIndexes(SecondaryIndexManager.java:142) > at > org.apache.cassandra.streaming.StreamReceiveTask$OnCompletionRunnable.run(StreamReceiveTask.java:130) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.util.concurrent.ExecutionException: > java.lang.RuntimeException: > org.apache.cassandra.db.filter.TombstoneOverwhelmingException > at 
java.util.concurrent.FutureTask.report(FutureTask.java:122) > at java.util.concurrent.FutureTask.get(FutureTask.java:188) > at > org.apache.cassandra.utils.FBUtilities.waitOnFuture(FBUtilities.java:409) > ... 7 more > Caused by: java.lang.RuntimeException: > org.apache.cassandra.db.filter.TombstoneOverwhelmingException > at > org.apache.cassandra.service.pager.QueryPagers$1.next(QueryPagers.java:160) > at > org.apache.cassandra.service.pager.QueryPagers$1.next(QueryPagers.java:143) > at org.apache.cassandra.db.Keyspace.indexRow(Keyspace.java:406) > at > org.apache.cassandra.db.index.SecondaryIndexBuilder.build(SecondaryIndexBuilder.java:62) > at > org.apache.cassandra.db.compaction.CompactionManager$9.run(CompactionManager.java:834) > ... 5 more > Caused by: org.apache.cassandra.db.filter.TombstoneOverwhelmingException > at > org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:202) > at > org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:122) > at > org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:80) > at > org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:72) > at > org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:297) > at > org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:53) > at > org.
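The behavior the ticket asks for can be sketched as a guard that is relaxed while the node is still bootstrapping, since it is not serving reads and the limit protects nothing at that point. This is a hypothetical illustration, not the attached patch; the names and threshold value are illustrative, not Cassandra's actual code:

```python
# Hypothetical sketch of the requested behavior, not Cassandra's code.
TOMBSTONE_FAILURE_THRESHOLD = 100_000

class TombstoneOverwhelmingError(Exception):
    pass

def check_tombstones(scanned: int, bootstrapping: bool) -> None:
    """Fail reads past the threshold only when serving live traffic."""
    if scanned <= TOMBSTONE_FAILURE_THRESHOLD:
        return
    if bootstrapping:
        # During bootstrap (e.g. the secondary index rebuild above),
        # continue; a warning would be logged instead of aborting the join.
        return
    raise TombstoneOverwhelmingError(f"scanned {scanned} tombstones")
```

With a guard like this, the index rebuild in the stack trace above would log and proceed instead of leaving the node stuck in "joining".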
[jira] [Updated] (CASSANDRA-8798) don't throw TombstoneOverwhelmingException during bootstrap
[ https://issues.apache.org/jira/browse/CASSANDRA-8798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Thompson updated CASSANDRA-8798: --- Assignee: Jeff Jirsa > don't throw TombstoneOverwhelmingException during bootstrap > --- > > Key: CASSANDRA-8798 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8798 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: mck >Assignee: Jeff Jirsa > Fix For: 2.0.15 > > Attachments: 8798.txt
[jira] [Updated] (CASSANDRA-9131) Defining correct behavior during leap second insertion
[ https://issues.apache.org/jira/browse/CASSANDRA-9131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Thompson updated CASSANDRA-9131: --- Assignee: Jim Witschey [~mambocab], should this be closed as "Not a Problem", based on irc discussion? > Defining correct behavior during leap second insertion > -- > > Key: CASSANDRA-9131 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9131 > Project: Cassandra > Issue Type: Bug > Environment: Linux ip-172-31-0-5 3.2.0-57-virtual #87-Ubuntu SMP Tue > Nov 12 21:53:49 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux >Reporter: Jim Witschey >Assignee: Jim Witschey > > On Linux platforms, the insertion of a leap second breaks the monotonicity of > timestamps. This can make values appear to have been inserted into Cassandra > in a different order than they were. I want to know what behavior is expected > and desirable for inserts over this discontinuity. > From a timestamp perspective, an inserted leap second looks like a repeat of > the previous second: > {code} > $ while true ; do echo "`date +%s%N` `date -u`" ; sleep .5 ; done > 1435708798171327029 Tue Jun 30 23:59:58 UTC 2015 > 1435708798679392477 Tue Jun 30 23:59:58 UTC 2015 > 1435708799187550335 Tue Jun 30 23:59:59 UTC 2015 > 1435708799695670453 Tue Jun 30 23:59:59 UTC 2015 > 1435708799203902068 Tue Jun 30 23:59:59 UTC 2015 > 1435708799712168566 Tue Jun 30 23:59:59 UTC 2015 > 1435708800220473932 Wed Jul 1 00:00:00 UTC 2015 > 1435708800728908190 Wed Jul 1 00:00:00 UTC 2015 > 1435708801237611983 Wed Jul 1 00:00:01 UTC 2015 > 1435708801746251996 Wed Jul 1 00:00:01 UTC 2015 > {code} > Note that 23:59:59 repeats itself, and that the timestamps increase during > the first time through, then step back down to the beginning of the second > and increase again. > As a result, the timestamps on values inserted during these seconds will be > out of order. 
I set up a 4-node cluster running under Ubuntu 12.04.3 and > synced them to shortly before the leap second would be inserted. During the > insertion of the leap second, I ran a test with logic something like: > {code} > simple_insert = session.prepare( > 'INSERT INTO test (foo, bar) VALUES (?, ?);') > for i in itertools.count(): > # stop after midnight > now = datetime.utcnow() > last_midnight = now.replace(hour=0, minute=0, > second=0, microsecond=0) > seconds_since_midnight = (now - last_midnight).total_seconds() > if 5 <= seconds_since_midnight <= 15: > break > session.execute(simple_insert, [i, i]) > result = session.execute("SELECT bar, WRITETIME(bar) FROM test;") > {code} > EDIT: This behavior occurs with server-generated timestamps; in this > particular test, I set {{use_client_timestamp}} to {{False}}. > Under normal circumstances, the values and writetimes would increase > together, but when inserted over the leap second, they don't. These {{value, > writetime}} pairs are sorted by writetime: > {code} > (582, 1435708799285000) > (579, 1435708799339000) > (583, 1435708799593000) > (580, 1435708799643000) > (584, 1435708799897000) > (581, 1435708799958000) > {code} > The values were inserted in increasing order, but their writetimes are in a > different order because of the repeated second. During the first instance of > 23:59:59, the values 579, 580, and 581 were inserted at the beginning, > middle, and end of the second. During the leap second, which is also > 23:59:59, 582, 583, and 584 were inserted, also at the beginning, middle, and > end of the second. However, since the two seconds are the same second, they > appear interleaved with respect to timestamps, as shown above. > So, should I consider this behavior correct? If not, how should Cassandra > correctly handle the discontinuity introduced by the insertion of a leap > second? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
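One way a server could defend against the repeated second is to never issue a timestamp less than or equal to the last one handed out. The following is a hedged sketch of that idea, not Cassandra's actual implementation; the class name and injectable clock are inventions for illustration:

```python
import time

class MonotonicMicros:
    """Issue strictly increasing microsecond timestamps even when the
    wall clock steps backwards (e.g. across a repeated leap second)."""

    def __init__(self, now_us=lambda: int(time.time() * 1_000_000)):
        self._now_us = now_us  # injectable clock, for testing
        self._last = 0

    def next(self) -> int:
        now = self._now_us()
        # If the clock repeated a second, advance one microsecond past
        # the last issued timestamp instead of going backwards.
        self._last = max(now, self._last + 1)
        return self._last

# Simulate the discontinuity: the clock jumps back from 200 to 150.
fake = iter([100, 200, 150, 300])
clock = MonotonicMicros(now_us=lambda: next(fake))
print([clock.next() for _ in range(4)])  # [100, 200, 201, 300]
```

Under this scheme the writetimes of values 579-584 in the table above would stay in insertion order, at the cost of timestamps that can run microseconds ahead of the wall clock for the duration of the repeated second.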