[jira] [Commented] (CASSANDRA-13455) lose check of null strings in decoding client token
[ https://issues.apache.org/jira/browse/CASSANDRA-13455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975768#comment-15975768 ] Amos Jianjun Kong commented on CASSANDRA-13455: --- I agree that empty passwords should be allowed for both PasswordAuthenticator and AllowAllAuthenticator. Checking for an empty username in decodeCredentials() would catch the problem early, but PasswordAuthenticator can do that check itself. So we can treat this issue as NOTABUG and ignore the patches. Thanks for your responses :-) > lose check of null strings in decoding client token > --- > > Key: CASSANDRA-13455 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13455 > Project: Cassandra > Issue Type: Bug > Environment: CentOS 7.2 > Java 1.8 > Reporter: Amos Jianjun Kong > Assignee: Amos Jianjun Kong > Fix For: 3.10 > > Attachments: 0001-auth-check-both-null-points-and-null-strings.patch, > 0001-auth-strictly-delimit-in-decoding-client-token.patch > > > RFC 4616 requires that AuthZID, USERNAME, and PASSWORD be delimited by a single '\000'. > The current code effectively delimits by runs of '\000', so when the username or password > is empty, decoding goes wrong. > The problem was found in code review. > > Update: the description above is wrong; the actual problem is that > when the client responds with null (empty) strings for the username or password, > the current decodeCredentials() can't identify them. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
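For context, RFC 4616 defines the PLAIN token as {{authzid NUL authcid NUL passwd}}, where any field may be empty. A minimal decoder sketch (not Cassandra's actual decodeCredentials(); the class and method names here are invented for illustration) that scans for the two NUL delimiters explicitly, so that empty usernames and passwords survive decoding instead of being collapsed:

```java
import java.nio.charset.StandardCharsets;

public class PlainTokenDecoder {
    // Decodes an RFC 4616 SASL PLAIN token: authzid NUL authcid NUL passwd.
    // Splitting with String.split("\u0000") would drop empty fields, so we
    // locate the two NUL delimiters by hand and slice around them.
    public static String[] decode(byte[] token) {
        int first = indexOf(token, 0);
        int second = first < 0 ? -1 : indexOf(token, first + 1);
        if (first < 0 || second < 0)
            throw new IllegalArgumentException("token must contain two NUL delimiters");
        String authzid = new String(token, 0, first, StandardCharsets.UTF_8);
        String authcid = new String(token, first + 1, second - first - 1, StandardCharsets.UTF_8);
        String passwd  = new String(token, second + 1, token.length - second - 1, StandardCharsets.UTF_8);
        return new String[] { authzid, authcid, passwd };
    }

    private static int indexOf(byte[] a, int from) {
        for (int i = from; i < a.length; i++)
            if (a[i] == 0) return i;
        return -1;
    }
}
```

With this shape, a token like {{NUL user NUL}} decodes to an empty authzid, the username "user", and an empty password, which the caller can then accept or reject as policy dictates.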
[jira] [Commented] (CASSANDRA-12835) Tracing payload not passed from QueryMessage to tracing session
[ https://issues.apache.org/jira/browse/CASSANDRA-12835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975754#comment-15975754 ] mck commented on CASSANDRA-12835: - {quote}given a commented "+1" by the reviewer am I free to push (updating the commit msg to mark you as reviewer)?{quote} Looking through the commit log, it would appear the answer is yes :-) Just waiting for the dtests to finish before pushing. > Tracing payload not passed from QueryMessage to tracing session > --- > > Key: CASSANDRA-12835 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12835 > Project: Cassandra > Issue Type: Bug > Reporter: Hannu Kröger > Assignee: mck > Priority: Critical > Labels: tracing > Fix For: 3.11.x, 4.x > > > Caused by CASSANDRA-10392. > Related to CASSANDRA-11706. > When querying using CQL statements (not prepared), the message type is > QueryMessage and the code in > https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/transport/messages/QueryMessage.java#L101 > is as follows: > {code:java} > if (state.traceNextQuery()) > { > state.createTracingSession(); > ImmutableMap.Builder<String, String> builder = > ImmutableMap.builder(); > {code} > {{state.createTracingSession();}} should probably be > {{state.createTracingSession(getCustomPayload());}}. At least that fixes the > problem for me. 
> This also raises the question whether some other parts of the code should > pass the custom payload as well (I'm not the right person to analyze this): > {code} > $ ag createTracingSession > src/java/org/apache/cassandra/service/QueryState.java > 80:public void createTracingSession() > 82:createTracingSession(Collections.EMPTY_MAP); > 85:public void createTracingSession(Map<String, ByteBuffer> customPayload) > src/java/org/apache/cassandra/thrift/CassandraServer.java > 2528:state().getQueryState().createTracingSession(); > src/java/org/apache/cassandra/transport/messages/BatchMessage.java > 163:state.createTracingSession(); > src/java/org/apache/cassandra/transport/messages/ExecuteMessage.java > 114:state.createTracingSession(getCustomPayload()); > src/java/org/apache/cassandra/transport/messages/QueryMessage.java > 101:state.createTracingSession(); > src/java/org/apache/cassandra/transport/messages/PrepareMessage.java > 74:state.createTracingSession(); > {code} > This is not marked `minor` as CASSANDRA-11706 was, because this cannot > be fixed by the tracing plugin.
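The mechanism behind the bug can be modeled in a few lines. This is a toy sketch, not Cassandra's real QueryState (the class name and field are invented): it mirrors the two overloads shown in the {{ag}} output, where the no-arg variant delegates with an empty map, so any call site holding a custom payload that uses the no-arg form silently drops it.

```java
import java.nio.ByteBuffer;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Toy model of the two createTracingSession overloads listed above.
class TracingState {
    Map<String, ByteBuffer> sessionPayload;

    // No-arg overload: delegates with an empty payload. A call site that
    // actually has a custom payload (like QueryMessage) must not use this,
    // or the payload never reaches the tracing session.
    void createTracingSession() {
        createTracingSession(Collections.emptyMap());
    }

    void createTracingSession(Map<String, ByteBuffer> customPayload) {
        this.sessionPayload = customPayload;
    }
}
```

Calling the no-arg overload leaves {{sessionPayload}} empty even when the caller had a payload in hand, which is exactly the QueryMessage symptom; the one-line fix is to call the one-arg overload instead.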
[jira] [Comment Edited] (CASSANDRA-12835) Tracing payload not passed from QueryMessage to tracing session
[ https://issues.apache.org/jira/browse/CASSANDRA-12835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15972551#comment-15972551 ] mck edited comment on CASSANDRA-12835 at 4/19/17 11:42 PM: --- [~tjake], patches are updated here: || Branch || Testall || Dtest || | [3.11|https://github.com/michaelsembwever/cassandra/commit/56770c6c6a0268b9b0a2f8927df41f61e02e38f6] | [testall|https://circleci.com/gh/michaelsembwever/cassandra/23] | [dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/20/] | | [trunk|https://github.com/michaelsembwever/cassandra/commit/4ab20fdad52c6fe645e996598da225547cce973f] | [testall|https://circleci.com/gh/michaelsembwever/cassandra/24] | [dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/19/] | (dtests are queued and will likely take some time to complete) was (Author: michaelsembwever): [~tjake], patches are updated here: || Branch || Testall || Dtest || | [3.11|https://github.com/michaelsembwever/cassandra/commit/56770c6c6a0268b9b0a2f8927df41f61e02e38f6] | [testall|https://circleci.com/gh/michaelsembwever/cassandra/16] | [dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/20/] | | [trunk|https://github.com/michaelsembwever/cassandra/commit/4ab20fdad52c6fe645e996598da225547cce973f] | [testall|https://circleci.com/gh/michaelsembwever/cassandra/20] | [dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/19/] | (dtests are queued and will likely take some time to complete)
[jira] [Comment Edited] (CASSANDRA-12835) Tracing payload not passed from QueryMessage to tracing session
[ https://issues.apache.org/jira/browse/CASSANDRA-12835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15972551#comment-15972551 ] mck edited comment on CASSANDRA-12835 at 4/19/17 11:40 PM: --- [~tjake], patches are updated here: || Branch || Testall || Dtest || | [3.11|https://github.com/michaelsembwever/cassandra/commit/56770c6c6a0268b9b0a2f8927df41f61e02e38f6] | [testall|https://circleci.com/gh/michaelsembwever/cassandra/16] | [dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/20/] | | [trunk|https://github.com/michaelsembwever/cassandra/commit/4ab20fdad52c6fe645e996598da225547cce973f] | [testall|https://circleci.com/gh/michaelsembwever/cassandra/20] | [dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/19/] | (dtests are queued and will likely take some time to complete) was (Author: michaelsembwever): [~tjake], patches are updated here: || Branch || Testall || Dtest || | [3.11|https://github.com/michaelsembwever/cassandra/commit/4105fc71c652794d3ae1fba475f01ebf00199a07] | [testall|https://circleci.com/gh/michaelsembwever/cassandra/16] | [dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/20/] | | [trunk|https://github.com/michaelsembwever/cassandra/commit/c4de4f0dd0e70d7d67ade1e315ee3053494cf51c] | [testall|https://circleci.com/gh/michaelsembwever/cassandra/20] | [dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/19/] | (dtests are queued and will likely take some time to complete)
[jira] [Updated] (CASSANDRA-12835) Tracing payload not passed from QueryMessage to tracing session
[ https://issues.apache.org/jira/browse/CASSANDRA-12835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] mck updated CASSANDRA-12835: Status: Ready to Commit (was: Patch Available)
[jira] [Commented] (CASSANDRA-13463) nodetool toppartitions - error: String didn't validate
[ https://issues.apache.org/jira/browse/CASSANDRA-13463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975528#comment-15975528 ] Aleksandr Ivanov commented on CASSANDRA-13463: -- Forgot to mention that the same command can work on the 2nd or 3rd try. But if the period is long (>10s) it doesn't work at all. Partitions are 2-3 KB; the read rate on the node is ~300/s and the write rate ~100/s. > nodetool toppartitions - error: String didn't validate > -- > > Key: CASSANDRA-13463 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13463 > Project: Cassandra > Issue Type: Bug > Components: Observability > Environment: Debian Jessie, Java 1.8.0-121, Cassandra v3.0.11 > Reporter: Aleksandr Ivanov > > nodetool toppartitions doesn't work for most runs, failing with the > following message: > {code} > error: String didn't validate. > -- StackTrace -- > org.apache.cassandra.serializers.MarshalException: String didn't validate. > at > org.apache.cassandra.serializers.UTF8Serializer.validate(UTF8Serializer.java:35) > at > org.apache.cassandra.db.marshal.AbstractType.getString(AbstractType.java:128) > at > org.apache.cassandra.db.ColumnFamilyStore.finishLocalSampling(ColumnFamilyStore.java:1559) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:71) > at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:275) > at > com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112) > at > 
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46) > at > com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237) > at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138) > at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252) > at > com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819) > at > com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801) > at > com.sun.jmx.remote.security.MBeanServerAccessController.invoke(MBeanServerAccessController.java:468) > at > javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1468) > at > javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:76) > at > javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1309) > at java.security.AccessController.doPrivileged(Native Method) > at > javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1408) > at > javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:829) > at sun.reflect.GeneratedMethodAccessor90.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:346) > at sun.rmi.transport.Transport$1.run(Transport.java:200) > at sun.rmi.transport.Transport$1.run(Transport.java:197) > at java.security.AccessController.doPrivileged(Native Method) > at sun.rmi.transport.Transport.serviceCall(Transport.java:196) > at > sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:568) > at > sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:826) > at > sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:683) > at 
java.security.AccessController.doPrivileged(Native Method) > at > sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:682) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {code} > It is easily reproducible if the period is longer than 1 second.
[jira] [Commented] (CASSANDRA-13463) nodetool toppartitions - error: String didn't validate
[ https://issues.apache.org/jira/browse/CASSANDRA-13463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975514#comment-15975514 ] Chris Lohfink commented on CASSANDRA-13463: --- relevant change : https://github.com/aweisberg/cassandra/commit/e5c7992ea3099bb90930cad4282803fb6556de18#diff-98f5acb96aa6d684781936c141132e2aR1481
[jira] [Commented] (CASSANDRA-13463) nodetool toppartitions - error: String didn't validate
[ https://issues.apache.org/jira/browse/CASSANDRA-13463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975509#comment-15975509 ] Chris Lohfink commented on CASSANDRA-13463: --- I think it would fix it, at least. The `.array()` call doesn't work when the buffer is not backed by an array (i.e. when the partition key comes from a file's byte buffer). [~aweisberg]'s fix had nothing to do with that, but he changed the code to use the byte buffer properly.
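The {{.array()}} pitfall described above is easy to demonstrate: a direct (or file-mapped) ByteBuffer has no accessible backing array, so {{hasArray()}} is false and {{array()}} throws. A sketch of the safe alternative, as a simplified stand-in for what the linked fix does (the helper name here is invented, not the actual Cassandra code):

```java
import java.nio.ByteBuffer;

public class BufferBytes {
    // Copies the buffer's remaining bytes without relying on a backing array.
    // buffer.array() only works for writable heap buffers, and even then it
    // exposes the whole backing array, ignoring position and limit -- which
    // breaks for keys sliced out of a larger file-backed buffer.
    public static byte[] remainingBytes(ByteBuffer buffer) {
        byte[] bytes = new byte[buffer.remaining()];
        buffer.duplicate().get(bytes); // duplicate() leaves the caller's position intact
        return bytes;
    }
}
```

With a heap buffer both approaches happen to agree when position is 0 and limit equals capacity; the copy-via-duplicate form is the one that stays correct for direct buffers and slices.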
[jira] [Comment Edited] (CASSANDRA-13463) nodetool toppartitions - error: String didn't validate
[ https://issues.apache.org/jira/browse/CASSANDRA-13463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975509#comment-15975509 ] Chris Lohfink edited comment on CASSANDRA-13463 at 4/19/17 8:59 PM: I think it would fix it, at least. The {{.array()}} call doesn't work when the buffer is not backed by an array (i.e. when the partition key comes from a file's byte buffer). [~aweisberg]'s fix had nothing to do with that, but he changed the code to use the byte buffer properly. was (Author: cnlwsu): I think it would fix it at least. The `.array()` doesn't work when its not backed by an array (ie when partition comes from a files byte buffer). [~aweisberg]'s fix had nothing to do with that but he changed it to use the byte buffer properly.
[jira] [Commented] (CASSANDRA-13463) nodetool toppartitions - error: String didn't validate
[ https://issues.apache.org/jira/browse/CASSANDRA-13463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975501#comment-15975501 ] Aleksandr Ivanov commented on CASSANDRA-13463: -- Unfortunately not. Don't have 3.11 or 3.2+ environment. But I can build patched 3.0.x in order to test CASSANDRA-9241 fix. > nodetool toppartitions - error: String didn't validate > -- > > Key: CASSANDRA-13463 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13463 > Project: Cassandra > Issue Type: Bug > Components: Observability > Environment: Debian Jessie, Java 1.8.0-121, Cassandra v3.0.11 >Reporter: Aleksandr Ivanov > > nodetool toppartitions doesn't work for most of runs and failing with > following message > {code} > error: String didn't validate. > -- StackTrace -- > org.apache.cassandra.serializers.MarshalException: String didn't validate. > at > org.apache.cassandra.serializers.UTF8Serializer.validate(UTF8Serializer.java:35) > at > org.apache.cassandra.db.marshal.AbstractType.getString(AbstractType.java:128) > at > org.apache.cassandra.db.ColumnFamilyStore.finishLocalSampling(ColumnFamilyStore.java:1559) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:71) > at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:275) > at > com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112) > at > com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46) > at > 
com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237) > at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138) > at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252) > at > com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819) > at > com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801) > at > com.sun.jmx.remote.security.MBeanServerAccessController.invoke(MBeanServerAccessController.java:468) > at > javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1468) > at > javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:76) > at > javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1309) > at java.security.AccessController.doPrivileged(Native Method) > at > javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1408) > at > javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:829) > at sun.reflect.GeneratedMethodAccessor90.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:346) > at sun.rmi.transport.Transport$1.run(Transport.java:200) > at sun.rmi.transport.Transport$1.run(Transport.java:197) > at java.security.AccessController.doPrivileged(Native Method) > at sun.rmi.transport.Transport.serviceCall(Transport.java:196) > at > sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:568) > at > sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:826) > at > sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:683) > at java.security.AccessController.doPrivileged(Native Method) > at > 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:682) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {code} > It is easily reproducible if the period is longer than 1 second. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
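The MarshalException above is thrown by UTF8Serializer.validate when finishLocalSampling asks AbstractType.getString to render sampled partition-key bytes that are not well-formed UTF-8 (likely because the sampler can hand back truncated key bytes, which is what CASSANDRA-9241 addressed). The failure mode can be reproduced in isolation with the JDK's charset machinery; this is an illustrative stand-in, not Cassandra's actual validation code:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8ValidateDemo {
    // Returns true only if the bytes form well-formed UTF-8, mirroring the
    // strictness of a validator such as UTF8Serializer.validate().
    public static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] full = "key-\u00e4".getBytes(StandardCharsets.UTF_8);
        // Chop the two-byte encoding of 'a-umlaut' in half, as a byte-level
        // truncation would: the result is no longer valid UTF-8.
        byte[] truncated = Arrays.copyOf(full, full.length - 1);
        System.out.println(isValidUtf8(full));      // true
        System.out.println(isValidUtf8(truncated)); // false
    }
}
```

Any text-rendering path that assumes sampled key bytes are whole UTF-8 sequences will hit this, which is why the fix belongs in the sampling code rather than the serializer.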
[jira] [Comment Edited] (CASSANDRA-13463) nodetool toppartitions - error: String didn't validate
[ https://issues.apache.org/jira/browse/CASSANDRA-13463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975478#comment-15975478 ] Chris Lohfink edited comment on CASSANDRA-13463 at 4/19/17 8:43 PM: I believe this was fixed in CASSANDRA-9241. Can you try on a 3.11 or 3.2+ version? was (Author: cnlwsu): fixed in CASSANDRA-9241 > nodetool toppartitions - error: String didn't validate > -- > > Key: CASSANDRA-13463 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13463 > [quoted issue description and stack trace identical to the first message above] -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (CASSANDRA-13463) nodetool toppartitions - error: String didn't validate
[ https://issues.apache.org/jira/browse/CASSANDRA-13463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975478#comment-15975478 ] Chris Lohfink commented on CASSANDRA-13463: --- fixed in CASSANDRA-9241 > nodetool toppartitions - error: String didn't validate > -- > > Key: CASSANDRA-13463 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13463 > [quoted issue description and stack trace identical to the first message above] -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (CASSANDRA-13463) nodetool toppartitions - error: String didn't validate
[ https://issues.apache.org/jira/browse/CASSANDRA-13463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975468#comment-15975468 ] Chris Lohfink commented on CASSANDRA-13463: --- can you provide the schema for the table you're doing this on and the partitions you're inserting/reading? I cannot reproduce this with simple tables. > nodetool toppartitions - error: String didn't validate > -- > > Key: CASSANDRA-13463 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13463 > [quoted issue description and stack trace identical to the first message above] -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (CASSANDRA-13463) nodetool toppartitions - error: String didn't validate
[ https://issues.apache.org/jira/browse/CASSANDRA-13463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Ivanov updated CASSANDRA-13463: - Description: [updated description: the same error message and stack trace quoted in the first message above] was: [previous description, truncated in this archive]
[jira] [Commented] (CASSANDRA-13006) Disable automatic heap dumps on OOM error
[ https://issues.apache.org/jira/browse/CASSANDRA-13006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975265#comment-15975265 ] Nibin G commented on CASSANDRA-13006: - Why can't we delegate heap dump generation to the JVM if jmap is not available on the path? The JRE can generate a heap dump even if jmap is not on the path. > Disable automatic heap dumps on OOM error > - > > Key: CASSANDRA-13006 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13006 > Project: Cassandra > Issue Type: Bug > Components: Configuration >Reporter: anmols >Assignee: Benjamin Lerer >Priority: Minor > Fix For: 3.0.9 > > Attachments: 13006-3.0.9.txt > > > With CASSANDRA-9861, a change was added to enable collecting heap dumps by > default if the process encountered an OOM error. These heap dumps are stored > in the Apache Cassandra home directory unless configured otherwise (see > [Cassandra Support > Document|https://support.datastax.com/hc/en-us/articles/204225959-Generating-and-Analyzing-Heap-Dumps] > for this feature). > > The creation and storage of heap dumps aids debugging and investigative > workflows, but is not desirable for a production environment where these > heap dumps may occupy a large amount of disk space and require manual > intervention for cleanups. > > Managing heap dumps on out of memory errors and configuring the paths for > these heap dumps are available as JVM options. The current behavior > conflicts with the Boolean JVM flag HeapDumpOnOutOfMemoryError. > > A patch can be proposed here that would make the heap dump on OOM error honor > the HeapDumpOnOutOfMemoryError flag. Users who still want to generate > heap dumps on OOM errors can set the -XX:+HeapDumpOnOutOfMemoryError JVM > option. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (CASSANDRA-13442) Support a means of strongly consistent highly available replication with storage requirements approximating RF=2
[ https://issues.apache.org/jira/browse/CASSANDRA-13442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ariel Weisberg updated CASSANDRA-13442: --- Description: Replication factors like RF=2 can't provide strong consistency and availability because if a single node is lost it's impossible to reach a quorum of replicas. Stepping up to RF=3 will allow you to lose a node and still achieve quorum for reads and writes, but requires committing additional storage. The requirement of a quorum for writes/reads doesn't seem to be something that can be relaxed without additional constraints on queries, but it seems like it should be possible to relax the requirement that 3 full copies of the entire data set are kept. What is actually required is a covering data set for the range and we should be able to achieve a covering data set and high availability without having three full copies. After a repair we know that some subset of the data set is fully replicated. At that point we don't have to read from a quorum of nodes for the repaired data. It is sufficient to read from a single node for the repaired data and a quorum of nodes for the unrepaired data. One way to exploit this would be to have N replicas, say the last N replicas (where N varies with RF) in the preference list, delete all repaired data after a repair completes. Subsequent quorum reads will be able to retrieve the repaired data from either of the two full replicas and the unrepaired data from a quorum read of any replica including the "transient" replicas. Configuration for something like this in NTS might be something similar to { DC1="3-1", DC2="3-2" } where the first value is the replication factor used for consistency and the second value is the number of transient replicas. If you specify { DC1=3, DC2=3 } then the number of transient replicas defaults to 0 and you get the same behavior you have today. 
was: [previous description: identical to the updated description above, minus the final paragraph about NTS configuration] > Support a means of strongly consistent highly available replication with > storage requirements approximating RF=2 > > > Key: CASSANDRA-13442 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13442 > [quoted issue description identical to the updated description above, truncated in this archive]
[jira] [Updated] (CASSANDRA-13442) Support a means of strongly consistent highly available replication with tunable storage requirements
[ https://issues.apache.org/jira/browse/CASSANDRA-13442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ariel Weisberg updated CASSANDRA-13442: --- Summary: Support a means of strongly consistent highly available replication with tunable storage requirements (was: Support a means of strongly consistent highly available replication with storage requirements approximating RF=2) > Support a means of strongly consistent highly available replication with > tunable storage requirements > - > > Key: CASSANDRA-13442 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13442 > Project: Cassandra > Issue Type: Improvement > Components: Compaction, Coordination, Distributed Metadata, Local > Write-Read Paths >Reporter: Ariel Weisberg > > [quoted issue description identical to the updated description above] -- This message was sent by Atlassian JIRA (v6.3.15#6346)
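The proposed { DC1="3-1" } notation above packs two numbers into each datacenter's replication setting: the total replicas counted for quorum purposes and how many of those are transient. A minimal sketch of parsing that notation follows; the class and method names are hypothetical illustrations, not taken from any actual patch:

```java
public class ReplicationFactor {
    public final int allReplicas;       // total replicas counted for consistency ("3" in "3-1")
    public final int transientReplicas; // replicas holding only unrepaired data ("1" in "3-1")

    private ReplicationFactor(int all, int trans) {
        if (trans < 0 || trans >= all)
            throw new IllegalArgumentException("transient replicas must be >= 0 and < total");
        this.allReplicas = all;
        this.transientReplicas = trans;
    }

    // "3"   -> 3 full replicas, 0 transient (today's behavior)
    // "3-1" -> 3 replicas for quorum purposes, 1 of them transient
    public static ReplicationFactor fromString(String s) {
        int dash = s.indexOf('-');
        if (dash < 0)
            return new ReplicationFactor(Integer.parseInt(s.trim()), 0);
        return new ReplicationFactor(Integer.parseInt(s.substring(0, dash).trim()),
                                     Integer.parseInt(s.substring(dash + 1).trim()));
    }

    // Replicas that keep the full (repaired + unrepaired) data set.
    public int fullReplicas() {
        return allReplicas - transientReplicas;
    }

    public static void main(String[] args) {
        ReplicationFactor rf = fromString("3-1");
        System.out.println(rf.allReplicas + " total, " + rf.fullReplicas()
                + " full, " + rf.transientReplicas + " transient");
    }
}
```

With "3-1", quorum reads and writes still involve 3 replicas, but only 2 of them ever store the repaired portion of the data, which is where the storage saving comes from.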
[jira] [Commented] (CASSANDRA-13006) Disable automatic heap dumps on OOM error
[ https://issues.apache.org/jira/browse/CASSANDRA-13006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975192#comment-15975192 ] Jeremiah Jordan commented on CASSANDRA-13006: - We could fall back to trying to use the "com.sun.management:type=HotSpotDiagnostic" bean directly if we can't find jmap. Some links for doing this: https://blogs.oracle.com/sundararajan/entry/programmatically_dumping_heap_from_java http://stackoverflow.com/a/12297339/138693 > Disable automatic heap dumps on OOM error > - > > Key: CASSANDRA-13006 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13006 > [quoted issue description identical to the first CASSANDRA-13006 message above] -- This message was sent by Atlassian JIRA (v6.3.15#6346)
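The fallback suggested above, calling the com.sun.management:type=HotSpotDiagnostic bean directly instead of shelling out to jmap, might look roughly like this on a HotSpot JVM. The wrapper class is an illustrative sketch, not Cassandra's code:

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import com.sun.management.HotSpotDiagnosticMXBean;

public class HeapDumper {
    private static final String DIAGNOSTIC_BEAN = "com.sun.management:type=HotSpotDiagnostic";

    // Writes an hprof-format heap dump to 'path'. With live = true the JVM
    // runs a GC first and dumps only reachable objects. dumpHeap refuses to
    // overwrite an existing file, so callers should pick a fresh path.
    public static void dumpHeap(String path, boolean live) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        HotSpotDiagnosticMXBean bean = ManagementFactory.newPlatformMXBeanProxy(
                server, DIAGNOSTIC_BEAN, HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(path, live);
    }

    public static void main(String[] args) throws Exception {
        String path = System.getProperty("java.io.tmpdir")
                + java.io.File.separator + "demo-" + System.nanoTime() + ".hprof";
        dumpHeap(path, true);
        System.out.println("heap dump written to " + path);
    }
}
```

Because this goes through the platform MBean server rather than an external binary, it works with a plain JRE, which would sidestep the missing-jmap problem discussed in this thread (at the cost of being HotSpot-specific).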
[jira] [Comment Edited] (CASSANDRA-13006) Disable automatic heap dumps on OOM error
[ https://issues.apache.org/jira/browse/CASSANDRA-13006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975153#comment-15975153 ] Nibin G edited comment on CASSANDRA-13006 at 4/19/17 5:49 PM: -- Oracle Java's JRE 8 and Server JRE 8 for linux environments are not shipping jmap anymore. That means, we have to use Oracle Java's JDK for the heap dumps to be generated from cassandra. And some of the security compliance won't permit the use of JDK in production. It would be great if an option is provided to disable heap dump from the application code[1]. So that JVM can generate the heap dump. Or use jcmd utility (available in server-jre 8 and jdk 8) instead of jmap. [1] https://github.com/apache/cassandra/blob/81f6c784ce967fadb6ed7f58de1328e713eaf53c/src/java/org/apache/cassandra/utils/JVMStabilityInspector.java#L56 was (Author: nibin.gv): Oracle Java's JRE 8 and Server JRE 8 for linux environments are not shipping jmap anymore. That means, we have to use Oracle Java's JDK for the heap dumps to be generated. And some of the security compliance won't permit the use of JDK in production. It would be great if an option is provided to disable heap dump from the application code[1]. So that JVM can generate the heap dump. [1] https://github.com/apache/cassandra/blob/81f6c784ce967fadb6ed7f58de1328e713eaf53c/src/java/org/apache/cassandra/utils/JVMStabilityInspector.java#L56 > Disable automatic heap dumps on OOM error > - > > Key: CASSANDRA-13006 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13006 > Project: Cassandra > Issue Type: Bug > Components: Configuration >Reporter: anmols >Assignee: Benjamin Lerer >Priority: Minor > Fix For: 3.0.9 > > Attachments: 13006-3.0.9.txt > > > With CASSANDRA-9861, a change was added to enable collecting heap dumps by > default if the process encountered an OOM error. 
These heap dumps are stored > in the Apache Cassandra home directory unless configured otherwise (see > [Cassandra Support > Document|https://support.datastax.com/hc/en-us/articles/204225959-Generating-and-Analyzing-Heap-Dumps] > for this feature). > > The creation and storage of heap dumps aids debugging and investigative > workflows, but is not desirable in a production environment, where these > heap dumps may occupy a large amount of disk space and require manual > intervention for cleanup. > > Managing heap dumps on out-of-memory errors and configuring the paths for > these heap dumps are already available as JVM options. The current behavior > conflicts with the Boolean JVM flag HeapDumpOnOutOfMemoryError. > > The patch proposed here makes the heap dump on OOM error honor > the HeapDumpOnOutOfMemoryError flag. Users who still want to generate > heap dumps on OOM errors can set the -XX:+HeapDumpOnOutOfMemoryError JVM > option.
[jira] [Commented] (CASSANDRA-13006) Disable automatic heap dumps on OOM error
[ https://issues.apache.org/jira/browse/CASSANDRA-13006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975153#comment-15975153 ] Nibin G commented on CASSANDRA-13006: - Oracle Java's JRE 8 and Server JRE 8 for Linux environments are not shipping jmap anymore. That means we have to use Oracle's JDK for heap dumps to be generated, and some security compliance policies won't permit the use of the JDK in production. It would be great if an option were provided to disable the heap dump from the application code [1], so that the JVM can generate the heap dump itself. [1] https://github.com/apache/cassandra/blob/81f6c784ce967fadb6ed7f58de1328e713eaf53c/src/java/org/apache/cassandra/utils/JVMStabilityInspector.java#L56 > Disable automatic heap dumps on OOM error > - > > Key: CASSANDRA-13006 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13006 > Project: Cassandra > Issue Type: Bug > Components: Configuration >Reporter: anmols >Assignee: Benjamin Lerer >Priority: Minor > Fix For: 3.0.9 > > Attachments: 13006-3.0.9.txt > > > With CASSANDRA-9861, a change was added to collect heap dumps by > default if the process encounters an OOM error. These heap dumps are stored > in the Apache Cassandra home directory unless configured otherwise (see > [Cassandra Support > Document|https://support.datastax.com/hc/en-us/articles/204225959-Generating-and-Analyzing-Heap-Dumps] > for this feature). > > The creation and storage of heap dumps aids debugging and investigative > workflows, but is not desirable in a production environment, where these > heap dumps may occupy a large amount of disk space and require manual > intervention for cleanup. > > Managing heap dumps on out-of-memory errors and configuring the paths for > these heap dumps are already available as JVM options. The current behavior > conflicts with the Boolean JVM flag HeapDumpOnOutOfMemoryError. 
> > The patch proposed here makes the heap dump on OOM error honor > the HeapDumpOnOutOfMemoryError flag. Users who still want to generate > heap dumps on OOM errors can set the -XX:+HeapDumpOnOutOfMemoryError JVM > option.
[jira] [Updated] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Corentin Chary updated CASSANDRA-13418: --- Agreed on the option; it would be easy to implement using a new one. IMHO it's more dangerous to have nothing, as the current situation degrades write performance and can take up to twice the space originally planned. Compared to that, it isn't really an issue to have re-appearing data after an explicit deletion (I think that's the worst that can happen, but I could be wrong). > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The problem is that this is rather CPU-intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read repairs? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. 
> - Even with a 10% chance of doing a repair, we found that this would be > enough to greatly reduce entropy of the most-used data (and if you have > time series, you're likely to have a dashboard running the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (needs >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system, and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already.
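What an "ignore overlaps" switch in TWCS's fully-expired check could look like can be sketched with hypothetical, simplified types (these are not Cassandra's actual classes): an sstable is a drop candidate once all of its cells' TTLs have passed, and normally an overlap with live data vetoes the drop.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical, simplified model of TWCS expired-sstable selection,
// illustrating the option proposed in CASSANDRA-13418.
public class ExpiredSelector {
    static final class Table {
        final String name;
        final long maxDeletionTime;      // epoch seconds when the newest cell expires
        final boolean overlapsLiveData;  // does a non-expired sstable overlap this one?
        Table(String name, long maxDeletionTime, boolean overlapsLiveData) {
            this.name = name;
            this.maxDeletionTime = maxDeletionTime;
            this.overlapsLiveData = overlapsLiveData;
        }
    }

    static List<String> fullyExpired(List<Table> tables, long nowSec, boolean ignoreOverlaps) {
        return tables.stream()
                .filter(t -> t.maxDeletionTime < nowSec)              // all data TTL'd out
                .filter(t -> ignoreOverlaps || !t.overlapsLiveData)   // overlap veto, unless ignored
                .map(t -> t.name)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Table> tables = Arrays.asList(
                new Table("old-clean", 100, false),
                new Table("old-overlapped", 100, true),
                new Table("fresh", 10_000, false));
        System.out.println(fullyExpired(tables, 1_000, false)); // [old-clean]
        System.out.println(fullyExpired(tables, 1_000, true));  // [old-clean, old-overlapped]
    }
}
```

The danger Jeff raises is visible in the second call: with the switch on, "old-overlapped" is dropped even though newer data still overlaps it, which is exactly how deleted data can reappear.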
[jira] [Commented] (CASSANDRA-13365) Nodes entering GC loop, does not recover
[ https://issues.apache.org/jira/browse/CASSANDRA-13365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975075#comment-15975075 ] Jeff Jirsa commented on CASSANDRA-13365: {code} 9: 2870889 160769784 org.apache.cassandra.transport.messages.ResultMessage$Rows 10: 2937336 140992128 io.netty.buffer.SlicedAbstractByteBuf 11: 8854 118773984 [Lio.netty.util.Recycler$DefaultHandle; 12: 2830805 113232200 org.apache.cassandra.db.rows.BufferCell 13: 2937336 93994752 org.apache.cassandra.transport.Frame$Header 14: 2870928 91869696 org.apache.cassandra.cql3.ResultSet$ResultMetadata 15: 2728627 87316064 org.apache.cassandra.db.rows.BTreeRow {code} 2.8M ResultMessage$Rows, BufferCells, ResultSets, and BTreeRows suggests you have an awful lot of read results in flight, and you've filled the heap with them. Is it possible someone is doing a query with a very, very large LIMIT rather than using driver's fetchSize() or manual paging? Or do you have concurrent read threads turned up very high? > Nodes entering GC loop, does not recover > > > Key: CASSANDRA-13365 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13365 > Project: Cassandra > Issue Type: Bug > Environment: 34-node cluster over 4 DCs > Linux CentOS 7.2 x86 > Mix of 64GB/128GB RAM / node > Mix of 32/40 hardware threads / node, Xeon ~2.4Ghz > High read volume, low write volume, occasional sstable bulk loading >Reporter: Mina Naguib > > Over the last week we've been observing two related problems affecting our > Cassandra cluster > Problem 1: 1-few nodes per DC entering GC loop, not recovering > Checking the heap usage stats, there's a sudden jump of 1-3GB. 
Some nodes > recover, but some don't and log this: > {noformat} > 2017-03-21T11:23:02.957-0400: 54099.519: [Full GC (Allocation Failure) > 13G->11G(14G), 29.4127307 secs] > 2017-03-21T11:23:45.270-0400: 54141.833: [Full GC (Allocation Failure) > 13G->12G(14G), 28.1561881 secs] > 2017-03-21T11:24:20.307-0400: 54176.869: [Full GC (Allocation Failure) > 13G->13G(14G), 27.7019501 secs] > 2017-03-21T11:24:50.528-0400: 54207.090: [Full GC (Allocation Failure) > 13G->13G(14G), 27.1372267 secs] > 2017-03-21T11:25:19.190-0400: 54235.752: [Full GC (Allocation Failure) > 13G->13G(14G), 27.0703975 secs] > 2017-03-21T11:25:46.711-0400: 54263.273: [Full GC (Allocation Failure) > 13G->13G(14G), 27.3187768 secs] > 2017-03-21T11:26:15.419-0400: 54291.981: [Full GC (Allocation Failure) > 13G->13G(14G), 26.9493405 secs] > 2017-03-21T11:26:43.399-0400: 54319.961: [Full GC (Allocation Failure) > 13G->13G(14G), 27.5222085 secs] > 2017-03-21T11:27:11.383-0400: 54347.945: [Full GC (Allocation Failure) > 13G->13G(14G), 27.1769581 secs] > 2017-03-21T11:27:40.174-0400: 54376.737: [Full GC (Allocation Failure) > 13G->13G(14G), 27.4639031 secs] > 2017-03-21T11:28:08.946-0400: 54405.508: [Full GC (Allocation Failure) > 13G->13G(14G), 30.3480523 secs] > 2017-03-21T11:28:40.117-0400: 54436.680: [Full GC (Allocation Failure) > 13G->13G(14G), 27.8220513 secs] > 2017-03-21T11:29:08.459-0400: 54465.022: [Full GC (Allocation Failure) > 13G->13G(14G), 27.4691271 secs] > 2017-03-21T11:29:37.114-0400: 54493.676: [Full GC (Allocation Failure) > 13G->13G(14G), 27.0275733 secs] > 2017-03-21T11:30:04.635-0400: 54521.198: [Full GC (Allocation Failure) > 13G->13G(14G), 27.1902627 secs] > 2017-03-21T11:30:32.114-0400: 54548.676: [Full GC (Allocation Failure) > 13G->13G(14G), 27.8872850 secs] > 2017-03-21T11:31:01.430-0400: 54577.993: [Full GC (Allocation Failure) > 13G->13G(14G), 27.1609706 secs] > 2017-03-21T11:31:29.024-0400: 54605.587: [Full GC (Allocation Failure) > 13G->13G(14G), 27.3635138 secs] > 
2017-03-21T11:31:57.303-0400: 54633.865: [Full GC (Allocation Failure) > 13G->13G(14G), 27.4143510 secs] > 2017-03-21T11:32:25.110-0400: 54661.672: [Full GC (Allocation Failure) > 13G->13G(14G), 27.8595986 secs] > 2017-03-21T11:32:53.922-0400: 54690.485: [Full GC (Allocation Failure) > 13G->13G(14G), 27.5242543 secs] > 2017-03-21T11:33:21.867-0400: 54718.429: [Full GC (Allocation Failure) > 13G->13G(14G), 30.8930130 secs] > 2017-03-21T11:33:53.712-0400: 54750.275: [Full GC (Allocation Failure) > 13G->13G(14G), 27.6523013 secs] > 2017-03-21T11:34:21.760-0400: 54778.322: [Full GC (Allocation Failure) > 13G->13G(14G), 27.3030198 secs] > 2017-03-21T11:34:50.073-0400: 54806.635: [Full GC (Allocation Failure) > 13G->13G(14G), 27.1594154 secs] > 2017-03-21T11:35:17.743-0400: 54834.306: [Full GC (Allocation Failure) > 13G->13G(14G), 27.3766949 secs] > 2017-03-21T11:35:45.797-0400: 54862.360: [Full GC (Allocation Failure) > 13G->13G(14G), 27.5756770 secs] > 2017-03-21T11:36:13.816-0400: 54890.378: [Full GC
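Jeff Jirsa's advice in the comment above, paging through large reads with the driver's fetchSize() or manual paging rather than one huge LIMIT, amounts to the loop below. This is a driver-agnostic sketch with a stand-in data source, not the DataStax driver API itself: only one page of rows is materialized at a time, so the heap never holds millions of result objects at once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

// Driver-agnostic sketch of page-at-a-time reads: fetch small pages and
// stop when a short page signals the end of the result set.
public class PagedReader {
    public static List<Integer> readAll(BiFunction<Integer, Integer, List<Integer>> fetchPage,
                                        int pageSize) {
        List<Integer> out = new ArrayList<>();
        int offset = 0;
        while (true) {
            List<Integer> page = fetchPage.apply(offset, pageSize);
            out.addAll(page);
            if (page.size() < pageSize) break; // last (short) page
            offset += pageSize;
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical in-memory data source standing in for a Cassandra partition.
        final List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 10; i++) rows.add(i);
        List<Integer> result = readAll(
                (off, n) -> rows.subList(off, Math.min(off + n, rows.size())), 3);
        System.out.println("read " + result.size() + " rows in pages of 3");
    }
}
```

A real driver does the same thing server-side: setting a fetch size bounds how many rows each response carries, so the coordinator never has to buffer the whole result.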
[jira] [Commented] (CASSANDRA-13365) Nodes entering GC loop, does not recover
[ https://issues.apache.org/jira/browse/CASSANDRA-13365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975031#comment-15975031 ] ZhaoYang commented on CASSANDRA-13365: -- [~minaguib] could you share what queries are running at that moment? or you could try `sjk-plus` to see which thread is allocating huge memory > Nodes entering GC loop, does not recover > > > Key: CASSANDRA-13365 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13365 > Project: Cassandra > Issue Type: Bug > Environment: 34-node cluster over 4 DCs > Linux CentOS 7.2 x86 > Mix of 64GB/128GB RAM / node > Mix of 32/40 hardware threads / node, Xeon ~2.4Ghz > High read volume, low write volume, occasional sstable bulk loading >Reporter: Mina Naguib > > Over the last week we've been observing two related problems affecting our > Cassandra cluster > Problem 1: 1-few nodes per DC entering GC loop, not recovering > Checking the heap usage stats, there's a sudden jump of 1-3GB. Some nodes > recover, but some don't and log this: > {noformat} > 2017-03-21T11:23:02.957-0400: 54099.519: [Full GC (Allocation Failure) > 13G->11G(14G), 29.4127307 secs] > 2017-03-21T11:23:45.270-0400: 54141.833: [Full GC (Allocation Failure) > 13G->12G(14G), 28.1561881 secs] > 2017-03-21T11:24:20.307-0400: 54176.869: [Full GC (Allocation Failure) > 13G->13G(14G), 27.7019501 secs] > 2017-03-21T11:24:50.528-0400: 54207.090: [Full GC (Allocation Failure) > 13G->13G(14G), 27.1372267 secs] > 2017-03-21T11:25:19.190-0400: 54235.752: [Full GC (Allocation Failure) > 13G->13G(14G), 27.0703975 secs] > 2017-03-21T11:25:46.711-0400: 54263.273: [Full GC (Allocation Failure) > 13G->13G(14G), 27.3187768 secs] > 2017-03-21T11:26:15.419-0400: 54291.981: [Full GC (Allocation Failure) > 13G->13G(14G), 26.9493405 secs] > 2017-03-21T11:26:43.399-0400: 54319.961: [Full GC (Allocation Failure) > 13G->13G(14G), 27.5222085 secs] > 2017-03-21T11:27:11.383-0400: 54347.945: [Full GC (Allocation Failure) > 13G->13G(14G), 
27.1769581 secs] > 2017-03-21T11:27:40.174-0400: 54376.737: [Full GC (Allocation Failure) > 13G->13G(14G), 27.4639031 secs] > 2017-03-21T11:28:08.946-0400: 54405.508: [Full GC (Allocation Failure) > 13G->13G(14G), 30.3480523 secs] > 2017-03-21T11:28:40.117-0400: 54436.680: [Full GC (Allocation Failure) > 13G->13G(14G), 27.8220513 secs] > 2017-03-21T11:29:08.459-0400: 54465.022: [Full GC (Allocation Failure) > 13G->13G(14G), 27.4691271 secs] > 2017-03-21T11:29:37.114-0400: 54493.676: [Full GC (Allocation Failure) > 13G->13G(14G), 27.0275733 secs] > 2017-03-21T11:30:04.635-0400: 54521.198: [Full GC (Allocation Failure) > 13G->13G(14G), 27.1902627 secs] > 2017-03-21T11:30:32.114-0400: 54548.676: [Full GC (Allocation Failure) > 13G->13G(14G), 27.8872850 secs] > 2017-03-21T11:31:01.430-0400: 54577.993: [Full GC (Allocation Failure) > 13G->13G(14G), 27.1609706 secs] > 2017-03-21T11:31:29.024-0400: 54605.587: [Full GC (Allocation Failure) > 13G->13G(14G), 27.3635138 secs] > 2017-03-21T11:31:57.303-0400: 54633.865: [Full GC (Allocation Failure) > 13G->13G(14G), 27.4143510 secs] > 2017-03-21T11:32:25.110-0400: 54661.672: [Full GC (Allocation Failure) > 13G->13G(14G), 27.8595986 secs] > 2017-03-21T11:32:53.922-0400: 54690.485: [Full GC (Allocation Failure) > 13G->13G(14G), 27.5242543 secs] > 2017-03-21T11:33:21.867-0400: 54718.429: [Full GC (Allocation Failure) > 13G->13G(14G), 30.8930130 secs] > 2017-03-21T11:33:53.712-0400: 54750.275: [Full GC (Allocation Failure) > 13G->13G(14G), 27.6523013 secs] > 2017-03-21T11:34:21.760-0400: 54778.322: [Full GC (Allocation Failure) > 13G->13G(14G), 27.3030198 secs] > 2017-03-21T11:34:50.073-0400: 54806.635: [Full GC (Allocation Failure) > 13G->13G(14G), 27.1594154 secs] > 2017-03-21T11:35:17.743-0400: 54834.306: [Full GC (Allocation Failure) > 13G->13G(14G), 27.3766949 secs] > 2017-03-21T11:35:45.797-0400: 54862.360: [Full GC (Allocation Failure) > 13G->13G(14G), 27.5756770 secs] > 2017-03-21T11:36:13.816-0400: 54890.378: [Full GC 
(Allocation Failure) > 13G->13G(14G), 27.5541813 secs] > 2017-03-21T11:36:41.926-0400: 54918.488: [Full GC (Allocation Failure) > 13G->13G(14G), 33.7510103 secs] > 2017-03-21T11:37:16.132-0400: 54952.695: [Full GC (Allocation Failure) > 13G->13G(14G), 27.4856611 secs] > 2017-03-21T11:37:44.454-0400: 54981.017: [Full GC (Allocation Failure) > 13G->13G(14G), 28.1269335 secs] > 2017-03-21T11:38:12.774-0400: 55009.337: [Full GC (Allocation Failure) > 13G->13G(14G), 27.7830448 secs] > 2017-03-21T11:38:40.840-0400: 55037.402: [Full GC (Allocation Failure) > 13G->13G(14G), 27.3527326 secs] > 2017-03-21T11:39:08.610-0400: 55065.173: [Full GC (Allocation Failure) > 13G->13G(14G), 27.5828941 secs] > 2017-03-21T11:39:36.833-0400: 55093.396: [Full GC (Allocation Failure) >
[jira] [Updated] (CASSANDRA-13307) The specification of protocol version in cqlsh means the python driver doesn't automatically downgrade protocol version.
[ https://issues.apache.org/jira/browse/CASSANDRA-13307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Jirsa updated CASSANDRA-13307: --- Fix Version/s: (was: 3.11.x) 3.11.0 > The specification of protocol version in cqlsh means the python driver > doesn't automatically downgrade protocol version. > > > Key: CASSANDRA-13307 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13307 > Project: Cassandra > Issue Type: Bug > Components: Tools >Reporter: Matt Byrd >Assignee: Matt Byrd >Priority: Minor > Labels: doc-impacting > Fix For: 3.11.0, 4.0 > > > Hi, > Looks like we've regressed on the issue described in: > https://issues.apache.org/jira/browse/CASSANDRA-9467 > We're no longer able to connect from newer cqlsh versions > (e.g. trunk) to older versions of Cassandra that speak a lower version of the > protocol (e.g. 2.1 with protocol version 3). > The problem seems to be that we're relying on the client's ability to > automatically downgrade the protocol version, implemented in Cassandra here: > https://issues.apache.org/jira/browse/CASSANDRA-12838 > and utilised in the python client here: > https://datastax-oss.atlassian.net/browse/PYTHON-240 > The problem, however, comes from: > https://datastax-oss.atlassian.net/browse/PYTHON-537 > "Don't downgrade protocol version if explicitly set" > (included when we bumped from 3.5.0 to 3.7.0 of the python driver as part of > fixing: https://issues.apache.org/jira/browse/CASSANDRA-11534), > since we do explicitly specify the protocol version in bin/cqlsh.py, so the > driver never downgrades. > I've got a patch that adds an option to explicitly specify the protocol > version (for those who want to) and otherwise defaults to not setting the > protocol version, i.e. using the protocol version of the client we ship, > which should by default match the server. Then it should downgrade gracefully > as intended. > Let me know if that seems reasonable. 
> Thanks, > Matt
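The downgrade behavior Matt describes can be captured in a small sketch: a pinned (explicitly set) version is used as-is or fails outright, while an unset version walks down from the client's newest version until the server accepts one, which is the graceful path cqlsh loses by hard-coding the version. The types here are illustrative, not the python driver's API.

```java
import java.util.function.IntPredicate;

// Illustrative model of client protocol-version negotiation.
public class ProtocolNegotiator {
    /** Returns the negotiated version, or -1 when no version is acceptable. */
    static int negotiate(IntPredicate serverSupports, int clientMax, Integer explicit) {
        if (explicit != null) {
            // Explicitly pinned: use it or fail; never silently downgrade.
            return serverSupports.test(explicit) ? explicit : -1;
        }
        // Unpinned: walk down until the server accepts a version.
        for (int v = clientMax; v >= 1; v--) {
            if (serverSupports.test(v)) return v;
        }
        return -1;
    }

    public static void main(String[] args) {
        IntPredicate oldServer = v -> v <= 3; // e.g. Cassandra 2.1 speaks up to v3
        System.out.println(negotiate(oldServer, 4, null)); // downgrades to 3
        System.out.println(negotiate(oldServer, 4, 4));    // pinned v4 fails: -1
    }
}
```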
[jira] [Updated] (CASSANDRA-13463) nodetool toppartitions - error: String didn't validate
[ https://issues.apache.org/jira/browse/CASSANDRA-13463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Jirsa updated CASSANDRA-13463: --- Reproduced In: 3.0.11 > nodetool toppartitions - error: String didn't validate > -- > > Key: CASSANDRA-13463 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13463 > Project: Cassandra > Issue Type: Bug > Components: Observability > Environment: Debian Jessie, Java 1.8.0-121, Cassandra v3.0.11 >Reporter: Aleksandr Ivanov > > nodetool toppartitions fails for most runs with the > following message > {code} > error: String didn't validate. > -- StackTrace -- > org.apache.cassandra.serializers.MarshalException: String didn't validate. > at > org.apache.cassandra.serializers.UTF8Serializer.validate(UTF8Serializer.java:35) > at > org.apache.cassandra.db.marshal.AbstractType.getString(AbstractType.java:128) > at > org.apache.cassandra.db.ColumnFamilyStore.finishLocalSampling(ColumnFamilyStore.java:1559) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:71) > at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:275) > at > com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112) > at > com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46) > at > com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237) > at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138) > at 
com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252) > at > com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819) > at > com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801) > at > com.sun.jmx.remote.security.MBeanServerAccessController.invoke(MBeanServerAccessController.java:468) > at > javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1468) > at > javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:76) > at > javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1309) > at java.security.AccessController.doPrivileged(Native Method) > at > javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1408) > at > javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:829) > at sun.reflect.GeneratedMethodAccessor90.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:346) > at sun.rmi.transport.Transport$1.run(Transport.java:200) > at sun.rmi.transport.Transport$1.run(Transport.java:197) > at java.security.AccessController.doPrivileged(Native Method) > at sun.rmi.transport.Transport.serviceCall(Transport.java:196) > at > sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:568) > at > sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:826) > at > sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:683) > at java.security.AccessController.doPrivileged(Native Method) > at > sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:682) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {code} > It is easily reproducible if the period is longer than 1 second.
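The MarshalException in the trace above is UTF8Serializer rejecting sampled partition-key bytes that are not valid UTF-8 text. A generic defensive rendering (a sketch of the technique, not the project's actual fix) decodes strictly and falls back to a hex representation on failure:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

// Render a sampled key for display: valid UTF-8 is shown as text,
// anything else as 0x-prefixed hex instead of throwing.
public class KeyPrinter {
    static String printable(ByteBuffer key) {
        try {
            // A fresh CharsetDecoder reports (throws on) malformed input by default.
            return StandardCharsets.UTF_8.newDecoder()
                    .decode(key.duplicate()).toString();
        } catch (CharacterCodingException e) {
            StringBuilder hex = new StringBuilder("0x");
            ByteBuffer d = key.duplicate();
            while (d.hasRemaining()) hex.append(String.format("%02x", d.get()));
            return hex.toString();
        }
    }

    public static void main(String[] args) {
        System.out.println(printable(ByteBuffer.wrap("abc".getBytes(StandardCharsets.UTF_8)))); // abc
        System.out.println(printable(ByteBuffer.wrap(new byte[]{(byte) 0xff, 0x00})));          // 0xff00
    }
}
```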
[jira] [Updated] (CASSANDRA-13463) nodetool toppartitions - error: String didn't validate
[ https://issues.apache.org/jira/browse/CASSANDRA-13463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Jirsa updated CASSANDRA-13463: --- Component/s: Observability > nodetool toppartitions - error: String didn't validate > -- > > Key: CASSANDRA-13463 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13463 > Project: Cassandra > Issue Type: Bug > Components: Observability > Environment: Debian Jessie, Java 1.8.0-121, Cassandra v3.0.11 >Reporter: Aleksandr Ivanov > > nodetool toppartitions fails for most runs with the > following message > {code} > error: String didn't validate. > -- StackTrace -- > org.apache.cassandra.serializers.MarshalException: String didn't validate. > at > org.apache.cassandra.serializers.UTF8Serializer.validate(UTF8Serializer.java:35) > at > org.apache.cassandra.db.marshal.AbstractType.getString(AbstractType.java:128) > at > org.apache.cassandra.db.ColumnFamilyStore.finishLocalSampling(ColumnFamilyStore.java:1559) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:71) > at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:275) > at > com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112) > at > com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46) > at > com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237) > at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138) > at 
com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252) > at > com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819) > at > com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801) > at > com.sun.jmx.remote.security.MBeanServerAccessController.invoke(MBeanServerAccessController.java:468) > at > javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1468) > at > javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:76) > at > javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1309) > at java.security.AccessController.doPrivileged(Native Method) > at > javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1408) > at > javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:829) > at sun.reflect.GeneratedMethodAccessor90.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:346) > at sun.rmi.transport.Transport$1.run(Transport.java:200) > at sun.rmi.transport.Transport$1.run(Transport.java:197) > at java.security.AccessController.doPrivileged(Native Method) > at sun.rmi.transport.Transport.serviceCall(Transport.java:196) > at > sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:568) > at > sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:826) > at > sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:683) > at java.security.AccessController.doPrivileged(Native Method) > at > sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:682) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {code} > It is easily reproducible if the period is longer than 1 second.
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15974996#comment-15974996 ] Jeff Jirsa commented on CASSANDRA-13418: [~iksaif] - this isn't a review, but at first glance I'm not in love with the idea of extending that config option in that way - it wasn't meant for that purpose, though it's sort of tangential (it was really meant for the specific task of grouping sstables for the cleanup compaction, and this isn't the cleanup compaction). There's also the bigger question of whether or not we really want to expose this to users. It's dangerous. I really really wanted something like this at my last employer, but the "this can be dangerous" factor prevented me from writing it. > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? 
> - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chance of doing a repair, we found that this would be > enough to greatly reduce entropy of the most-used data (and if you have > time series, you're likely to have a dashboard running the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (needs >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system, and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already.
[jira] [Created] (CASSANDRA-13463) nodetool toppartitions - error: String didn't validate
Aleksandr Ivanov created CASSANDRA-13463: Summary: nodetool toppartitions - error: String didn't validate Key: CASSANDRA-13463 URL: https://issues.apache.org/jira/browse/CASSANDRA-13463 Project: Cassandra Issue Type: Bug Environment: Debian Jessie, Java 1.8.0-121, Cassandra v3.0.11 Reporter: Aleksandr Ivanov nodetool toppartitions fails for most runs with the following message {code} error: String didn't validate. -- StackTrace -- org.apache.cassandra.serializers.MarshalException: String didn't validate. at org.apache.cassandra.serializers.UTF8Serializer.validate(UTF8Serializer.java:35) at org.apache.cassandra.db.marshal.AbstractType.getString(AbstractType.java:128) at org.apache.cassandra.db.ColumnFamilyStore.finishLocalSampling(ColumnFamilyStore.java:1559) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:71) at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:275) at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112) at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46) at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237) at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138) at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252) at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819) at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801) at 
com.sun.jmx.remote.security.MBeanServerAccessController.invoke(MBeanServerAccessController.java:468) at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1468) at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:76) at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1309) at java.security.AccessController.doPrivileged(Native Method) at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1408) at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:829) at sun.reflect.GeneratedMethodAccessor90.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:346) at sun.rmi.transport.Transport$1.run(Transport.java:200) at sun.rmi.transport.Transport$1.run(Transport.java:197) at java.security.AccessController.doPrivileged(Native Method) at sun.rmi.transport.Transport.serviceCall(Transport.java:196) at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:568) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:826) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:683) at java.security.AccessController.doPrivileged(Native Method) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:682) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) {code} It is easily reproducible if the period is longer than 1 second.
[jira] [Updated] (CASSANDRA-13308) Gossip breaks, Hint files not being deleted on nodetool decommission
[ https://issues.apache.org/jira/browse/CASSANDRA-13308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Jirsa updated CASSANDRA-13308: --- Resolution: Fixed Fix Version/s: (was: 3.11.x) (was: 4.x) (was: 3.0.x) 4.0 3.11.0 3.0.14 Status: Resolved (was: Ready to Commit) Thanks Aleksey. Committed with nits as [5089e74ef4a0eaeb1c439d57f074de1c496421f2|https://github.com/apache/cassandra/commit/5089e74ef4a0eaeb1c439d57f074de1c496421f2] > Gossip breaks, Hint files not being deleted on nodetool decommission > > > Key: CASSANDRA-13308 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13308 > Project: Cassandra > Issue Type: Bug > Components: Streaming and Messaging > Environment: Using Cassandra version 3.0.9 >Reporter: Arijit >Assignee: Jeff Jirsa > Fix For: 3.0.14, 3.11.0, 4.0 > > Attachments: 28207.stack, logs, logs_decommissioned_node > > > How to reproduce the issue I'm seeing: > Shut down Cassandra on one node of the cluster and wait until we accumulate a > ton of hints. Start Cassandra on the node and immediately run "nodetool > decommission" on it. > The node streams its replicas and marks itself as DECOMMISSIONED, but other > nodes do not seem to see this message. "nodetool status" shows the > decommissioned node in state "UL" on all other nodes (it is also present in > system.peers), and Cassandra logs show that gossip tasks on nodes are not > proceeding (number of pending tasks keeps increasing). Jstack suggests that a > gossip task is blocked on hints dispatch (I can provide traces if this is not > obvious). Because the cluster is large and there are a lot of hints, this is > taking a while. > On inspecting "/var/lib/cassandra/hints" on the nodes, I see a bunch of hint > files for the decommissioned node. Documentation seems to suggest that these > hints should be deleted during "nodetool decommission", but it does not seem > to be the case here. This is the bug being reported. 
> To recover from this scenario, if I manually delete hint files on the nodes, > the hints dispatcher threads throw a bunch of exceptions and the > decommissioned node is now in state "DL" (perhaps it missed some gossip > messages?). The node is still in my "system.peers" table > Restarting Cassandra on all nodes after this step does not fix the issue (the > node remains in the peers table). In fact, after this point the > decommissioned node is in state "DN"
[3/6] cassandra git commit: Interrupt replaying hints on decommission
Interrupt replaying hints on decommission Patch by Jeff Jirsa; Reviewed by Aleksey Yeschenko for CASSANDRA-13308 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/5089e74e Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/5089e74e Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/5089e74e Branch: refs/heads/trunk Commit: 5089e74ef4a0eaeb1c439d57f074de1c496421f2 Parents: 3110d27 Author: Jeff JirsaAuthored: Wed Apr 19 08:26:02 2017 -0700 Committer: Jeff Jirsa Committed: Wed Apr 19 08:57:45 2017 -0700 -- CHANGES.txt | 1 + .../apache/cassandra/hints/HintsDispatchExecutor.java| 8 src/java/org/apache/cassandra/hints/HintsDispatcher.java | 9 +++-- src/java/org/apache/cassandra/hints/HintsService.java| 11 ++- 4 files changed, 22 insertions(+), 7 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/5089e74e/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index 918c46b..e55d4cb 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -1,5 +1,6 @@ 3.0.14 * Handling partially written hint files (CASSANDRA-12728) + * Interrupt replaying hints on decommission (CASSANDRA-13308) 3.0.13 * Make reading of range tombstones more reliable (CASSANDRA-12811) http://git-wip-us.apache.org/repos/asf/cassandra/blob/5089e74e/src/java/org/apache/cassandra/hints/HintsDispatchExecutor.java -- diff --git a/src/java/org/apache/cassandra/hints/HintsDispatchExecutor.java b/src/java/org/apache/cassandra/hints/HintsDispatchExecutor.java index 333232d..58b30bd 100644 --- a/src/java/org/apache/cassandra/hints/HintsDispatchExecutor.java +++ b/src/java/org/apache/cassandra/hints/HintsDispatchExecutor.java @@ -117,6 +117,14 @@ final class HintsDispatchExecutor } } +void interruptDispatch(UUID hostId) +{ +Future future = scheduledDispatches.remove(hostId); + +if (null != future) +future.cancel(true); +} + private final class TransferHintsTask implements Runnable { private final HintsCatalog catalog; 
http://git-wip-us.apache.org/repos/asf/cassandra/blob/5089e74e/src/java/org/apache/cassandra/hints/HintsDispatcher.java -- diff --git a/src/java/org/apache/cassandra/hints/HintsDispatcher.java b/src/java/org/apache/cassandra/hints/HintsDispatcher.java index d7a3515..351b3fa 100644 --- a/src/java/org/apache/cassandra/hints/HintsDispatcher.java +++ b/src/java/org/apache/cassandra/hints/HintsDispatcher.java @@ -26,6 +26,8 @@ import java.util.concurrent.atomic.AtomicBoolean; import java.util.function.Function; import com.google.common.util.concurrent.RateLimiter; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; import org.apache.cassandra.config.DatabaseDescriptor; import org.apache.cassandra.gms.FailureDetector; @@ -42,6 +44,8 @@ import org.apache.cassandra.utils.concurrent.SimpleCondition; */ final class HintsDispatcher implements AutoCloseable { +private static final Logger logger = LoggerFactory.getLogger(HintsDispatcher.class); + private enum Action { CONTINUE, ABORT } private final HintsReader reader; @@ -181,7 +185,7 @@ final class HintsDispatcher implements AutoCloseable private static final class Callback implements IAsyncCallbackWithFailure { -enum Outcome { SUCCESS, TIMEOUT, FAILURE } +enum Outcome { SUCCESS, TIMEOUT, FAILURE, INTERRUPTED } private final long start = System.nanoTime(); private final SimpleCondition condition = new SimpleCondition(); @@ -198,7 +202,8 @@ final class HintsDispatcher implements AutoCloseable } catch (InterruptedException e) { -throw new AssertionError(e); +logger.warn("Hint dispatch was interrupted", e); +return Outcome.INTERRUPTED; } return timedOut ? 
Outcome.TIMEOUT : outcome; http://git-wip-us.apache.org/repos/asf/cassandra/blob/5089e74e/src/java/org/apache/cassandra/hints/HintsService.java -- diff --git a/src/java/org/apache/cassandra/hints/HintsService.java b/src/java/org/apache/cassandra/hints/HintsService.java index 5a32786..9cd4ed3 100644 --- a/src/java/org/apache/cassandra/hints/HintsService.java +++ b/src/java/org/apache/cassandra/hints/HintsService.java @@ -287,10 +287,11 @@ public final class HintsService implements HintsServiceMBean /** * Cleans up hints-related state
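The diff above boils down to two cooperating pieces: decommission cancels the per-host dispatch future with cancel(true), and the dispatcher's wait loop turns the resulting interrupt into an orderly INTERRUPTED outcome instead of throwing an AssertionError. A condensed sketch of that pattern (simplified, assumed names; the real code waits on a SimpleCondition, replaced here by a CountDownLatch):

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Future;

class DispatchSketch
{
    enum Outcome { SUCCESS, TIMEOUT, FAILURE, INTERRUPTED }

    final Map<UUID, Future<?>> scheduledDispatches = new ConcurrentHashMap<>();

    // Mirrors HintsDispatchExecutor.interruptDispatch(UUID): on decommission,
    // drop the scheduled dispatch for that host and interrupt it if running.
    void interruptDispatch(UUID hostId)
    {
        Future<?> future = scheduledDispatches.remove(hostId);
        if (future != null)
            future.cancel(true); // true => deliver an interrupt to the running task
    }

    // Mirrors the Callback change: an interrupt while waiting for a hint
    // response is now a first-class outcome, not an AssertionError.
    Outcome await(CountDownLatch condition)
    {
        try
        {
            condition.await();
            return Outcome.SUCCESS;
        }
        catch (InterruptedException e)
        {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return Outcome.INTERRUPTED;
        }
    }
}
```

The `cancel(true)` side and the `INTERRUPTED` outcome have to ship together: cancelling without the interrupt-aware wait would still trip the old AssertionError path.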
[5/6] cassandra git commit: Merge branch 'cassandra-3.0' into cassandra-3.11
Merge branch 'cassandra-3.0' into cassandra-3.11 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/5f644548 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/5f644548 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/5f644548 Branch: refs/heads/cassandra-3.11 Commit: 5f6445480341fbcbf15cdf36f4dda5f1b1a93102 Parents: 9c54d02 5089e74 Author: Jeff JirsaAuthored: Wed Apr 19 08:58:02 2017 -0700 Committer: Jeff Jirsa Committed: Wed Apr 19 08:58:45 2017 -0700 -- CHANGES.txt | 1 + .../apache/cassandra/hints/HintsDispatchExecutor.java| 8 src/java/org/apache/cassandra/hints/HintsDispatcher.java | 9 +++-- src/java/org/apache/cassandra/hints/HintsService.java| 11 ++- 4 files changed, 22 insertions(+), 7 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/5f644548/CHANGES.txt -- diff --cc CHANGES.txt index 1757266,e55d4cb..92ecb39 --- a/CHANGES.txt +++ b/CHANGES.txt @@@ -1,33 -1,8 +1,34 @@@ -3.0.14 - * Handling partially written hint files (CASSANDRA-12728) +3.11.0 + * V5 protocol flags decoding broken (CASSANDRA-13443) + * Use write lock not read lock for removing sstables from compaction strategies. (CASSANDRA-13422) + * Use corePoolSize equal to maxPoolSize in JMXEnabledThreadPoolExecutors (CASSANDRA-13329) + * Avoid rebuilding SASI indexes containing no values (CASSANDRA-12962) + * Add charset to Analyser input stream (CASSANDRA-13151) + * Fix testLimitSSTables flake caused by concurrent flush (CASSANDRA-12820) + * cdc column addition strikes again (CASSANDRA-13382) + * Fix static column indexes (CASSANDRA-13277) + * DataOutputBuffer.asNewBuffer broken (CASSANDRA-13298) + * unittest CipherFactoryTest failed on MacOS (CASSANDRA-13370) + * Forbid SELECT restrictions and CREATE INDEX over non-frozen UDT columns (CASSANDRA-13247) + * Default logging we ship will incorrectly print "?:?" 
for "%F:%L" pattern (CASSANDRA-13317) + * Possible AssertionError in UnfilteredRowIteratorWithLowerBound (CASSANDRA-13366) + * Support unaligned memory access for AArch64 (CASSANDRA-13326) + * Improve SASI range iterator efficiency on intersection with an empty range (CASSANDRA-12915). + * Fix equality comparisons of columns using the duration type (CASSANDRA-13174) + * Obfuscate password in stress-graphs (CASSANDRA-12233) + * Move to FastThreadLocalThread and FastThreadLocal (CASSANDRA-13034) + * nodetool stopdaemon errors out (CASSANDRA-13030) + * Tables in system_distributed should not use gcgs of 0 (CASSANDRA-12954) + * Fix primary index calculation for SASI (CASSANDRA-12910) + * More fixes to the TokenAllocator (CASSANDRA-12990) + * NoReplicationTokenAllocator should work with zero replication factor (CASSANDRA-12983) + * Address message coalescing regression (CASSANDRA-12676) + * Delete illegal character from StandardTokenizerImpl.jflex (CASSANDRA-13417) + * Fix cqlsh automatic protocol downgrade regression (CASSANDRA-13307) +Merged from 3.0: + * Interrupt replaying hints on decommission (CASSANDRA-13308) - -3.0.13 + * Handling partially written hint files (CASSANDRA-12728) + * Fix NPE issue in StorageService (CASSANDRA-13060) * Make reading of range tombstones more reliable (CASSANDRA-12811) * Fix startup problems due to schema tables not completely flushed (CASSANDRA-12213) * Fix view builder bug that can filter out data on restart (CASSANDRA-13405) http://git-wip-us.apache.org/repos/asf/cassandra/blob/5f644548/src/java/org/apache/cassandra/hints/HintsDispatchExecutor.java -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/5f644548/src/java/org/apache/cassandra/hints/HintsDispatcher.java -- diff --cc src/java/org/apache/cassandra/hints/HintsDispatcher.java index 3ac77a3,351b3fa..c432553 --- a/src/java/org/apache/cassandra/hints/HintsDispatcher.java +++ b/src/java/org/apache/cassandra/hints/HintsDispatcher.java @@@ -26,9 -26,11 +26,11 @@@ import 
java.util.function.BooleanSupplier import java.util.function.Function; import com.google.common.util.concurrent.RateLimiter; + import org.slf4j.Logger; + import org.slf4j.LoggerFactory; -import org.apache.cassandra.config.DatabaseDescriptor; -import org.apache.cassandra.gms.FailureDetector; +import org.apache.cassandra.exceptions.RequestFailureReason; +import org.apache.cassandra.metrics.HintsServiceMetrics; import org.apache.cassandra.net.IAsyncCallbackWithFailure; import org.apache.cassandra.net.MessageIn; import
[1/6] cassandra git commit: Interrupt replaying hints on decommission
Repository: cassandra Updated Branches: refs/heads/cassandra-3.0 3110d27dd -> 5089e74ef refs/heads/cassandra-3.11 9c54d02f7 -> 5f6445480 refs/heads/trunk 08c216d12 -> 9b1295e41 Interrupt replaying hints on decommission Patch by Jeff Jirsa; Reviewed by Aleksey Yeschenko for CASSANDRA-13308 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/5089e74e Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/5089e74e Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/5089e74e Branch: refs/heads/cassandra-3.0 Commit: 5089e74ef4a0eaeb1c439d57f074de1c496421f2 Parents: 3110d27 Author: Jeff JirsaAuthored: Wed Apr 19 08:26:02 2017 -0700 Committer: Jeff Jirsa Committed: Wed Apr 19 08:57:45 2017 -0700 -- CHANGES.txt | 1 + .../apache/cassandra/hints/HintsDispatchExecutor.java| 8 src/java/org/apache/cassandra/hints/HintsDispatcher.java | 9 +++-- src/java/org/apache/cassandra/hints/HintsService.java| 11 ++- 4 files changed, 22 insertions(+), 7 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/5089e74e/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index 918c46b..e55d4cb 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -1,5 +1,6 @@ 3.0.14 * Handling partially written hint files (CASSANDRA-12728) + * Interrupt replaying hints on decommission (CASSANDRA-13308) 3.0.13 * Make reading of range tombstones more reliable (CASSANDRA-12811) http://git-wip-us.apache.org/repos/asf/cassandra/blob/5089e74e/src/java/org/apache/cassandra/hints/HintsDispatchExecutor.java -- diff --git a/src/java/org/apache/cassandra/hints/HintsDispatchExecutor.java b/src/java/org/apache/cassandra/hints/HintsDispatchExecutor.java index 333232d..58b30bd 100644 --- a/src/java/org/apache/cassandra/hints/HintsDispatchExecutor.java +++ b/src/java/org/apache/cassandra/hints/HintsDispatchExecutor.java @@ -117,6 +117,14 @@ final class HintsDispatchExecutor } } +void interruptDispatch(UUID hostId) +{ +Future future 
= scheduledDispatches.remove(hostId); + +if (null != future) +future.cancel(true); +} + private final class TransferHintsTask implements Runnable { private final HintsCatalog catalog; http://git-wip-us.apache.org/repos/asf/cassandra/blob/5089e74e/src/java/org/apache/cassandra/hints/HintsDispatcher.java -- diff --git a/src/java/org/apache/cassandra/hints/HintsDispatcher.java b/src/java/org/apache/cassandra/hints/HintsDispatcher.java index d7a3515..351b3fa 100644 --- a/src/java/org/apache/cassandra/hints/HintsDispatcher.java +++ b/src/java/org/apache/cassandra/hints/HintsDispatcher.java @@ -26,6 +26,8 @@ import java.util.concurrent.atomic.AtomicBoolean; import java.util.function.Function; import com.google.common.util.concurrent.RateLimiter; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; import org.apache.cassandra.config.DatabaseDescriptor; import org.apache.cassandra.gms.FailureDetector; @@ -42,6 +44,8 @@ import org.apache.cassandra.utils.concurrent.SimpleCondition; */ final class HintsDispatcher implements AutoCloseable { +private static final Logger logger = LoggerFactory.getLogger(HintsDispatcher.class); + private enum Action { CONTINUE, ABORT } private final HintsReader reader; @@ -181,7 +185,7 @@ final class HintsDispatcher implements AutoCloseable private static final class Callback implements IAsyncCallbackWithFailure { -enum Outcome { SUCCESS, TIMEOUT, FAILURE } +enum Outcome { SUCCESS, TIMEOUT, FAILURE, INTERRUPTED } private final long start = System.nanoTime(); private final SimpleCondition condition = new SimpleCondition(); @@ -198,7 +202,8 @@ final class HintsDispatcher implements AutoCloseable } catch (InterruptedException e) { -throw new AssertionError(e); +logger.warn("Hint dispatch was interrupted", e); +return Outcome.INTERRUPTED; } return timedOut ? 
Outcome.TIMEOUT : outcome; http://git-wip-us.apache.org/repos/asf/cassandra/blob/5089e74e/src/java/org/apache/cassandra/hints/HintsService.java -- diff --git a/src/java/org/apache/cassandra/hints/HintsService.java b/src/java/org/apache/cassandra/hints/HintsService.java index 5a32786..9cd4ed3 100644 --- a/src/java/org/apache/cassandra/hints/HintsService.java
[6/6] cassandra git commit: Merge branch 'cassandra-3.11' into trunk
Merge branch 'cassandra-3.11' into trunk Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/9b1295e4 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/9b1295e4 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/9b1295e4 Branch: refs/heads/trunk Commit: 9b1295e419b93a194b2270e5b31b689d3ab05dd2 Parents: 08c216d 5f64454 Author: Jeff JirsaAuthored: Wed Apr 19 08:58:55 2017 -0700 Committer: Jeff Jirsa Committed: Wed Apr 19 08:59:29 2017 -0700 -- CHANGES.txt | 1 + .../apache/cassandra/hints/HintsDispatchExecutor.java| 8 src/java/org/apache/cassandra/hints/HintsDispatcher.java | 9 +++-- src/java/org/apache/cassandra/hints/HintsService.java| 11 ++- 4 files changed, 22 insertions(+), 7 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/9b1295e4/CHANGES.txt -- diff --cc CHANGES.txt index 13df7e6,92ecb39..c742570 --- a/CHANGES.txt +++ b/CHANGES.txt @@@ -80,7 -24,9 +80,8 @@@ * NoReplicationTokenAllocator should work with zero replication factor (CASSANDRA-12983) * Address message coalescing regression (CASSANDRA-12676) * Delete illegal character from StandardTokenizerImpl.jflex (CASSANDRA-13417) - * Fix cqlsh automatic protocol downgrade regression (CASSANDRA-13307) Merged from 3.0: + * Interrupt replaying hints on decommission (CASSANDRA-13308) * Handling partially written hint files (CASSANDRA-12728) * Fix NPE issue in StorageService (CASSANDRA-13060) * Make reading of range tombstones more reliable (CASSANDRA-12811) http://git-wip-us.apache.org/repos/asf/cassandra/blob/9b1295e4/src/java/org/apache/cassandra/hints/HintsDispatchExecutor.java -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/9b1295e4/src/java/org/apache/cassandra/hints/HintsDispatcher.java -- diff --cc src/java/org/apache/cassandra/hints/HintsDispatcher.java index 4a08540,c432553..323eeb1 --- a/src/java/org/apache/cassandra/hints/HintsDispatcher.java +++ 
b/src/java/org/apache/cassandra/hints/HintsDispatcher.java @@@ -26,8 -26,9 +26,10 @@@ import java.util.function.BooleanSupplier import java.util.function.Function; import com.google.common.util.concurrent.RateLimiter; + import org.slf4j.Logger; + import org.slf4j.LoggerFactory; +import org.apache.cassandra.db.monitoring.ApproximateTime; import org.apache.cassandra.exceptions.RequestFailureReason; import org.apache.cassandra.metrics.HintsServiceMetrics; import org.apache.cassandra.net.IAsyncCallbackWithFailure;
[4/6] cassandra git commit: Merge branch 'cassandra-3.0' into cassandra-3.11
Merge branch 'cassandra-3.0' into cassandra-3.11 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/5f644548 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/5f644548 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/5f644548 Branch: refs/heads/trunk Commit: 5f6445480341fbcbf15cdf36f4dda5f1b1a93102 Parents: 9c54d02 5089e74 Author: Jeff JirsaAuthored: Wed Apr 19 08:58:02 2017 -0700 Committer: Jeff Jirsa Committed: Wed Apr 19 08:58:45 2017 -0700 -- CHANGES.txt | 1 + .../apache/cassandra/hints/HintsDispatchExecutor.java| 8 src/java/org/apache/cassandra/hints/HintsDispatcher.java | 9 +++-- src/java/org/apache/cassandra/hints/HintsService.java| 11 ++- 4 files changed, 22 insertions(+), 7 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/5f644548/CHANGES.txt -- diff --cc CHANGES.txt index 1757266,e55d4cb..92ecb39 --- a/CHANGES.txt +++ b/CHANGES.txt @@@ -1,33 -1,8 +1,34 @@@ -3.0.14 - * Handling partially written hint files (CASSANDRA-12728) +3.11.0 + * V5 protocol flags decoding broken (CASSANDRA-13443) + * Use write lock not read lock for removing sstables from compaction strategies. (CASSANDRA-13422) + * Use corePoolSize equal to maxPoolSize in JMXEnabledThreadPoolExecutors (CASSANDRA-13329) + * Avoid rebuilding SASI indexes containing no values (CASSANDRA-12962) + * Add charset to Analyser input stream (CASSANDRA-13151) + * Fix testLimitSSTables flake caused by concurrent flush (CASSANDRA-12820) + * cdc column addition strikes again (CASSANDRA-13382) + * Fix static column indexes (CASSANDRA-13277) + * DataOutputBuffer.asNewBuffer broken (CASSANDRA-13298) + * unittest CipherFactoryTest failed on MacOS (CASSANDRA-13370) + * Forbid SELECT restrictions and CREATE INDEX over non-frozen UDT columns (CASSANDRA-13247) + * Default logging we ship will incorrectly print "?:?" 
for "%F:%L" pattern (CASSANDRA-13317) + * Possible AssertionError in UnfilteredRowIteratorWithLowerBound (CASSANDRA-13366) + * Support unaligned memory access for AArch64 (CASSANDRA-13326) + * Improve SASI range iterator efficiency on intersection with an empty range (CASSANDRA-12915). + * Fix equality comparisons of columns using the duration type (CASSANDRA-13174) + * Obfuscate password in stress-graphs (CASSANDRA-12233) + * Move to FastThreadLocalThread and FastThreadLocal (CASSANDRA-13034) + * nodetool stopdaemon errors out (CASSANDRA-13030) + * Tables in system_distributed should not use gcgs of 0 (CASSANDRA-12954) + * Fix primary index calculation for SASI (CASSANDRA-12910) + * More fixes to the TokenAllocator (CASSANDRA-12990) + * NoReplicationTokenAllocator should work with zero replication factor (CASSANDRA-12983) + * Address message coalescing regression (CASSANDRA-12676) + * Delete illegal character from StandardTokenizerImpl.jflex (CASSANDRA-13417) + * Fix cqlsh automatic protocol downgrade regression (CASSANDRA-13307) +Merged from 3.0: + * Interrupt replaying hints on decommission (CASSANDRA-13308) - -3.0.13 + * Handling partially written hint files (CASSANDRA-12728) + * Fix NPE issue in StorageService (CASSANDRA-13060) * Make reading of range tombstones more reliable (CASSANDRA-12811) * Fix startup problems due to schema tables not completely flushed (CASSANDRA-12213) * Fix view builder bug that can filter out data on restart (CASSANDRA-13405) http://git-wip-us.apache.org/repos/asf/cassandra/blob/5f644548/src/java/org/apache/cassandra/hints/HintsDispatchExecutor.java -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/5f644548/src/java/org/apache/cassandra/hints/HintsDispatcher.java -- diff --cc src/java/org/apache/cassandra/hints/HintsDispatcher.java index 3ac77a3,351b3fa..c432553 --- a/src/java/org/apache/cassandra/hints/HintsDispatcher.java +++ b/src/java/org/apache/cassandra/hints/HintsDispatcher.java @@@ -26,9 -26,11 +26,11 @@@ import 
java.util.function.BooleanSupplier import java.util.function.Function; import com.google.common.util.concurrent.RateLimiter; + import org.slf4j.Logger; + import org.slf4j.LoggerFactory; -import org.apache.cassandra.config.DatabaseDescriptor; -import org.apache.cassandra.gms.FailureDetector; +import org.apache.cassandra.exceptions.RequestFailureReason; +import org.apache.cassandra.metrics.HintsServiceMetrics; import org.apache.cassandra.net.IAsyncCallbackWithFailure; import org.apache.cassandra.net.MessageIn; import
[2/6] cassandra git commit: Interrupt replaying hints on decommission
Interrupt replaying hints on decommission Patch by Jeff Jirsa; Reviewed by Aleksey Yeschenko for CASSANDRA-13308 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/5089e74e Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/5089e74e Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/5089e74e Branch: refs/heads/cassandra-3.11 Commit: 5089e74ef4a0eaeb1c439d57f074de1c496421f2 Parents: 3110d27 Author: Jeff JirsaAuthored: Wed Apr 19 08:26:02 2017 -0700 Committer: Jeff Jirsa Committed: Wed Apr 19 08:57:45 2017 -0700 -- CHANGES.txt | 1 + .../apache/cassandra/hints/HintsDispatchExecutor.java| 8 src/java/org/apache/cassandra/hints/HintsDispatcher.java | 9 +++-- src/java/org/apache/cassandra/hints/HintsService.java| 11 ++- 4 files changed, 22 insertions(+), 7 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/5089e74e/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index 918c46b..e55d4cb 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -1,5 +1,6 @@ 3.0.14 * Handling partially written hint files (CASSANDRA-12728) + * Interrupt replaying hints on decommission (CASSANDRA-13308) 3.0.13 * Make reading of range tombstones more reliable (CASSANDRA-12811) http://git-wip-us.apache.org/repos/asf/cassandra/blob/5089e74e/src/java/org/apache/cassandra/hints/HintsDispatchExecutor.java -- diff --git a/src/java/org/apache/cassandra/hints/HintsDispatchExecutor.java b/src/java/org/apache/cassandra/hints/HintsDispatchExecutor.java index 333232d..58b30bd 100644 --- a/src/java/org/apache/cassandra/hints/HintsDispatchExecutor.java +++ b/src/java/org/apache/cassandra/hints/HintsDispatchExecutor.java @@ -117,6 +117,14 @@ final class HintsDispatchExecutor } } +void interruptDispatch(UUID hostId) +{ +Future future = scheduledDispatches.remove(hostId); + +if (null != future) +future.cancel(true); +} + private final class TransferHintsTask implements Runnable { private final HintsCatalog 
catalog; http://git-wip-us.apache.org/repos/asf/cassandra/blob/5089e74e/src/java/org/apache/cassandra/hints/HintsDispatcher.java -- diff --git a/src/java/org/apache/cassandra/hints/HintsDispatcher.java b/src/java/org/apache/cassandra/hints/HintsDispatcher.java index d7a3515..351b3fa 100644 --- a/src/java/org/apache/cassandra/hints/HintsDispatcher.java +++ b/src/java/org/apache/cassandra/hints/HintsDispatcher.java @@ -26,6 +26,8 @@ import java.util.concurrent.atomic.AtomicBoolean; import java.util.function.Function; import com.google.common.util.concurrent.RateLimiter; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; import org.apache.cassandra.config.DatabaseDescriptor; import org.apache.cassandra.gms.FailureDetector; @@ -42,6 +44,8 @@ import org.apache.cassandra.utils.concurrent.SimpleCondition; */ final class HintsDispatcher implements AutoCloseable { +private static final Logger logger = LoggerFactory.getLogger(HintsDispatcher.class); + private enum Action { CONTINUE, ABORT } private final HintsReader reader; @@ -181,7 +185,7 @@ final class HintsDispatcher implements AutoCloseable private static final class Callback implements IAsyncCallbackWithFailure { -enum Outcome { SUCCESS, TIMEOUT, FAILURE } +enum Outcome { SUCCESS, TIMEOUT, FAILURE, INTERRUPTED } private final long start = System.nanoTime(); private final SimpleCondition condition = new SimpleCondition(); @@ -198,7 +202,8 @@ final class HintsDispatcher implements AutoCloseable } catch (InterruptedException e) { -throw new AssertionError(e); +logger.warn("Hint dispatch was interrupted", e); +return Outcome.INTERRUPTED; } return timedOut ? 
Outcome.TIMEOUT : outcome; http://git-wip-us.apache.org/repos/asf/cassandra/blob/5089e74e/src/java/org/apache/cassandra/hints/HintsService.java -- diff --git a/src/java/org/apache/cassandra/hints/HintsService.java b/src/java/org/apache/cassandra/hints/HintsService.java index 5a32786..9cd4ed3 100644 --- a/src/java/org/apache/cassandra/hints/HintsService.java +++ b/src/java/org/apache/cassandra/hints/HintsService.java @@ -287,10 +287,11 @@ public final class HintsService implements HintsServiceMBean /** * Cleans up hints-related
[jira] [Comment Edited] (CASSANDRA-13265) Expiration in OutboundTcpConnection can block the reader Thread
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15974885#comment-15974885 ] Christian Esken edited comment on CASSANDRA-13265 at 4/19/17 3:14 PM: -- No problem. I was away for Easter, so I did not even notice you being busy. I just started my CircleCI test for the first time. It's been working on the first branch (trunk) for an hour and is not complete yet, so I guess with all the branches it can take a day to complete. I have restarted the build with more parallelism and hopefully that will give a more acceptable turnaround time. I will send an update whenever it is complete. https://circleci.com/gh/christian-esken/cassandra/3 was (Author: cesken): No problem. I was away for Easter, so I did not even notice you being busy. I just started my CircleCI test for the first time. Its working on the first branch (trunk) for an hour and is not complete yet, so I guess with all the branches it can take a day to complete. I have restarted the build with more parallelism and will send an update whenever that is complete. Hopefully that will create a more acceptable turnaround time. https://circleci.com/gh/christian-esken/cassandra/3 > Expiration in OutboundTcpConnection can block the reader Thread > --- > > Key: CASSANDRA-13265 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13265 > Project: Cassandra > Issue Type: Bug > Environment: Cassandra 3.0.9 > Java HotSpot(TM) 64-Bit Server VM version 25.112-b15 (Java version > 1.8.0_112-b15) > Linux 3.16 >Reporter: Christian Esken >Assignee: Christian Esken > Fix For: 3.0.x > > Attachments: cassandra.pb-cache4-dus.2017-02-17-19-36-26.chist.xz, > cassandra.pb-cache4-dus.2017-02-17-19-36-26.td.xz > > > I observed that sometimes a single node in a Cassandra cluster fails to > communicate to the other nodes. This can happen at any time, during peak load > or low load. Restarting that single node from the cluster fixes the issue. 
> Before going into details, I want to state that I have analyzed the > situation and am already developing a possible fix. Here is the analysis so > far: > - A thread dump in this situation showed 324 threads in the > OutboundTcpConnection class that want to lock the backlog queue for doing > expiration. > - A class histogram shows 262508 instances of > OutboundTcpConnection$QueuedMessage. > What is the effect of it? As soon as the Cassandra node has reached a certain > amount of queued messages, it starts thrashing itself to death. Each of these > threads fully locks the Queue for reading and writing by calling > iterator.next(), making the situation worse and worse. > - Writing: Only after 262508 locking operations can it progress with actually > writing to the Queue. > - Reading: It is also blocked, as 324 threads try to do iterator.next(), and > fully lock the Queue. > This means: writing blocks the Queue for reading, and readers might even be > starved, which makes the situation even worse. > - > The setup is: > - 3-node cluster > - replication factor 2 > - Consistency LOCAL_ONE > - No remote DCs > - high write throughput (10 INSERT statements per second and more during > peak times).
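One way out of the contention described in the analysis above is to make sure that at most one thread at a time pays the cost of scanning the backlog for expired messages, while every other caller returns immediately instead of piling up behind the iterator. A sketch of that single-expirer guard (assumed names; a simplification, not the committed patch, and using a ConcurrentLinkedQueue rather than the fully-locking queue iterator the report describes):

```java
import java.util.Iterator;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;

class Backlog<T>
{
    static final class Entry<T>
    {
        final T payload;
        final long expiresAtNanos;
        Entry(T payload, long expiresAtNanos) { this.payload = payload; this.expiresAtNanos = expiresAtNanos; }
    }

    private final ConcurrentLinkedQueue<Entry<T>> queue = new ConcurrentLinkedQueue<>();
    private final AtomicBoolean expiring = new AtomicBoolean(false);

    void add(T payload, long ttlNanos)
    {
        queue.add(new Entry<>(payload, System.nanoTime() + ttlNanos));
        maybeExpire();
    }

    // At most one caller wins the CAS and performs the scan; the other 323
    // threads from the thread dump would return here without touching the queue.
    void maybeExpire()
    {
        if (!expiring.compareAndSet(false, true))
            return;
        try
        {
            long now = System.nanoTime();
            for (Iterator<Entry<T>> it = queue.iterator(); it.hasNext(); )
                if (it.next().expiresAtNanos - now < 0)
                    it.remove();
        }
        finally
        {
            expiring.set(false);
        }
    }

    int size() { return queue.size(); }
}
```

The guard does not make an individual scan cheaper, but it bounds the number of concurrent scans to one, so writers and the reader are never all blocked on the same traversal.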
[jira] [Commented] (CASSANDRA-13265) Expiration in OutboundTcpConnection can block the reader Thread
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15974885#comment-15974885 ] Christian Esken commented on CASSANDRA-13265: - No problem. I was away for Easter, so I did not even notice you being busy. I just started my CircleCI test for the first time. It has been working on the first branch (trunk) for an hour and is not complete yet, so I guess with all the branches it can take a day to complete. I have restarted the build with more parallelism and will send an update whenever it completes. Hopefully that will give a more acceptable turnaround time. https://circleci.com/gh/christian-esken/cassandra/3 > Expiration in OutboundTcpConnection can block the reader Thread > --- > > Key: CASSANDRA-13265 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13265 > Project: Cassandra > Issue Type: Bug > Environment: Cassandra 3.0.9 > Java HotSpot(TM) 64-Bit Server VM version 25.112-b15 (Java version 1.8.0_112-b15) > Linux 3.16 > Reporter: Christian Esken > Assignee: Christian Esken > Fix For: 3.0.x > > Attachments: cassandra.pb-cache4-dus.2017-02-17-19-36-26.chist.xz, cassandra.pb-cache4-dus.2017-02-17-19-36-26.td.xz > > > I observed that sometimes a single node in a Cassandra cluster fails to communicate with the other nodes. This can happen at any time, during peak load or low load. Restarting that single node fixes the issue. > Before going into details, I want to state that I have analyzed the situation and am already developing a possible fix. Here is the analysis so far: > - A thread dump in this situation showed 324 Threads in the OutboundTcpConnection class that want to lock the backlog queue for doing expiration. > - A class histogram shows 262508 instances of OutboundTcpConnection$QueuedMessage. > What is the effect of this? As soon as the Cassandra node has reached a certain amount of queued messages, it starts thrashing itself to death. Each of these Threads fully locks the Queue for reading and writing by calling iterator.next(), making the situation worse and worse. > - Writing: Only after 262508 locking operations can a writer progress with actually writing to the Queue. > - Reading: Also blocked, as 324 Threads try to do iterator.next() and fully lock the Queue. > This means: Writing blocks the Queue for reading, and readers might even be starved, which makes the situation even worse. > - > The setup is: > - 3-node cluster > - replication factor 2 > - Consistency LOCAL_ONE > - No remote DC's > - high write throughput (10 INSERT statements per second and more during peak times). > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
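The pile-up described above — hundreds of threads each locking the backlog to scan for expired messages — can be avoided by letting at most one thread expire at a time. The following is a minimal sketch of that idea only; it is not the actual Cassandra patch, and the class and method names are invented for illustration:

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch: only the thread that wins the CAS iterates the backlog for
// expiration; all other writers skip instead of queueing up on the iterator.
class Backlog {
    static final class QueuedMessage {
        final long expiresAtNanos;
        QueuedMessage(long expiresAtNanos) { this.expiresAtNanos = expiresAtNanos; }
        boolean isTimedOut(long nowNanos) { return nowNanos >= expiresAtNanos; }
    }

    final Queue<QueuedMessage> backlog = new ConcurrentLinkedQueue<>();
    private final AtomicBoolean expiring = new AtomicBoolean(false);

    // Called opportunistically by writers; cheap no-op if expiration is running.
    void maybeExpire(long nowNanos) {
        if (!expiring.compareAndSet(false, true))
            return; // another thread is already expiring; don't pile up
        try {
            backlog.removeIf(m -> m.isTimedOut(nowNanos));
        } finally {
            expiring.set(false);
        }
    }
}
```

With this shape, the 324 threads from the thread dump would reduce to one expirer plus 323 fast no-ops, instead of 324 contended full-queue iterations.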
[jira] [Commented] (CASSANDRA-12126) CAS Reads Inconsistencies
[ https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15974873#comment-15974873 ] Sylvain Lebresne commented on CASSANDRA-12126: -- Exactly. > CAS Reads Inconsistencies > -- > > Key: CASSANDRA-12126 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12126 > Project: Cassandra > Issue Type: Bug > Components: Coordination > Reporter: sankalp kohli > Assignee: Stefan Podkowinski > > While looking at the CAS code in Cassandra, I found a potential issue with CAS Reads. Here is how it can happen with RF=3: > 1) You issue a CAS Write and it fails in the propose phase. Machine A replies true to a propose and saves the commit in its accepted field. The other two machines, B and C, do not get to the accept phase. > The current state is that machine A has this commit in its paxos table as accepted but not committed, and B and C do not. > 2) Issue a CAS Read and it goes to only B and C. You won't be able to read the value written in step 1. This step behaves as if nothing is in flight. > 3) Issue another CAS Read and it goes to A and B. Now we will discover that there is something in flight from A, and will propose and commit it with the current ballot. Now we can read the value written in step 1 as part of this CAS read. > If we skip step 3 and instead run step 4, we will never learn about the value written in step 1. > 4) Issue a CAS Write that involves only B and C. This will succeed and commit a different value than step 1. The step 1 value will never be seen again and was never seen before. > If you read Lamport's "Paxos Made Simple" paper, section 2.3 discusses this issue: how learners can find out whether a majority of the acceptors have accepted a proposal. > In step 3, it is correct that we propose the value again, since we don't know whether it was accepted by a majority of acceptors. When we ask a majority of acceptors, and more than one acceptor (but not a majority) has something in flight, we have no way of knowing whether it was accepted by a majority. So this behavior is correct. > However, we need to fix step 2, since it causes reads to not be linearizable with respect to writes and other reads. In this case, we know that a majority of acceptors have no in-flight commit, which means nothing can have been accepted by a majority. I think we should run a propose step here with an empty commit, which will cause the write from step 1 to never become visible afterwards. > With this fix, we will either see the data written in step 1 on the next serial read, or never see it, which is what we want. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
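The two-read scenario and the proposed empty-commit fix can be sketched with a toy single-slot model. This is purely illustrative — it is not Cassandra's Paxos implementation, and all names are invented; it only shows why sealing the slot with an empty value at a fresh ballot makes the orphaned minority value unreachable:

```java
// Toy 3-acceptor, single-slot model of the fix described above.
class ToyCas {
    static final class Acceptor { int ballot = -1; String value = null; }

    final Acceptor[] acceptors = { new Acceptor(), new Acceptor(), new Acceptor() };
    int nextBallot = 1;

    // Serial read contacting a majority. With the fix: if no contacted
    // acceptor has anything in flight, propose an empty ("no-op") value at
    // the new ballot, so a minority-only value can never resurface later.
    String serialRead(int... contacted) {
        int ballot = nextBallot++;
        Acceptor highest = null;
        for (int i : contacted) {
            Acceptor a = acceptors[i];
            if (a.value != null && (highest == null || a.ballot > highest.ballot))
                highest = a;
        }
        String chosen = (highest != null) ? highest.value : ""; // "" = empty commit
        for (int i : contacted) { acceptors[i].ballot = ballot; acceptors[i].value = chosen; }
        return chosen.isEmpty() ? null : chosen;
    }
}
```

Without the empty commit in step 2, a later read contacting A would find the orphaned value at A and return it — exactly the Nothing-then-Something sequence discussed in this thread. With it, the empty value at the higher ballot wins on any subsequent majority.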
[jira] [Comment Edited] (CASSANDRA-12126) CAS Reads Inconsistencies
[ https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15974834#comment-15974834 ] Jonathan Ellis edited comment on CASSANDRA-12126 at 4/19/17 2:56 PM: - I see. So you are saying that 1: Write 2: Read -> Nothing 3: Read -> Something Is broken because to go from Nothing to Something [in a linearized system] there needs to be a write in between. was (Author: jbellis): I see. So you are saying that 1: Write 2: Read -> Nothing 3: Read -> Something Is broken because to go from Nothing to Something there needs to be a write in between. > CAS Reads Inconsistencies > -- > > Key: CASSANDRA-12126 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12126 > Project: Cassandra > Issue Type: Bug > Components: Coordination >Reporter: sankalp kohli >Assignee: Stefan Podkowinski -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (CASSANDRA-12126) CAS Reads Inconsistencies
[ https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15974834#comment-15974834 ] Jonathan Ellis commented on CASSANDRA-12126: I see. So you are saying that 1: Write 2: Read -> Nothing 3: Read -> Something Is broken because to go from Nothing to Something there needs to be a write in between. > CAS Reads Inconsistencies > -- > > Key: CASSANDRA-12126 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12126 > Project: Cassandra > Issue Type: Bug > Components: Coordination >Reporter: sankalp kohli >Assignee: Stefan Podkowinski -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[1/2] cassandra git commit: Fix cqlsh automatic protocol downgrade regression Patch by Matt Byrd; reviewed by Mick Semb Wever for CASSANDRA-13307
Repository: cassandra Updated Branches: refs/heads/trunk e52420624 -> 08c216d12 Fix cqlsh automatic protocol downgrade regression Patch by Matt Byrd; reviewed by Mick Semb Wever for CASSANDRA-13307 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/9c54d02f Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/9c54d02f Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/9c54d02f Branch: refs/heads/trunk Commit: 9c54d02f73245d3a9a05d37f7d0002421abb852f Parents: 65c1fdd Author: Matt ByrdAuthored: Wed Mar 8 13:55:01 2017 -0800 Committer: Mick Semb Wever Committed: Wed Apr 19 16:15:37 2017 +1000 -- CHANGES.txt | 1 + bin/cqlsh.py | 19 +-- 2 files changed, 14 insertions(+), 6 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/9c54d02f/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index 19d8162..1757266 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -24,6 +24,7 @@ * NoReplicationTokenAllocator should work with zero replication factor (CASSANDRA-12983) * Address message coalescing regression (CASSANDRA-12676) * Delete illegal character from StandardTokenizerImpl.jflex (CASSANDRA-13417) + * Fix cqlsh automatic protocol downgrade regression (CASSANDRA-13307) Merged from 3.0: * Handling partially written hint files (CASSANDRA-12728) * Fix NPE issue in StorageService (CASSANDRA-13060) http://git-wip-us.apache.org/repos/asf/cassandra/blob/9c54d02f/bin/cqlsh.py -- diff --git a/bin/cqlsh.py b/bin/cqlsh.py index 2387342..e765dee 100644 --- a/bin/cqlsh.py +++ b/bin/cqlsh.py @@ -178,7 +178,6 @@ from cqlshlib.util import get_file_encoding_bomsize, trim_if_present DEFAULT_HOST = '127.0.0.1' DEFAULT_PORT = 9042 DEFAULT_SSL = False -DEFAULT_PROTOCOL_VERSION = 4 DEFAULT_CONNECT_TIMEOUT_SECONDS = 5 DEFAULT_REQUEST_TIMEOUT_SECONDS = 10 @@ -223,6 +222,9 @@ parser.add_option('--cqlversion', default=None, help='Specify a particular CQL version, ' 'by default the highest version 
supported by the server will be used.' ' Examples: "3.0.3", "3.1.0"') +parser.add_option("--protocol-version", type="int", default=None, + help='Specify a specific protcol version otherwise the client will default and downgrade as necessary') + parser.add_option("-e", "--execute", help='Execute the statement and quit.') parser.add_option("--connect-timeout", default=DEFAULT_CONNECT_TIMEOUT_SECONDS, dest='connect_timeout', help='Specify the connection timeout in seconds (default: %default seconds).') @@ -449,7 +451,7 @@ class Shell(cmd.Cmd): ssl=False, single_statement=None, request_timeout=DEFAULT_REQUEST_TIMEOUT_SECONDS, - protocol_version=DEFAULT_PROTOCOL_VERSION, + protocol_version=None, connect_timeout=DEFAULT_CONNECT_TIMEOUT_SECONDS): cmd.Cmd.__init__(self, completekey=completekey) self.hostname = hostname @@ -468,13 +470,16 @@ class Shell(cmd.Cmd): if use_conn: self.conn = use_conn else: +kwargs = {} +if protocol_version is not None: +kwargs['protocol_version'] = protocol_version self.conn = Cluster(contact_points=(self.hostname,), port=self.port, cql_version=cqlver, -protocol_version=protocol_version, auth_provider=self.auth_provider, ssl_options=sslhandling.ssl_settings(hostname, CONFIG_FILE) if ssl else None, load_balancing_policy=WhiteListRoundRobinPolicy([self.hostname]), control_connection_timeout=connect_timeout, -connect_timeout=connect_timeout) +connect_timeout=connect_timeout, +**kwargs) self.owns_connection = not use_conn if keyspace: @@ -1673,9 +1678,9 @@ class Shell(cmd.Cmd): direction = parsed.get_binding('dir').upper() if direction == 'FROM': -task = ImportTask(self, ks, table, columns, fname, opts, DEFAULT_PROTOCOL_VERSION, CONFIG_FILE) +task = ImportTask(self, ks, table, columns, fname, opts, self.conn.protocol_version, CONFIG_FILE) elif direction == 'TO': -task = ExportTask(self, ks, table, columns, fname, opts, DEFAULT_PROTOCOL_VERSION, CONFIG_FILE) +task = ExportTask(self, ks, table, columns,
[2/2] cassandra git commit: Merge branch 'cassandra-3.11' into trunk
Merge branch 'cassandra-3.11' into trunk Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/08c216d1 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/08c216d1 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/08c216d1 Branch: refs/heads/trunk Commit: 08c216d125e5c8ed33a3403cde185f4e84d31895 Parents: e524206 9c54d02 Author: Mick Semb WeverAuthored: Thu Apr 20 00:44:52 2017 +1000 Committer: Mick Semb Wever Committed: Thu Apr 20 00:44:52 2017 +1000 -- --
[jira] [Commented] (CASSANDRA-13265) Expiration in OutboundTcpConnection can block the reader Thread
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15974671#comment-15974671 ] Ariel Weisberg commented on CASSANDRA-13265: Sorry, I just had a really busy week last week and I've been trying to get Circle to the point where it can run the dtests. I'm mostly there; it's just a few failing tests that remain. > Expiration in OutboundTcpConnection can block the reader Thread > --- > > Key: CASSANDRA-13265 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13265 > Project: Cassandra > Issue Type: Bug >Reporter: Christian Esken >Assignee: Christian Esken > Fix For: 3.0.x -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (CASSANDRA-13308) Gossip breaks, Hint files not being deleted on nodetool decommission
[ https://issues.apache.org/jira/browse/CASSANDRA-13308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksey Yeschenko updated CASSANDRA-13308: -- Status: Ready to Commit (was: Patch Available) > Gossip breaks, Hint files not being deleted on nodetool decommission > > > Key: CASSANDRA-13308 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13308 > Project: Cassandra > Issue Type: Bug > Components: Streaming and Messaging > Environment: Using Cassandra version 3.0.9 >Reporter: Arijit >Assignee: Jeff Jirsa > Fix For: 3.0.x, 3.11.x, 4.x > > Attachments: 28207.stack, logs, logs_decommissioned_node > > > How to reproduce the issue I'm seeing: > Shut down Cassandra on one node of the cluster and wait until we accumulate a > ton of hints. Start Cassandra on the node and immediately run "nodetool > decommission" on it. > The node streams its replicas and marks itself as DECOMMISSIONED, but other > nodes do not seem to see this message. "nodetool status" shows the > decommissioned node in state "UL" on all other nodes (it is also present in > system.peers), and Cassandra logs show that gossip tasks on nodes are not > proceeding (number of pending tasks keeps increasing). Jstack suggests that a > gossip task is blocked on hints dispatch (I can provide traces if this is not > obvious). Because the cluster is large and there are a lot of hints, this is > taking a while. > On inspecting "/var/lib/cassandra/hints" on the nodes, I see a bunch of hint > files for the decommissioned node. Documentation seems to suggest that these > hints should be deleted during "nodetool decommission", but it does not seem > to be the case here. This is the bug being reported. > To recover from this scenario, if I manually delete hint files on the nodes, > the hints dispatcher threads throw a bunch of exceptions and the > decommissioned node is now in state "DL" (perhaps it missed some gossip > messages?). 
The node is still in my "system.peers" table > Restarting Cassandra on all nodes after this step does not fix the issue (the > node remains in the peers table). In fact, after this point the > decommissioned node is in state "DN" -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (CASSANDRA-13308) Gossip breaks, Hint files not being deleted on nodetool decommission
[ https://issues.apache.org/jira/browse/CASSANDRA-13308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15974665#comment-15974665 ] Aleksey Yeschenko commented on CASSANDRA-13308: --- - {{HintsDispatchExecutor.interruptDispatch()}} only uses the {{hostId}} field of the passed {{HintsStore}} instance, so we might as well just pass the host id directly - in the same method, you should replace the {{scheduledDispatches.get()}} call with a call to {{remove()}}, thus eliminating a redundant {{remove()}} later down the line - no need for the racy {{isDone()}} check either, it doesn't save us anything So, ultimately, just {code} void interruptDispatch(UUID hostId) { Future future = scheduledDispatches.remove(hostId); if (null != future) future.cancel(true); } {code} should be enough. But these are nits, can address on commit. LGTM overall, +1. > Gossip breaks, Hint files not being deleted on nodetool decommission > > > Key: CASSANDRA-13308 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13308 > Project: Cassandra > Issue Type: Bug > Components: Streaming and Messaging >Reporter: Arijit >Assignee: Jeff Jirsa > Fix For: 3.0.x, 3.11.x, 4.x -- This message was sent by Atlassian JIRA (v6.3.15#6346)
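The simplification suggested in the review above rests on the fact that ConcurrentMap.remove(key) atomically detaches and returns the previous mapping, which is why the separate get()/isDone()/remove() sequence is redundant and racy. A self-contained sketch (the enclosing class name here is invented; only the helper mirrors the reviewed snippet):

```java
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.Future;

// Sketch of the reviewed helper, exercised in isolation.
class HintDispatches {
    final ConcurrentMap<UUID, Future<?>> scheduledDispatches = new ConcurrentHashMap<>();

    // remove() hands back the Future (or null) in one atomic step, so there is
    // no window in which another thread can observe a half-cancelled entry.
    void interruptDispatch(UUID hostId) {
        Future<?> future = scheduledDispatches.remove(hostId);
        if (null != future)
            future.cancel(true);
    }
}
```

Calling it with an absent host id is a harmless no-op, which is what makes the extra isDone() guard unnecessary.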
[jira] [Commented] (CASSANDRA-13397) Return value of CountDownLatch.await() not being checked in Repair
[ https://issues.apache.org/jira/browse/CASSANDRA-13397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15974643#comment-15974643 ] Paulo Motta commented on CASSANDRA-13397: - Tests look good, but there was a minor conflict when merging to trunk, so I will submit a new CI round with the trunk patch: ||trunk|| |[branch|https://github.com/apache/cassandra/compare/trunk...pauloricardomg:trunk-13397]| |[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-trunk-13397-testall/lastCompletedBuild/testReport/]| |[dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-trunk-13397-dtest/lastCompletedBuild/testReport/]| > Return value of CountDownLatch.await() not being checked in Repair > -- > > Key: CASSANDRA-13397 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13397 > Project: Cassandra > Issue Type: Bug >Reporter: Simon Zhou >Assignee: Simon Zhou >Priority: Minor > Fix For: 3.0.x > > Attachments: CASSANDRA-13397-v1.patch > > > While looking into the repair code, I realized that we should check the return > value of CountDownLatch.await(). In most of the places where we don't check the return > value, nothing bad would happen due to other protections. However, > ActiveRepairService#prepareForRepair should have the check. Code to reproduce: > {code} > public static void testLatch() throws InterruptedException { > CountDownLatch latch = new CountDownLatch(2); > latch.countDown(); > new Thread(() -> { > try { > Thread.sleep(1200); > } catch (InterruptedException e) { > System.err.println("interrupted"); > } > latch.countDown(); > System.out.println("counted down"); > }).start(); > latch.await(1, TimeUnit.SECONDS); > if (latch.getCount() > 0) { > System.err.println("failed"); > } else { > System.out.println("success"); > } > } > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
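The reproduce snippet above inspects getCount() after the wait; the check the ticket asks for is simpler, because the timed await() overload already returns false when the latch did not reach zero within the timeout. A minimal sketch (the wrapper name is invented, not Cassandra code):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Sketch: surface await()'s boolean result so the caller can fail fast
// instead of proceeding as if all participants had responded.
class PrepareSync {
    static boolean awaitResponses(CountDownLatch latch, long timeout, TimeUnit unit) {
        try {
            return latch.await(timeout, unit); // false means the wait timed out
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        }
    }
}
```

In prepareForRepair-style code, a false return is the signal to abort the repair session rather than continue with missing replies.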
[jira] [Comment Edited] (CASSANDRA-13275) Cassandra throws an exception during CQL select query filtering on map key
[ https://issues.apache.org/jira/browse/CASSANDRA-13275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15973113#comment-15973113 ] Alex Petrov edited comment on CASSANDRA-13275 at 4/19/17 1:21 PM: -- Partition key filtering was introduced in [CASSANDRA-11031], although {{CONTAINS}} didn't trigger filtering, the read path was trying to convert {{CONTAINS}} restriction to bounds. |[3.11|https://github.com/apache/cassandra/compare/3.11...ifesdjeen:13275-3.11]|[testall|http://cassci.datastax.com/view/Dev/view/ifesdjeen/job/ifesdjeen-13275-3.11-testall/]|[dtest|http://cassci.datastax.com/view/Dev/view/ifesdjeen/job/ifesdjeen-13275-3.11-dtest/]| |[trunk|https://github.com/apache/cassandra/compare/trunk...ifesdjeen:13275-trunk]|[testall|http://cassci.datastax.com/view/Dev/view/ifesdjeen/job/ifesdjeen-13275-trunk-testall/]|[dtest|http://cassci.datastax.com/view/Dev/view/ifesdjeen/job/ifesdjeen-13275-trunk-dtest/]| |[dtest branch|https://github.com/riptano/cassandra-dtest/compare/master...ifesdjeen:13275-master]| This is not applicable to 3.0 since we do not allow partition key filtering there. was (Author: ifesdjeen): Partition key filtering was introduced in [CASSANDRA-11031], although {{CONTAINS}} didn't trigger filtering, the read path was trying to convert {{CONTAINS}} restriction to bounds. 
|[3.11|https://github.com/apache/cassandra/compare/3.11...ifesdjeen:13275-3.11]|[testall|http://cassci.datastax.com/view/Dev/view/ifesdjeen/job/ifesdjeen-13275-3.11-testall/]|[dtest|http://cassci.datastax.com/view/Dev/view/ifesdjeen/job/ifesdjeen-13275-3.11-dtest/]| |[trunk|https://github.com/apache/cassandra/compare/trunk...ifesdjeen:13275-trunk]|[testall|http://cassci.datastax.com/view/Dev/view/ifesdjeen/job/ifesdjeen-13275-trunk-testall/]|[dtest|http://cassci.datastax.com/view/Dev/view/ifesdjeen/job/ifesdjeen-13275-trunk-dtest/]| > Cassandra throws an exception during CQL select query filtering on map key > --- > > Key: CASSANDRA-13275 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13275 > Project: Cassandra > Issue Type: Bug > Components: CQL >Reporter: Abderrahmane CHRAIBI >Assignee: Alex Petrov > > Env: cqlsh 5.0.1 | Cassandra 3.9 | CQL spec 3.4.2 | Native protocol v4 > Using this table structure: > {code}CREATE TABLE mytable ( > mymap frozen
[jira] [Commented] (CASSANDRA-13462) Unexpected behaviour with range queries on UUIDs
[ https://issues.apache.org/jira/browse/CASSANDRA-13462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15974597#comment-15974597 ] Sylvain Lebresne commented on CASSANDRA-13462: -- bq. Is this behaviour documented somewhere? Well, "somewhere" includes a lot of places, but it's admittedly not in the official doc, which is light on details about how the different types compare exactly. Contributions to the doc are welcome. With that said, I don't think that's terribly important, because UUIDs (unless they are TimeUUIDs, in which case they do sort in the most useful way) are usually randomly generated, so I'm not sure the way they sort matters much. bq. I think it would be quite a feat to find someone who was relying on this behaviour! To clarify, when I say that changing that would break users, I'm actually not talking about users relying on any particular ordering in their result set, even though it would indeed break that and is not worth the trouble for that reason alone (and btw, the reason the comparator checks the UUID version first is that it sorts time uuids (version 1) by their time component first, which can have its uses, so I wouldn't be as confident as you seem to be that no-one relies on the current behavior). I'm talking about the fact that data is sorted on disk, and changing how any type sorts things is impossible without basically destroying existing data. > Unexpected behaviour with range queries on UUIDs > > > Key: CASSANDRA-13462 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13462 > Project: Cassandra > Issue Type: Bug >Reporter: Andrew Jefferson > > My expectation is that UUIDs should behave as 128-bit integers for comparison. > However, it seems that the Cassandra implementation compares first the uuid version number, then the remaining values of the uuid. > e.g. in C* > 1000--3000-- > is greater than > 2000--1000-- > (n.b. the 13th value is the uuid version) > - this is consistent across range queries and using ORDER BY -- This message was sent by Atlassian JIRA (v6.3.15#6346)
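The version-first ordering described in this thread can be sketched as a comparator. This is an illustration only — Cassandra's actual UUIDType comparator operates on the serialized bytes, and the fallback used here is a simplification:

```java
import java.util.UUID;

// Sketch: compare the version nibble first, then fall back to an unsigned
// 128-bit comparison within the same version.
class VersionFirst {
    static int compare(UUID a, UUID b) {
        int byVersion = Integer.compare(a.version(), b.version());
        if (byVersion != 0)
            return byVersion; // the version nibble dominates the ordering
        int hi = Long.compareUnsigned(a.getMostSignificantBits(), b.getMostSignificantBits());
        return hi != 0 ? hi
                       : Long.compareUnsigned(a.getLeastSignificantBits(), b.getLeastSignificantBits());
    }
}
```

Under this rule a version-3 UUID sorts after every version-1 UUID regardless of its leading hex digits, which reproduces the reporter's surprising example.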
[jira] [Commented] (CASSANDRA-13265) Expiration in OutboundTcpConnection can block the reader Thread
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15974590#comment-15974590 ] Christian Esken commented on CASSANDRA-13265: - bq. For CHANGES.TXT the entry should go at the top of the list of entries for the version the change is for. I don't know why. I also haven't seen this mentioned anywhere. Probably someone could add that to https://wiki.apache.org/cassandra/HowToContribute or http://cassandra.apache.org/doc/latest/development/how_to_commit.html . Anyhow, I have fixed that. bq. set up with CircleCI [...] Also you transposed 13625 and 13265 I changed the branch names to correct the 13625/13265 transposition; I didn't find the transposition in any other place than the branch names. I will try to find out how to do the CircleCI stuff. Meanwhile, here are the updated links: https://github.com/christian-esken/cassandra/commits/cassandra-13265-2.2 https://github.com/christian-esken/cassandra/commits/cassandra-13265-3.0 https://github.com/christian-esken/cassandra/commits/cassandra-13265-3.11 https://github.com/christian-esken/cassandra/commits/cassandra-13265-trunk > Expiration in OutboundTcpConnection can block the reader Thread > --- > > Key: CASSANDRA-13265 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13265 > Project: Cassandra > Issue Type: Bug >Reporter: Christian Esken >Assignee: Christian Esken > Fix For: 3.0.x -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (CASSANDRA-13462) Unexpected behaviour with range queries on UUIDs
[ https://issues.apache.org/jira/browse/CASSANDRA-13462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15974559#comment-15974559 ] Andrew Jefferson commented on CASSANDRA-13462: -- Thanks for the quick reply - Is this behaviour documented somewhere? I think it would be quite a feat to find someone who was relying on this behaviour! > Unexpected behaviour with range queries on UUIDs > > > Key: CASSANDRA-13462 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13462 > Project: Cassandra > Issue Type: Bug >Reporter: Andrew Jefferson -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (CASSANDRA-13462) Unexpected behaviour with range queries on UUIDs
[ https://issues.apache.org/jira/browse/CASSANDRA-13462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15974557#comment-15974557 ] Andrew Jefferson commented on CASSANDRA-13462:
----------------------------------------------
cqlsh> select * from dev.testinguuids where pk=6 ORDER BY ck ;

 pk | ck                                   | val
----+--------------------------------------+-----
  6 | 10000000-0000-0200-0000-000000000000 |   1
  6 | 20000000-0000-0200-0000-000000000000 |   1
  6 | 10000000-0000-1200-0000-000000000000 |   1
  6 | 20000000-0000-1200-0000-000000000000 |   1
  6 | 10000000-0000-2200-0000-000000000000 |   1
  6 | 20000000-0000-2200-0000-000000000000 |   1

> Unexpected behaviour with range queries on UUIDs
> ------------------------------------------------
>
> Key: CASSANDRA-13462
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13462
> Project: Cassandra
> Issue Type: Bug
> Reporter: Andrew Jefferson
>
> My expectation is that UUIDs should behave as 128-bit integers for comparison.
> However it seems that the Cassandra implementation compares first the uuid version number, then the remaining values of the uuid.
> e.g. in C*
> 10000000-0000-3000-0000-000000000000
> is greater than
> 20000000-0000-1000-0000-000000000000
> (n.b. the 13th value is the uuid version)
> - this is consistent across range queries and using ORDER BY
-- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Issue Comment Deleted] (CASSANDRA-13462) Unexpected behaviour with range queries on UUIDs
[ https://issues.apache.org/jira/browse/CASSANDRA-13462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Jefferson updated CASSANDRA-13462:
-----------------------------------------
Comment: was deleted

(was: cqlsh> select * from dev.testinguuids where pk=6 ORDER BY ck ;

 pk | ck                                   | val
----+--------------------------------------+-----
  6 | 10000000-0000-0200-0000-000000000000 |   1
  6 | 20000000-0000-0200-0000-000000000000 |   1
  6 | 10000000-0000-1200-0000-000000000000 |   1
  6 | 20000000-0000-1200-0000-000000000000 |   1
  6 | 10000000-0000-2200-0000-000000000000 |   1
  6 | 20000000-0000-2200-0000-000000000000 |   1)

> Unexpected behaviour with range queries on UUIDs
> ------------------------------------------------
>
> Key: CASSANDRA-13462
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13462
> Project: Cassandra
> Issue Type: Bug
> Reporter: Andrew Jefferson
>
> My expectation is that UUIDs should behave as 128-bit integers for comparison.
> However it seems that the Cassandra implementation compares first the uuid version number, then the remaining values of the uuid.
> e.g. in C*
> 10000000-0000-3000-0000-000000000000
> is greater than
> 20000000-0000-1000-0000-000000000000
> (n.b. the 13th value is the uuid version)
> - this is consistent across range queries and using ORDER BY
-- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Resolved] (CASSANDRA-13462) Unexpected behaviour with range queries on UUIDs
[ https://issues.apache.org/jira/browse/CASSANDRA-13462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne resolved CASSANDRA-13462.
------------------------------------------
Resolution: Won't Fix

I'm sorry this isn't working as you expected, but this is the way it works (and has been working for years), and we can't change it without breaking every user that has ever used UUIDs, which is obviously out of the question.

> Unexpected behaviour with range queries on UUIDs
> ------------------------------------------------
>
> Key: CASSANDRA-13462
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13462
> Project: Cassandra
> Issue Type: Bug
> Reporter: Andrew Jefferson
>
> My expectation is that UUIDs should behave as 128-bit integers for comparison.
> However it seems that the Cassandra implementation compares first the uuid version number, then the remaining values of the uuid.
> e.g. in C*
> 10000000-0000-3000-0000-000000000000
> is greater than
> 20000000-0000-1000-0000-000000000000
> (n.b. the 13th value is the uuid version)
> - this is consistent across range queries and using ORDER BY
-- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (CASSANDRA-13462) Unexpected behaviour with range queries on UUIDs
[ https://issues.apache.org/jira/browse/CASSANDRA-13462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Jefferson updated CASSANDRA-13462:
-----------------------------------------
Description:
My expectation is that UUIDs should behave as 128-bit integers for comparison. However it seems that the Cassandra implementation compares first the uuid version number, then the remaining values of the uuid.
e.g. in C*
10000000-0000-3000-0000-000000000000
is greater than
20000000-0000-1000-0000-000000000000
(n.b. the 13th value is the uuid version)
- this is consistent across range queries and using ORDER BY

was:
My expectation is that UUIDs should behave as 128-bit integers for comparison. However it seems that the Cassandra implementation compares first the uuid version number, then the remaining values of the uuid.
e.g. in C*
10000000-0000-3000-0000-000000000000
is greater than
20000000-0000-1000-0000-000000000000

I expect range queries / comparisons on UUIDs to work as though this is the case. But it does not. It seems to require the UUID to have certain properties for range queries to work properly:

```
create table dev.testinguuids (
  pk int,
  ck uuid,
  val int,
  PRIMARY KEY ((pk), ck)
)

insert into dev.testinguuids (pk,ck,val) VALUES (1, 30000000-0000-0000-0000-000000000000, 1)

select * from dev.testinguuids where pk=1 and ck > 10000000-0000-0000-0000-000000000000;
-> returns 1 row
select * from dev.testinguuids where pk=1 and ck > 00000000-0000-5000-0000-000000000000;
-> returns 0 rows
```

after a bit of investigation of UUIDs it works correctly for me if I force my query UUIDs to be in the form: 00000000-0000-05xx-0000-000000000000
i.e. select * from dev.testinguuids where pk=1 and ck > 00000000-0000-05xx-0000-000000000000 works ok
n.b. I have populated my table only with valid type 1 and type 4 uuids.

In testing, if I create uuids that are of the form 00000000-0000-YYxx-0000-000000000000 where YY > 05, then they behave differently as well:

```
# Insert a valid uuid
insert into dev.testinguuids (pk,ck,val) VALUES (2, 30000000-0000-0000-0000-000000000000, 1)
insert into dev.testinguuids (pk,ck,val) VALUES (2, 30000000-0000-0000-0000-000000000000, 1)
```

> Unexpected behaviour with range queries on UUIDs
> ------------------------------------------------
>
> Key: CASSANDRA-13462
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13462
> Project: Cassandra
> Issue Type: Bug
> Reporter: Andrew Jefferson
>
> My expectation is that UUIDs should behave as 128-bit integers for comparison.
> However it seems that the Cassandra implementation compares first the uuid version number, then the remaining values of the uuid.
> e.g. in C*
> 10000000-0000-3000-0000-000000000000
> is greater than
> 20000000-0000-1000-0000-000000000000
> (n.b. the 13th value is the uuid version)
> - this is consistent across range queries and using ORDER BY
-- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (CASSANDRA-13462) Unexpected behaviour with range queries on UUIDs
Andrew Jefferson created CASSANDRA-13462:
--------------------------------------------
Summary: Unexpected behaviour with range queries on UUIDs
Key: CASSANDRA-13462
URL: https://issues.apache.org/jira/browse/CASSANDRA-13462
Project: Cassandra
Issue Type: Bug
Reporter: Andrew Jefferson

My expectation is that UUIDs should behave as 128-bit integers for comparison. However it seems that the Cassandra implementation compares first the uuid version number, then the remaining values of the uuid.
e.g. in C*
10000000-0000-3000-0000-000000000000
is greater than
20000000-0000-1000-0000-000000000000

I expect range queries / comparisons on UUIDs to work as though this is the case. But it does not. It seems to require the UUID to have certain properties for range queries to work properly:

```
create table dev.testinguuids (
  pk int,
  ck uuid,
  val int,
  PRIMARY KEY ((pk), ck)
)

insert into dev.testinguuids (pk,ck,val) VALUES (1, 30000000-0000-0000-0000-000000000000, 1)

select * from dev.testinguuids where pk=1 and ck > 10000000-0000-0000-0000-000000000000;
-> returns 1 row
select * from dev.testinguuids where pk=1 and ck > 00000000-0000-5000-0000-000000000000;
-> returns 0 rows
```

after a bit of investigation of UUIDs it works correctly for me if I force my query UUIDs to be in the form: 00000000-0000-05xx-0000-000000000000
i.e. select * from dev.testinguuids where pk=1 and ck > 00000000-0000-05xx-0000-000000000000 works ok
n.b. I have populated my table only with valid type 1 and type 4 uuids.

In testing, if I create uuids that are of the form 00000000-0000-YYxx-0000-000000000000 where YY > 05, then they behave differently as well:

```
# Insert a valid uuid
insert into dev.testinguuids (pk,ck,val) VALUES (2, 30000000-0000-0000-0000-000000000000, 1)
insert into dev.testinguuids (pk,ck,val) VALUES (2, 30000000-0000-0000-0000-000000000000, 1)
```
-- This message was sent by Atlassian JIRA (v6.3.15#6346)
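For readers who want to reproduce the reported ordering outside Cassandra, here is a small sketch. It approximates the behaviour described in this ticket (compare the version nibble first, then the raw 128-bit value); it is not Cassandra's actual UUIDType comparator, and the example UUIDs are illustrative:

```python
import uuid

# Approximation of the ordering this ticket reports (not the exact
# org.apache.cassandra.db.marshal.UUIDType comparator): the version
# nibble sorts before the rest of the 128-bit value.
def cassandra_style_key(u: uuid.UUID):
    version_nibble = (u.int >> 76) & 0xF   # 13th hex digit of the UUID
    return (version_nibble, u.int)

a = uuid.UUID('10000000-0000-3000-8000-000000000000')  # version 3
b = uuid.UUID('20000000-0000-1000-8000-000000000000')  # version 1

print(a.int > b.int)                                    # False: as a 128-bit int, a < b
print(cassandra_style_key(a) > cassandra_style_key(b))  # True: version 3 sorts after 1
```

This makes the surprise concrete: integer comparison and version-first comparison disagree for exactly the kind of UUID pairs shown in the issue description.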
[jira] [Commented] (CASSANDRA-13257) Add repair streaming preview
[ https://issues.apache.org/jira/browse/CASSANDRA-13257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15974353#comment-15974353 ] Stefan Podkowinski commented on CASSANDRA-13257:
------------------------------------------------
This is a new feature that should be covered in the docs and NEWS.txt.

> Add repair streaming preview
> ----------------------------
>
> Key: CASSANDRA-13257
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13257
> Project: Cassandra
> Issue Type: New Feature
> Components: Streaming and Messaging
> Reporter: Blake Eggleston
> Assignee: Blake Eggleston
> Fix For: 4.0
>
> It would be useful to be able to estimate the amount of repair streaming that needs to be done, without actually doing any streaming. Our main motivation for having something like this is validating CASSANDRA-9143 in production, but I’d imagine it could also be a useful tool in troubleshooting.
-- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (CASSANDRA-12126) CAS Reads Inconsistencies
[ https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15974342#comment-15974342 ] Sylvain Lebresne commented on CASSANDRA-12126:
----------------------------------------------
bq. What is the distinction you are proposing?

Not sure; I think we don't put the same definitions on operation visibility. What I'm saying is that "if an operation has a visible outcome, then that outcome should be visible (by serial operations) to any subsequent operation (so as soon as the operation returns to the client, if you will)". In particular, if a serial read follows a serial write (meaning that it started after the write returned, even with a timeout), then if the write has any effect, the read should see it. Note that when you get a timeout on the initial write, you don't know whether the write has been applied or not, but the whole point of a serial read is to be able to unequivocally decide what that outcome was. If we can't guarantee that, if there is no way to observe whether a timed-out write has been applied or not, then I'm not sure how one would use LWT in the first place.

> CAS Reads Inconsistencies
> -------------------------
>
> Key: CASSANDRA-12126
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12126
> Project: Cassandra
> Issue Type: Bug
> Components: Coordination
> Reporter: sankalp kohli
> Assignee: Stefan Podkowinski
>
> While looking at the CAS code in Cassandra, I found a potential issue with CAS reads. Here is how it can happen with RF=3:
> 1) You issue a CAS write and it fails in the propose phase. Machine A replies true to the propose and saves the commit in its accepted field. The other two machines, B and C, do not get to the accept phase.
> The current state is that machine A has this commit in its paxos table as accepted but not committed, and B and C do not.
> 2) Issue a CAS read and it goes to only B and C. You won't be able to read the value written in step 1. This step behaves as if nothing is in flight.
> 3) Issue another CAS read and it goes to A and B.
> Now we will discover that there is something in flight from A and will propose and commit it with the current ballot. Now we can read the value written in step 1 as part of this CAS read.
> If we skip step 3 and instead run step 4, we will never learn about the value written in step 1.
> 4) Issue a CAS write that involves only B and C. This will succeed and commit a different value than step 1. The step 1 value will never be seen again, and was never seen before.
> If you read Lamport's "Paxos Made Simple" paper, section 2.3 talks about this issue: how learners can find out whether a majority of the acceptors have accepted a proposal.
> In step 3 it is correct that we propose the value again, since we don't know if it was accepted by a majority of acceptors. When we ask a majority of acceptors, and more than one acceptor but not a majority has something in flight, we have no way of knowing whether it was accepted by a majority. So this behavior is correct.
> However, we need to fix step 2, since it causes reads to not be linearizable with respect to writes and other reads. In this case we know that a majority of acceptors have no in-flight commit, which means a majority agrees that nothing was accepted by a majority. I think we should run a propose step here with an empty commit, and that will cause the write from step 1 to never become visible afterwards.
> With this fix, we will either see the data written in step 1 on the next serial read, or we will never see it, which is what we want.
-- This message was sent by Atlassian JIRA (v6.3.15#6346)
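The non-linearizable sequence in steps 1-3 can be sketched as a toy model (invented structure, not Cassandra's Paxos implementation): a serial read that observes an in-flight accepted value repairs and commits it, so whether the orphaned write from step 1 ever becomes visible depends on which quorum a later read happens to contact:

```python
# Toy model of the ticket's scenario (invented, not Cassandra code).
# RF=3: a CAS write timed out after reaching the accept phase only on A.
accepted = {'A': 'v1', 'B': None, 'C': None}

def cas_read(quorum):
    """Serial read over a quorum: any in-flight accepted value it observes
    is re-proposed and committed with the current ballot (step 3 above)."""
    inflight = [accepted[r] for r in quorum if accepted[r] is not None]
    if inflight:
        for r in accepted:
            accepted[r] = inflight[0]   # finish the orphaned commit
        return inflight[0]
    return None  # nothing visible -- yet 'v1' still lurks on A (step 2)

print(cas_read(['B', 'C']))   # None: the write is invisible
print(cas_read(['A', 'B']))   # v1: the write appears, with no write in between
```

Two reads with no intervening write return different results, which is exactly the linearizability violation the comment describes; the proposed fix is for the first read, having seen a quorum with nothing in flight, to commit an empty value so 'v1' can never resurface.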
[jira] [Commented] (CASSANDRA-13257) Add repair streaming preview
[ https://issues.apache.org/jira/browse/CASSANDRA-13257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15974337#comment-15974337 ] Marcus Eriksson commented on CASSANDRA-13257:
---------------------------------------------
+1

> Add repair streaming preview
> ----------------------------
>
> Key: CASSANDRA-13257
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13257
> Project: Cassandra
> Issue Type: New Feature
> Components: Streaming and Messaging
> Reporter: Blake Eggleston
> Assignee: Blake Eggleston
> Fix For: 4.0
>
> It would be useful to be able to estimate the amount of repair streaming that needs to be done, without actually doing any streaming. Our main motivation for having something like this is validating CASSANDRA-9143 in production, but I’d imagine it could also be a useful tool in troubleshooting.
-- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (CASSANDRA-13257) Add repair streaming preview
[ https://issues.apache.org/jira/browse/CASSANDRA-13257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcus Eriksson updated CASSANDRA-13257:
----------------------------------------
Status: Ready to Commit (was: Patch Available)

> Add repair streaming preview
> ----------------------------
>
> Key: CASSANDRA-13257
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13257
> Project: Cassandra
> Issue Type: New Feature
> Components: Streaming and Messaging
> Reporter: Blake Eggleston
> Assignee: Blake Eggleston
> Fix For: 4.0
>
> It would be useful to be able to estimate the amount of repair streaming that needs to be done, without actually doing any streaming. Our main motivation for having something like this is validating CASSANDRA-9143 in production, but I’d imagine it could also be a useful tool in troubleshooting.
-- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (CASSANDRA-13430) Cleanup isIncremental/repairedAt usage
[ https://issues.apache.org/jira/browse/CASSANDRA-13430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15974330#comment-15974330 ] Marcus Eriksson commented on CASSANDRA-13430: - +1 > Cleanup isIncremental/repairedAt usage > -- > > Key: CASSANDRA-13430 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13430 > Project: Cassandra > Issue Type: Improvement >Reporter: Blake Eggleston >Assignee: Blake Eggleston > Fix For: 4.0 > > > Post CASSANDRA-9143, there's no longer a reason to pass around > {{isIncremental}} or {{repairedAt}} in streaming sessions, as well as some > places in repair. The {{pendingRepair}} & {{repairedAt}} values should only > be set at the beginning/finalize stages of incremental repair and just follow > sstables around as they're streamed. Keeping these values with sstables also > fixes an edge case where you could leak repaired data back into unrepaired if > you run full and incremental repairs concurrently. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (CASSANDRA-13430) Cleanup isIncremental/repairedAt usage
[ https://issues.apache.org/jira/browse/CASSANDRA-13430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcus Eriksson updated CASSANDRA-13430: Status: Ready to Commit (was: Patch Available) > Cleanup isIncremental/repairedAt usage > -- > > Key: CASSANDRA-13430 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13430 > Project: Cassandra > Issue Type: Improvement >Reporter: Blake Eggleston >Assignee: Blake Eggleston > Fix For: 4.0 > > > Post CASSANDRA-9143, there's no longer a reason to pass around > {{isIncremental}} or {{repairedAt}} in streaming sessions, as well as some > places in repair. The {{pendingRepair}} & {{repairedAt}} values should only > be set at the beginning/finalize stages of incremental repair and just follow > sstables around as they're streamed. Keeping these values with sstables also > fixes an edge case where you could leak repaired data back into unrepaired if > you run full and incremental repairs concurrently. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (CASSANDRA-13307) The specification of protocol version in cqlsh means the python driver doesn't automatically downgrade protocol version.
[ https://issues.apache.org/jira/browse/CASSANDRA-13307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] mck updated CASSANDRA-13307: Resolution: Fixed Fix Version/s: 4.0 Status: Resolved (was: Ready to Commit) committed now in both cassandra-3.11 branch and trunk. > The specification of protocol version in cqlsh means the python driver > doesn't automatically downgrade protocol version. > > > Key: CASSANDRA-13307 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13307 > Project: Cassandra > Issue Type: Bug > Components: Tools >Reporter: Matt Byrd >Assignee: Matt Byrd >Priority: Minor > Labels: doc-impacting > Fix For: 4.0, 3.11.x > > > Hi, > Looks like we've regressed on the issue described in: > https://issues.apache.org/jira/browse/CASSANDRA-9467 > In that we're no longer able to connect from newer cqlsh versions > (e.g trunk) to older versions of Cassandra with a lower version of the > protocol (e.g 2.1 with protocol version 3) > The problem seems to be that we're relying on the ability for the client to > automatically downgrade protocol version implemented in Cassandra here: > https://issues.apache.org/jira/browse/CASSANDRA-12838 > and utilised in the python client here: > https://datastax-oss.atlassian.net/browse/PYTHON-240 > The problem however comes when we implemented: > https://datastax-oss.atlassian.net/browse/PYTHON-537 > "Don't downgrade protocol version if explicitly set" > (included when we bumped from 3.5.0 to 3.7.0 of the python driver as part of > fixing: https://issues.apache.org/jira/browse/CASSANDRA-11534) > Since we do explicitly specify the protocol version in the bin/cqlsh.py. > I've got a patch which just adds an option to explicitly specify the protocol > version (for those who want to do that) and then otherwise defaults to not > setting the protocol version, i.e using the protocol version from the client > which we ship, which should by default be the same protocol as the server. 
> Then it should downgrade gracefully as was intended. > Let me know if that seems reasonable. > Thanks, > Matt -- This message was sent by Atlassian JIRA (v6.3.15#6346)
cassandra git commit: Fix cqlsh automatic protocol downgrade regression Patch by Matt Byrd; reviewed by Mick Semb Wever for CASSANDRA-13307
Repository: cassandra Updated Branches: refs/heads/cassandra-3.11 65c1fddbc -> 9c54d02f7 Fix cqlsh automatic protocol downgrade regression Patch by Matt Byrd; reviewed by Mick Semb Wever for CASSANDRA-13307 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/9c54d02f Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/9c54d02f Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/9c54d02f Branch: refs/heads/cassandra-3.11 Commit: 9c54d02f73245d3a9a05d37f7d0002421abb852f Parents: 65c1fdd Author: Matt ByrdAuthored: Wed Mar 8 13:55:01 2017 -0800 Committer: Mick Semb Wever Committed: Wed Apr 19 16:15:37 2017 +1000 -- CHANGES.txt | 1 + bin/cqlsh.py | 19 +-- 2 files changed, 14 insertions(+), 6 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/9c54d02f/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index 19d8162..1757266 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -24,6 +24,7 @@ * NoReplicationTokenAllocator should work with zero replication factor (CASSANDRA-12983) * Address message coalescing regression (CASSANDRA-12676) * Delete illegal character from StandardTokenizerImpl.jflex (CASSANDRA-13417) + * Fix cqlsh automatic protocol downgrade regression (CASSANDRA-13307) Merged from 3.0: * Handling partially written hint files (CASSANDRA-12728) * Fix NPE issue in StorageService (CASSANDRA-13060) http://git-wip-us.apache.org/repos/asf/cassandra/blob/9c54d02f/bin/cqlsh.py -- diff --git a/bin/cqlsh.py b/bin/cqlsh.py index 2387342..e765dee 100644 --- a/bin/cqlsh.py +++ b/bin/cqlsh.py @@ -178,7 +178,6 @@ from cqlshlib.util import get_file_encoding_bomsize, trim_if_present DEFAULT_HOST = '127.0.0.1' DEFAULT_PORT = 9042 DEFAULT_SSL = False -DEFAULT_PROTOCOL_VERSION = 4 DEFAULT_CONNECT_TIMEOUT_SECONDS = 5 DEFAULT_REQUEST_TIMEOUT_SECONDS = 10 @@ -223,6 +222,9 @@ parser.add_option('--cqlversion', default=None, help='Specify a particular CQL version, ' 'by default 
the highest version supported by the server will be used.' ' Examples: "3.0.3", "3.1.0"') +parser.add_option("--protocol-version", type="int", default=None, + help='Specify a specific protcol version otherwise the client will default and downgrade as necessary') + parser.add_option("-e", "--execute", help='Execute the statement and quit.') parser.add_option("--connect-timeout", default=DEFAULT_CONNECT_TIMEOUT_SECONDS, dest='connect_timeout', help='Specify the connection timeout in seconds (default: %default seconds).') @@ -449,7 +451,7 @@ class Shell(cmd.Cmd): ssl=False, single_statement=None, request_timeout=DEFAULT_REQUEST_TIMEOUT_SECONDS, - protocol_version=DEFAULT_PROTOCOL_VERSION, + protocol_version=None, connect_timeout=DEFAULT_CONNECT_TIMEOUT_SECONDS): cmd.Cmd.__init__(self, completekey=completekey) self.hostname = hostname @@ -468,13 +470,16 @@ class Shell(cmd.Cmd): if use_conn: self.conn = use_conn else: +kwargs = {} +if protocol_version is not None: +kwargs['protocol_version'] = protocol_version self.conn = Cluster(contact_points=(self.hostname,), port=self.port, cql_version=cqlver, -protocol_version=protocol_version, auth_provider=self.auth_provider, ssl_options=sslhandling.ssl_settings(hostname, CONFIG_FILE) if ssl else None, load_balancing_policy=WhiteListRoundRobinPolicy([self.hostname]), control_connection_timeout=connect_timeout, -connect_timeout=connect_timeout) +connect_timeout=connect_timeout, +**kwargs) self.owns_connection = not use_conn if keyspace: @@ -1673,9 +1678,9 @@ class Shell(cmd.Cmd): direction = parsed.get_binding('dir').upper() if direction == 'FROM': -task = ImportTask(self, ks, table, columns, fname, opts, DEFAULT_PROTOCOL_VERSION, CONFIG_FILE) +task = ImportTask(self, ks, table, columns, fname, opts, self.conn.protocol_version, CONFIG_FILE) elif direction == 'TO': -task = ExportTask(self, ks, table, columns, fname, opts, DEFAULT_PROTOCOL_VERSION, CONFIG_FILE) +task = ExportTask(self, ks,
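The heart of the patch above is forwarding protocol_version to the driver only when the user explicitly set it, so the driver's own default (with its automatic downgrade logic) applies otherwise. A generic, self-contained sketch of that pattern; make_cluster is a stand-in defined here for illustration (the real code passes the kwargs to cassandra.cluster.Cluster):

```python
# Generic sketch of the pattern used in the patch: only pass a keyword
# argument through when the caller supplied one, so the library's default
# behaviour (here: the driver's automatic protocol downgrade) stays in effect.
def connect(host, port, protocol_version=None, **extra):
    kwargs = dict(extra)
    if protocol_version is not None:
        # Explicitly pinned: the driver will NOT downgrade (see PYTHON-537).
        kwargs['protocol_version'] = protocol_version
    return make_cluster(contact_points=(host,), port=port, **kwargs)

# Stand-in for cassandra.cluster.Cluster, so the sketch is runnable;
# it just records which keyword arguments it was given.
def make_cluster(**kwargs):
    return kwargs

print('protocol_version' in connect('127.0.0.1', 9042))                    # False
print(connect('127.0.0.1', 9042, protocol_version=3)['protocol_version'])  # 3
```

Building the kwargs dict conditionally, rather than always passing protocol_version=None, matters because the driver distinguishes "not specified" from "specified as some value" when deciding whether it may downgrade.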
[jira] [Commented] (CASSANDRA-13307) The specification of protocol version in cqlsh means the python driver doesn't automatically downgrade protocol version.
[ https://issues.apache.org/jira/browse/CASSANDRA-13307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15974141#comment-15974141 ] mck commented on CASSANDRA-13307: - || Branch || Unit Tests || DTests || | [3.11.x|https://github.com/michaelsembwever/cassandra/commit/32835b0919c5d89b565f0adff15a845fe392c270] | [circleci|https://circleci.com/gh/michaelsembwever/cassandra/14] | [dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/24] | | [trunk|https://github.com/apache/cassandra/pull/96/commits/c36a4e5547af3967976144f7b553d70873503f77] | [asf jenkins|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-testall/3] \\ [circleci|https://circleci.com/gh/michaelsembwever/cassandra/3] | [dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/15/] | > The specification of protocol version in cqlsh means the python driver > doesn't automatically downgrade protocol version. > > > Key: CASSANDRA-13307 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13307 > Project: Cassandra > Issue Type: Bug > Components: Tools >Reporter: Matt Byrd >Assignee: Matt Byrd >Priority: Minor > Labels: doc-impacting > Fix For: 3.11.x > > > Hi, > Looks like we've regressed on the issue described in: > https://issues.apache.org/jira/browse/CASSANDRA-9467 > In that we're no longer able to connect from newer cqlsh versions > (e.g trunk) to older versions of Cassandra with a lower version of the > protocol (e.g 2.1 with protocol version 3) > The problem seems to be that we're relying on the ability for the client to > automatically downgrade protocol version implemented in Cassandra here: > https://issues.apache.org/jira/browse/CASSANDRA-12838 > and utilised in the python client here: > https://datastax-oss.atlassian.net/browse/PYTHON-240 > The problem however comes when we implemented: > https://datastax-oss.atlassian.net/browse/PYTHON-537 > "Don't downgrade protocol version if explicitly 
set" > (included when we bumped from 3.5.0 to 3.7.0 of the python driver as part of > fixing: https://issues.apache.org/jira/browse/CASSANDRA-11534), > since we do explicitly specify the protocol version in bin/cqlsh.py. > I've got a patch which just adds an option to explicitly specify the protocol > version (for those who want to do that) and otherwise defaults to not > setting the protocol version, i.e. using the protocol version from the client > which we ship, which should by default be the same protocol as the server. > Then it should downgrade gracefully as was intended. > Let me know if that seems reasonable. > Thanks, > Matt -- This message was sent by Atlassian JIRA (v6.3.15#6346)