[jira] [Updated] (CASSANDRA-15131) Data Race between force remove and remove
[ https://issues.apache.org/jira/browse/CASSANDRA-15131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lujie updated CASSANDRA-15131:
--
Description:

Reproduce:
# start a three-node cluster (A, B, C) by: ./bin/cassandra -f
# shut down node A
# on node B, remove node A by: ./bin/nodetool removenode 2331c0c1-f799-4f35-9323-c57ad020732b
# but this process is too slow, so on node B we force remove A by: ./bin/nodetool removenode force
# we then hit an NPE:

{code:java}
RemovalStatus: Removing token (-9206149340638432876). Waiting for replication confirmation from [/10.3.1.11,/10.3.1.14].
error: null
-- StackTrace --
java.lang.NullPointerException
at org.apache.cassandra.gms.VersionedValue$VersionedValueFactory.removedNonlocal(VersionedValue.java:214)
at org.apache.cassandra.gms.Gossiper.advertiseTokenRemoved(Gossiper.java:556)
at org.apache.cassandra.service.StorageService.forceRemoveCompletion(StorageService.java:4353)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:71)
at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:275)
at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1471)
at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:76)
at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1312)
at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1404)
at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:832)
at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:323)
at sun.rmi.transport.Transport$1.run(Transport.java:200)
at sun.rmi.transport.Transport$1.run(Transport.java:197)
at java.security.AccessController.doPrivileged(Native Method)
at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:568)
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:826)
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$81(TCPTransport.java:683)
at java.security.AccessController.doPrivileged(Native Method)
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:682)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{code}

Code Analysis
1. removeNode will mark the node as Leaving:
{code:java}
tokenMetadata.addLeavingEndpoint(endpoint);
{code}
2. so forceRemoveCompletion (see the stack trace above) can step into the removal branch (lines 3-11 below):
{code:java}
1.  if (!replicatingNodes.isEmpty() || !tokenMetadata.getLeavingEndpoints().isEmpty())
2.  {
3.      logger.warn("Removal not confirmed for for {}", StringUtils.join(this.replicatingNodes, ","));
4.      for (InetAddress endpoint : tokenMetadata.getLeavingEndpoints())
5.      {
6.          UUID hostId = tokenMetadata.getHostId(endpoint);
7.          Gossiper.instance.advertiseTokenRemoved(endpoint, hostId);
8.          excise(tokenMetadata.getTokens(endpoint), endpoint);
9.      }
10.     replicatingNodes.clear();
11.     removingNode = null;
12. }
{code}
3. At code line#6, forceRemoveCompletion fetches the hostId, but if removeNode has just finished removing the host, the hostId at line#6 will be null.
4. Code line#7 then calls *hostId.toString()*, hence the NPE.

If two or more nodes need to be force removed, this NPE causes them to be skipped, so they remain in the cluster.

was: Reproduce: # start a three nodes cluster(A, B, C) by : ./bin/cassandra -f # shutdown node A # In Node B, removing no
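A minimal defensive sketch of the fix direction this analysis points at: skip a leaving endpoint whose host ID is already gone instead of handing null to the gossiper. This is an illustrative sketch only, not the attached 0001-fix-CASSANDRA-15131.patch, and the warning message is invented:

{code:java}
// Sketch of a guarded version of the forceRemoveCompletion loop quoted above.
for (InetAddress endpoint : tokenMetadata.getLeavingEndpoints())
{
    UUID hostId = tokenMetadata.getHostId(endpoint);
    if (hostId == null)
    {
        // A concurrent removenode finished between the leaving-endpoints check and
        // this lookup, so there is no host ID left to advertise; skip the endpoint
        // instead of letting advertiseTokenRemoved throw the NPE. (Illustrative only.)
        logger.warn("Host ID for {} already removed; skipping force removal", endpoint);
        continue;
    }
    Gossiper.instance.advertiseTokenRemoved(endpoint, hostId);
    excise(tokenMetadata.getTokens(endpoint), endpoint);
}
{code}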
[jira] [Updated] (CASSANDRA-15136) Incorrect error message in legacy reader
[ https://issues.apache.org/jira/browse/CASSANDRA-15136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vincent White updated CASSANDRA-15136: -- Description: Just fixes the order in the exception message. ||3.0.x||3.11.x|| |[Patch|https://github.com/vincewhite/cassandra/commits/readLegacyAtom30]|[Patch|https://github.com/vincewhite/cassandra/commits/readLegacyAtom]| was: Just fixes the order in the exception message. ||3.0.x||3.11|| |[Patch|https://github.com/vincewhite/cassandra/commits/readLegacyAtom30]|[Patch|https://github.com/vincewhite/cassandra/commits/readLegacyAtom]| > Incorrect error message in legacy reader > > > Key: CASSANDRA-15136 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15136 > Project: Cassandra > Issue Type: Bug > Components: Observability/Logging >Reporter: Vincent White >Assignee: Vincent White >Priority: Normal > > Just fixes the order in the exception message. > ||3.0.x||3.11.x|| > |[Patch|https://github.com/vincewhite/cassandra/commits/readLegacyAtom30]|[Patch|https://github.com/vincewhite/cassandra/commits/readLegacyAtom]| -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
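For context on the one-line fix above: the bug class here is format arguments passed in an order that does not match the message wording, so the error reports the two values swapped. A generic, hypothetical illustration (this is not the actual legacy-reader code):

{code:java}
// Hypothetical illustration of the swapped-argument bug class; not Cassandra's code.
public class SwappedArgsExample
{
    public static void main(String[] args)
    {
        long expected = 10;
        long actual = 25;
        // Buggy: argument order is reversed relative to the message wording,
        // so the log claims we read 10 bytes and expected 25.
        System.out.println(String.format("read %d bytes, expected %d", expected, actual));
        // Fixed: arguments match the wording.
        System.out.println(String.format("read %d bytes, expected %d", actual, expected));
    }
}
{code}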
[jira] [Updated] (CASSANDRA-15136) Incorrect error message in legacy reader
[ https://issues.apache.org/jira/browse/CASSANDRA-15136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vincent White updated CASSANDRA-15136: -- Description: Just fixes the order in the exception message. ||3.0.x||3.11|| |[Patch|https://github.com/vincewhite/cassandra/commits/readLegacyAtom30]|[Patch|https://github.com/vincewhite/cassandra/commits/readLegacyAtom]| was: Just fixes the order in the exception message. ||3.0.x|3.11|| |[Patch|https://github.com/vincewhite/cassandra/commits/readLegacyAtom30]|[Patch|https://github.com/vincewhite/cassandra/commits/readLegacyAtom]| > Incorrect error message in legacy reader > > > Key: CASSANDRA-15136 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15136 > Project: Cassandra > Issue Type: Bug > Components: Observability/Logging >Reporter: Vincent White >Assignee: Vincent White >Priority: Normal > > Just fixes the order in the exception message. > ||3.0.x||3.11|| > |[Patch|https://github.com/vincewhite/cassandra/commits/readLegacyAtom30]|[Patch|https://github.com/vincewhite/cassandra/commits/readLegacyAtom]| -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15131) Data Race between force remove and remove
[ https://issues.apache.org/jira/browse/CASSANDRA-15131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lujie updated CASSANDRA-15131:
--
Description:

Reproduce:
# start a three-node cluster (A, B, C) by: ./bin/cassandra -f
# shut down node A
# on node B, remove node A by: ./bin/nodetool removenode 2331c0c1-f799-4f35-9323-c57ad020732b
# but this process is too slow, so we force remove A by: ./bin/nodetool removenode force
# an NPE happens in the client:

{code:java}
RemovalStatus: Removing token (-9206149340638432876). Waiting for replication confirmation from [/10.3.1.11,/10.3.1.14].
error: null
-- StackTrace --
java.lang.NullPointerException
at org.apache.cassandra.gms.VersionedValue$VersionedValueFactory.removedNonlocal(VersionedValue.java:214)
at org.apache.cassandra.gms.Gossiper.advertiseTokenRemoved(Gossiper.java:556)
at org.apache.cassandra.service.StorageService.forceRemoveCompletion(StorageService.java:4353)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:71)
at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:275)
at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1471)
at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:76)
at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1312)
at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1404)
at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:832)
at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:323)
at sun.rmi.transport.Transport$1.run(Transport.java:200)
at sun.rmi.transport.Transport$1.run(Transport.java:197)
at java.security.AccessController.doPrivileged(Native Method)
at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:568)
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:826)
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$81(TCPTransport.java:683)
at java.security.AccessController.doPrivileged(Native Method)
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:682)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{code}

Code Analysis
1. removeNode will mark the node as Leaving:
{code:java}
tokenMetadata.addLeavingEndpoint(endpoint);
{code}
2. forceRemoveCompletion then steps into the removal branch:
{code:java}
1.  if (!replicatingNodes.isEmpty() || !tokenMetadata.getLeavingEndpoints().isEmpty())
2.  {
3.      logger.warn("Removal not confirmed for for {}", StringUtils.join(this.replicatingNodes, ","));
4.      for (InetAddress endpoint : tokenMetadata.getLeavingEndpoints())
5.      {
6.          UUID hostId = tokenMetadata.getHostId(endpoint);
7.          Gossiper.instance.advertiseTokenRemoved(endpoint, hostId);
8.          excise(tokenMetadata.getTokens(endpoint), endpoint);
9.      }
10.     replicatingNodes.clear();
11.     removingNode = null;
12. }
{code}
3. Code line#6 fetches the hostId, but if removeNode completes at exactly this moment, it removes the host via *tokenMetadata.removeEndpoint(endpoint);* so the hostId is null.
4. Code line#7 then calls *hostId.toString()*, hence the NPE.

The NPE prevents the remaining nodes from being force removed.

was: Reproduce: # start a three nodes cluster(A, B, C) by : ./bin/cassandra -f # shutdown node A # In Node B, removing node A by:./bin/nodetool removenode 2331c0c1-f799-4f35-9323-c5
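A sketch of the window described in step 3, driving TokenMetadata directly. The method names match the 3.x codebase (org.apache.cassandra.locator.TokenMetadata); that the class can be constructed and exercised standalone like this is an assumption:

{code:java}
import java.net.InetAddress;
import java.util.UUID;

import org.apache.cassandra.locator.TokenMetadata;

// Sketch of the race window from step 3: once removeEndpoint has run, the host ID
// lookup that forceRemoveCompletion performs at line#6 returns null.
public class HostIdRaceSketch
{
    public static void main(String[] args) throws Exception
    {
        TokenMetadata tmd = new TokenMetadata();
        InetAddress endpoint = InetAddress.getByName("10.3.1.12");
        UUID hostId = UUID.randomUUID();

        tmd.updateHostId(hostId, endpoint);
        tmd.addLeavingEndpoint(endpoint);   // what removeNode does first

        // ... forceRemoveCompletion still sees the endpoint as leaving here ...

        tmd.removeEndpoint(endpoint);       // removeNode finishing concurrently

        // This is the lookup at line#6 of the quoted branch; it now yields null,
        // and passing null on to advertiseTokenRemoved produces the reported NPE.
        System.out.println(tmd.getHostId(endpoint)); // prints: null
    }
}
{code}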
[jira] [Updated] (CASSANDRA-15136) Incorrect error message in legacy reader
[ https://issues.apache.org/jira/browse/CASSANDRA-15136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vincent White updated CASSANDRA-15136: -- Description: Just fixes the order in the exception message. ||3.0.x|3.11|| |[Patch|https://github.com/vincewhite/cassandra/commits/readLegacyAtom30]|[Patch|https://github.com/vincewhite/cassandra/commits/readLegacyAtom]| was: Just fixes the order in the exception message. ||3.11|| |[Patch|https://github.com/vincewhite/cassandra/commits/readLegacyAtom]| > Incorrect error message in legacy reader > > > Key: CASSANDRA-15136 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15136 > Project: Cassandra > Issue Type: Bug > Components: Observability/Logging >Reporter: Vincent White >Assignee: Vincent White >Priority: Normal > > Just fixes the order in the exception message. > > ||3.0.x|3.11|| > |[Patch|https://github.com/vincewhite/cassandra/commits/readLegacyAtom30]|[Patch|https://github.com/vincewhite/cassandra/commits/readLegacyAtom]| -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15131) Data Race between force remove and remove
[ https://issues.apache.org/jira/browse/CASSANDRA-15131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lujie updated CASSANDRA-15131:
--
Summary: Data Race between force remove and remove (was: NPE while force remove a node)

> Data Race between force remove and remove
> -
>
> Key: CASSANDRA-15131
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15131
> Project: Cassandra
> Issue Type: Bug
> Components: Consistency/Bootstrap and Decommission
> Reporter: lujie
> Assignee: lujie
> Priority: Normal
> Labels: pull-request-available
> Attachments: 0001-fix-CASSANDRA-15131.patch
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Reproduce:
> # start a three-node cluster (A, B, C) by: ./bin/cassandra -f
> # shut down node A
> # on node B, remove node A by: ./bin/nodetool removenode 2331c0c1-f799-4f35-9323-c57ad020732b
> # but this process is too slow, so we force remove A by: ./bin/nodetool removenode force
> # an NPE happens in the client:
> {code:java}
> RemovalStatus: Removing token (-9206149340638432876). Waiting for replication confirmation from [/10.3.1.11,/10.3.1.14].
> error: null
> -- StackTrace --
> java.lang.NullPointerException
> at org.apache.cassandra.gms.VersionedValue$VersionedValueFactory.removedNonlocal(VersionedValue.java:214)
> at org.apache.cassandra.gms.Gossiper.advertiseTokenRemoved(Gossiper.java:556)
> at org.apache.cassandra.service.StorageService.forceRemoveCompletion(StorageService.java:4353)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:71)
> at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:275)
> at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
> at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
> at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
> at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
> at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
> at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
> at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
> at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1471)
> at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:76)
> at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1312)
> at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1404)
> at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:832)
> at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:323)
> at sun.rmi.transport.Transport$1.run(Transport.java:200)
> at sun.rmi.transport.Transport$1.run(Transport.java:197)
> at java.security.AccessController.doPrivileged(Native Method)
> at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
> at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:568)
> at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:826)
> at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$81(TCPTransport.java:683)
> at java.security.AccessController.doPrivileged(Native Method)
> at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:682)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> Code Analysis
> 1. removeNode will mark the node as Leaving
> {code:java}
> tokenMetadata.addLeavingEndpoint(endpoint);
> {code}
> 2. forceRemoveCompletion then steps into the removal branch
> {code:java}
> 1. if (!replicatingNodes.isEmpty() || !tokenMetadata.getLeavingEndpoints().isEmpty())
> 2. {
> 3. logger.warn("Removal not confirmed for for {}", StringUtils.join(this.replicatingNodes, ",
[jira] [Commented] (CASSANDRA-15136) Incorrect error message in legacy reader
[ https://issues.apache.org/jira/browse/CASSANDRA-15136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16846376#comment-16846376 ] Vincent White commented on CASSANDRA-15136: --- Thanks, should be sorted now. > Incorrect error message in legacy reader > > > Key: CASSANDRA-15136 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15136 > Project: Cassandra > Issue Type: Bug > Components: Observability/Logging >Reporter: Vincent White >Assignee: Vincent White >Priority: Normal > > Just fixes the order in the exception message. > > ||3.11|| > |[Patch|https://github.com/vincewhite/cassandra/commits/readLegacyAtom]| -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15136) Incorrect error message in legacy reader
[ https://issues.apache.org/jira/browse/CASSANDRA-15136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vincent White updated CASSANDRA-15136: -- Description: Just fixes the order in the exception message. ||3.11|| |[Patch|https://github.com/vincewhite/cassandra/commits/readLegacyAtom]| was: Just fixes the order in the exception message. ||3.11|| |[Patch|https://github.com/vincewhite/cassandra/commit/5a62fdd7aa7463a10a1a0bb546c1322ab15eb9cf]| > Incorrect error message in legacy reader > > > Key: CASSANDRA-15136 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15136 > Project: Cassandra > Issue Type: Bug > Components: Observability/Logging >Reporter: Vincent White >Assignee: Vincent White >Priority: Normal > > Just fixes the order in the exception message. > > ||3.11|| > |[Patch|https://github.com/vincewhite/cassandra/commits/readLegacyAtom]| -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15138) A cluster (RF=3) not recovering after two nodes are stopped
[ https://issues.apache.org/jira/browse/CASSANDRA-15138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hiroyuki Yamada updated CASSANDRA-15138:
--
Description:

I faced a weird issue when recovering a cluster after two nodes are stopped. It is easily reproducible and looks like a bug that needs fixing. The following are the steps to reproduce it.

=== STEPS TO REPRODUCE ===
* Create a 3-node cluster with RF=3
- node1(seed), node2, node3
* Start requests to the cluster with cassandra-stress (it continues until the end)
- what we did: cassandra-stress mixed cl=QUORUM duration=10m -errors ignore -node node1,node2,node3 -rate threads\>=16 threads\<=256
- (It doesn't have to be this many threads. Can be 1)
* Stop node3 normally (with systemctl stop or kill (without -9))
- the system is still available, as expected, because a quorum of nodes is still available
* Stop node2 normally (with systemctl stop or kill (without -9))
- the system is NOT available after it's stopped, as expected
- the client gets `UnavailableException: Not enough replicas available for query at consistency QUORUM`
- the client gets errors right away (within a few ms)
- so far it's all expected
* Wait for 1 minute
* Bring node2 back up
- {color:#ff}The issue happens here.{color}
- the client gets `ReadTimeoutException` or `WriteTimeoutException` depending on whether the request is a read or a write, even after node2 is up
- the client gets errors after about 5000ms or 2000ms, which are the request timeouts for write and read requests
- what node1 reports with `nodetool status` and what node2 reports are inconsistent (node2 thinks node1 is down)
- It takes a very long time to recover from this state
=== STEPS TO REPRODUCE ===

Some additional important information to note:
* If we don't start cassandra-stress, it doesn't cause the issue.
* Restarting node1 recovers its state right after it's restarted
* Setting a lower value for dynamic_snitch_reset_interval_in_ms (to 6 or something) fixes the issue
* If we `kill -9` the nodes, then it doesn't cause the issue.
* Hints seem unrelated. I tested with hints disabled; it didn't make any difference.

was: I faced a weird issue when recovering a cluster after two nodes are stopped. It is easily reproduce-able and looks like a bug or an issue to fix. The following is a step to reproduce it. === STEP TO REPRODUCE === * Create a 3-node cluster with RF=3 - node1(seed), node2, node3 * Start requests to the cluster with cassandra-stress (it continues until the end) - what we did: cassandra-stress mixed cl=QUORUM duration=10m -errors ignore -node node1,node2,node3 -rate threads\>=16 threads\<=256 - (It doesn't have to be this many threads. Can be 1) * Stop node3 normally (with systemctl stop or kill (not without -9)) - the system is still available as expected because the quorum of nodes is still available * Stop node2 normally (with systemctl stop or kill (not without -9)) - the system is NOT available as expected after it's stopped. - the client gets `UnavailableException: Not enough replicas available for query at consistency QUORUM` - the client gets errors right away (so few ms) - so far it's all expected * Wait for 1 mins * Bring up node2 back - {color:#FF}The issue happens here.{color} - the client gets ReadTimeoutException` or WriteTimeoutException depending on if the request is read or write even after the node2 is up - the client gets errors after about 5000ms or 2000ms, which are request timeout for write and read request - what node1 reports with `nodetool status` and what node2 reports are not consistent. (node2 thinks node1 is down) - It takes very long time to recover from its state === STEPS TO REPRODUCE === Some additional important information to note: * If we don't start cassandra-stress, it doesn't cause the issue. * Restarting node1 and it recovers its state right after it's restarted * Setting lower value in dynamic_snitch_reset_interval_in_ms (to 6 or something) fixes the issue * If we `kill -9` the nodes, then it doesn't cause the issue. * Hints seems not related. I tested with hints disabled, it didn't make any difference.

> A cluster (RF=3) not recovering after two nodes are stopped
> ---
>
> Key: CASSANDRA-15138
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15138
> Project: Cassandra
> Issue Type: Bug
> Components: Cluster/Membership
> Reporter: Hiroyuki Yamada
> Priority: Normal
>
> I faced a weird issue when recovering a cluster after two nodes are stopped.
> It is easily reproducible and looks like a bug that needs fixing.
> The following are the steps to reproduce it.
> === STEPS TO REPRODUCE ===
>
[jira] [Updated] (CASSANDRA-15138) A cluster (RF=3) not recovering after two nodes are stopped
[ https://issues.apache.org/jira/browse/CASSANDRA-15138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hiroyuki Yamada updated CASSANDRA-15138:
--
Discovered By: User Report

> A cluster (RF=3) not recovering after two nodes are stopped
> ---
>
> Key: CASSANDRA-15138
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15138
> Project: Cassandra
> Issue Type: Bug
> Components: Cluster/Membership
> Reporter: Hiroyuki Yamada
> Priority: Normal
>
> I faced a weird issue when recovering a cluster after two nodes are stopped.
> It is easily reproducible and looks like a bug that needs fixing.
> The following are the steps to reproduce it.
> === STEPS TO REPRODUCE ===
> * Create a 3-node cluster with RF=3
> - node1(seed), node2, node3
> * Start requests to the cluster with cassandra-stress (it continues until the end)
> - what we did: cassandra-stress mixed cl=QUORUM duration=10m -errors ignore -node node1,node2,node3 -rate threads\>=16 threads\<=256
> - (It doesn't have to be this many threads. Can be 1)
> * Stop node3 normally (with systemctl stop or kill (without -9))
> - the system is still available, as expected, because a quorum of nodes is still available
> * Stop node2 normally (with systemctl stop or kill (without -9))
> - the system is NOT available after it's stopped, as expected
> - the client gets `UnavailableException: Not enough replicas available for query at consistency QUORUM`
> - the client gets errors right away (within a few ms)
> - so far it's all expected
> * Wait for 1 minute
> * Bring node2 back up
> - {color:#FF}The issue happens here.{color}
> - the client gets `ReadTimeoutException` or `WriteTimeoutException` depending on whether the request is a read or a write, even after node2 is up
> - the client gets errors after about 5000ms or 2000ms, which are the request timeouts for write and read requests
> - what node1 reports with `nodetool status` and what node2 reports are inconsistent (node2 thinks node1 is down)
> - It takes a very long time to recover from this state
> === STEPS TO REPRODUCE ===
> Some additional important information to note:
> * If we don't start cassandra-stress, it doesn't cause the issue.
> * Restarting node1 recovers its state right after it's restarted
> * Setting a lower value for dynamic_snitch_reset_interval_in_ms (to 6 or something) fixes the issue
> * If we `kill -9` the nodes, then it doesn't cause the issue.
> * Hints seem unrelated. I tested with hints disabled; it didn't make any difference.
>

-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-15138) A cluster (RF=3) not recovering after two nodes are stopped
Hiroyuki Yamada created CASSANDRA-15138:
---
Summary: A cluster (RF=3) not recovering after two nodes are stopped
Key: CASSANDRA-15138
URL: https://issues.apache.org/jira/browse/CASSANDRA-15138
Project: Cassandra
Issue Type: Bug
Components: Cluster/Membership
Reporter: Hiroyuki Yamada

I faced a weird issue when recovering a cluster after two nodes are stopped. It is easily reproducible and looks like a bug that needs fixing. The following are the steps to reproduce it.

=== STEPS TO REPRODUCE ===
* Create a 3-node cluster with RF=3
- node1(seed), node2, node3
* Start requests to the cluster with cassandra-stress (it continues until the end)
- what we did: cassandra-stress mixed cl=QUORUM duration=10m -errors ignore -node node1,node2,node3 -rate threads\>=16 threads\<=256
- (It doesn't have to be this many threads. Can be 1)
* Stop node3 normally (with systemctl stop or kill (without -9))
- the system is still available, as expected, because a quorum of nodes is still available
* Stop node2 normally (with systemctl stop or kill (without -9))
- the system is NOT available after it's stopped, as expected
- the client gets `UnavailableException: Not enough replicas available for query at consistency QUORUM`
- the client gets errors right away (within a few ms)
- so far it's all expected
* Wait for 1 minute
* Bring node2 back up
- {color:#FF}The issue happens here.{color}
- the client gets `ReadTimeoutException` or `WriteTimeoutException` depending on whether the request is a read or a write, even after node2 is up
- the client gets errors after about 5000ms or 2000ms, which are the request timeouts for write and read requests
- what node1 reports with `nodetool status` and what node2 reports are inconsistent (node2 thinks node1 is down)
- It takes a very long time to recover from this state
=== STEPS TO REPRODUCE ===

Some additional important information to note:
* If we don't start cassandra-stress, it doesn't cause the issue.
* Restarting node1 recovers its state right after it's restarted
* Setting a lower value for dynamic_snitch_reset_interval_in_ms (to 6 or something) fixes the issue
* If we `kill -9` the nodes, then it doesn't cause the issue.
* Hints seem unrelated. I tested with hints disabled; it didn't make any difference.

-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
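For reference, the dynamic snitch workaround in the last list is a cassandra.yaml setting. A sketch of the change, with an assumed illustrative value (the exact figure in the report is unclear; the shipped default is 600000 ms, i.e. 10 minutes):

{code}
# cassandra.yaml - workaround sketch only; 60000 is an illustrative value,
# not a recommendation. Resetting dynamic snitch scores more frequently lets
# stale "node is bad" scores age out sooner after a stopped node comes back.
dynamic_snitch_reset_interval_in_ms: 60000
{code}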
[jira] [Updated] (CASSANDRA-15137) Switch http to https URLs in build.xml
[ https://issues.apache.org/jira/browse/CASSANDRA-15137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Shuler updated CASSANDRA-15137: --- Fix Version/s: 3.11.5 3.0.19 2.2.15 Source Control Link: https://github.com/apache/cassandra/commit/ca0ea40eeace444a2b7b5c81bae41f36de332d95 Status: Resolved (was: Ready to Commit) Resolution: Fixed > Switch http to https URLs in build.xml > -- > > Key: CASSANDRA-15137 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15137 > Project: Cassandra > Issue Type: Task > Components: Build, Dependencies >Reporter: Michael Shuler >Assignee: Michael Shuler >Priority: Normal > Fix For: 2.2.15, 3.0.19, 3.11.5, 4.0 > > Attachments: 0001-Switch-http-to-https-URLs-in-build.xml.patch > > > Switch to using https URLs wherever possible in build.xml. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15137) Switch http to https URLs in build.xml
[ https://issues.apache.org/jira/browse/CASSANDRA-15137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16846174#comment-16846174 ] Michael Shuler commented on CASSANDRA-15137: Committed the same changes to cassandra-2.2 branch and merged up. While I was there, I also fixed the {{scm:}} URLs in build.xml to the new gitbox system, since the new URLs were only in trunk. > Switch http to https URLs in build.xml > -- > > Key: CASSANDRA-15137 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15137 > Project: Cassandra > Issue Type: Task > Components: Build, Dependencies >Reporter: Michael Shuler >Assignee: Michael Shuler >Priority: Normal > Fix For: 4.0 > > Attachments: 0001-Switch-http-to-https-URLs-in-build.xml.patch > > > Switch to using https URLs wherever possible in build.xml. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[cassandra] branch cassandra-2.2 updated: Switch http to https URLs in build.xml
This is an automated email from the ASF dual-hosted git repository. mshuler pushed a commit to branch cassandra-2.2 in repository https://gitbox.apache.org/repos/asf/cassandra.git The following commit(s) were added to refs/heads/cassandra-2.2 by this push: new 63ff65a Switch http to https URLs in build.xml 63ff65a is described below commit 63ff65a8dd3a31e500ae5ec6232f1f9eade6fa3d Author: Michael Shuler AuthorDate: Wed May 22 15:03:43 2019 -0400 Switch http to https URLs in build.xml Patch by Michael Shuler; Reviewed by Joshua McKenzie for CASSANDRA-15137 --- build.properties.default | 4 ++-- build.xml| 26 +- 2 files changed, 15 insertions(+), 15 deletions(-) diff --git a/build.properties.default b/build.properties.default index 5291659..11da534 100644 --- a/build.properties.default +++ b/build.properties.default @@ -1,4 +1,4 @@ # Maven2 Repository Locations (you can override these in "build.properties" to point to a local proxy, e.g. Nexus) -artifact.remoteRepository.central: http://repo1.maven.org/maven2 -artifact.remoteRepository.apache: http://repo.maven.apache.org/maven2 +artifact.remoteRepository.central: https://repo1.maven.org/maven2 +artifact.remoteRepository.apache: https://repo.maven.apache.org/maven2 diff --git a/build.xml b/build.xml index d82dea9..ca06b41 100644 --- a/build.xml +++ b/build.xml @@ -8,7 +8,7 @@ ~ "License"); you may not use this file except in compliance ~ with the License. You may obtain a copy of the License at ~ - ~http://www.apache.org/licenses/LICENSE-2.0 + ~https://www.apache.org/licenses/LICENSE-2.0 ~ ~ Unless required by applicable law or agreed to in writing, ~ software distributed under the License is distributed on an @@ -26,9 +26,9 @@ - - -http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=tree"/> +https://gitbox.apache.org/repos/asf/cassandra.git"/> +https://gitbox.apache.org/repos/asf/cassandra.git"/> +https://gitbox.apache.org/repos/asf?p=cassandra.git;a=tree"/> @@ -88,7 +88,7 @@ http://repo2.maven.org/maven2/org/apache/maven/maven-ant-tasks"; /> + value="https://repo.maven.apache.org/maven2/org/apache/maven/maven-ant-tasks"; /> https://repository.apache.org/service/local/staging/deploy/maven2";> @@ -108,14 +108,14 @@ - + - + @@ -332,11 +332,11 @@ artifactId="cassandra-parent" packaging="pom" version="${version}" -url="http://cassandra.apache.org"; +url="https://cassandra.apache.org"; name="Apache Cassandra" inceptionYear="2009" description="The Apache Cassandra Project develops a highly scalable second-generation distributed database, bringing together Dynamo's fully distributed design and Bigtable's ColumnFamily-based data model."> -http://www.apache.org/licenses/LICENSE-2.0.txt"/> +https://www.apache.org/licenses/LICENSE-2.0.txt"/> @@ -556,7 +556,7 @@ http://cassandra.apache.org"; +url="https://cassandra.apache.org"; name="Apache Cassandra"> http://cassandra.apache.org"; +url="https://cassandra.apache.org"; name="Apache Cassandra"> http://cassandra.apache.org"; +url="https://cassandra.apache.org"; name="Apache Cassandra"> http://cassandra.apache.org"; +url="https://cassandra.apache.org"; name="Apache Cassandra">
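As the comment in build.properties.default above notes, the repository locations can still be overridden locally. A hypothetical build.properties pointing at an internal HTTPS mirror (the mirror URL is invented):

{code}
# build.properties - hypothetical local override, per the note in
# build.properties.default; point the resolvers at an internal HTTPS mirror.
artifact.remoteRepository.central: https://nexus.example.com/repository/maven-central
artifact.remoteRepository.apache: https://nexus.example.com/repository/maven-apache
{code}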
[cassandra] 01/01: Merge branch 'cassandra-3.11' into trunk
This is an automated email from the ASF dual-hosted git repository. mshuler pushed a commit to branch trunk in repository https://gitbox.apache.org/repos/asf/cassandra.git commit 8bedb199c9739771ad4bf41249428f8e5511c50e Merge: ca0ea40 2d3157c Author: Michael Shuler AuthorDate: Wed May 22 15:16:17 2019 -0400 Merge branch 'cassandra-3.11' into trunk - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[cassandra] branch cassandra-3.11 updated (a6d32bb -> 2d3157c)
This is an automated email from the ASF dual-hosted git repository. mshuler pushed a change to branch cassandra-3.11 in repository https://gitbox.apache.org/repos/asf/cassandra.git. from a6d32bb Merge branch 'cassandra-3.0' into cassandra-3.11 new 63ff65a Switch http to https URLs in build.xml new fa6b40f Merge branch 'cassandra-2.2' into cassandra-3.0 new 2d3157c Merge branch 'cassandra-3.0' into cassandra-3.11 The 3 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: build.properties.default | 4 ++-- build.xml| 22 +++--- 2 files changed, 13 insertions(+), 13 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[cassandra] branch trunk updated (ca0ea40 -> 8bedb19)
This is an automated email from the ASF dual-hosted git repository. mshuler pushed a change to branch trunk in repository https://gitbox.apache.org/repos/asf/cassandra.git. from ca0ea40 Switch http to https URLs in build.xml new 63ff65a Switch http to https URLs in build.xml new fa6b40f Merge branch 'cassandra-2.2' into cassandra-3.0 new 2d3157c Merge branch 'cassandra-3.0' into cassandra-3.11 new 8bedb19 Merge branch 'cassandra-3.11' into trunk The 4 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[cassandra] branch cassandra-3.0 updated (c07f3c8 -> fa6b40f)
This is an automated email from the ASF dual-hosted git repository. mshuler pushed a change to branch cassandra-3.0 in repository https://gitbox.apache.org/repos/asf/cassandra.git. from c07f3c8 Resource leak when queries apply a RowFilter new 63ff65a Switch http to https URLs in build.xml new fa6b40f Merge branch 'cassandra-2.2' into cassandra-3.0 The 2 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: build.properties.default | 4 ++-- build.xml| 24 2 files changed, 14 insertions(+), 14 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[cassandra] 01/01: Merge branch 'cassandra-3.0' into cassandra-3.11
This is an automated email from the ASF dual-hosted git repository. mshuler pushed a commit to branch cassandra-3.11 in repository https://gitbox.apache.org/repos/asf/cassandra.git commit 2d3157cb8c419f7e647d15969f92a1fffbc7b1fe Merge: a6d32bb fa6b40f Author: Michael Shuler AuthorDate: Wed May 22 15:15:46 2019 -0400 Merge branch 'cassandra-3.0' into cassandra-3.11 build.properties.default | 4 ++-- build.xml| 22 +++--- 2 files changed, 13 insertions(+), 13 deletions(-) diff --cc build.xml index db97d05,2023e11..d8f833a --- a/build.xml +++ b/build.xml @@@ -25,10 -25,10 +25,10 @@@ - + - - - http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=tree"/> + https://gitbox.apache.org/repos/asf/cassandra.git"/> + https://gitbox.apache.org/repos/asf/cassandra.git"/> + https://gitbox.apache.org/repos/asf?p=cassandra.git;a=tree"/> - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[cassandra] 01/01: Merge branch 'cassandra-2.2' into cassandra-3.0
This is an automated email from the ASF dual-hosted git repository. mshuler pushed a commit to branch cassandra-3.0 in repository https://gitbox.apache.org/repos/asf/cassandra.git commit fa6b40f4603025ef416afee7e4f76b9b8cd041fc Merge: c07f3c8 63ff65a Author: Michael Shuler AuthorDate: Wed May 22 15:09:25 2019 -0400 Merge branch 'cassandra-2.2' into cassandra-3.0 build.properties.default | 4 ++-- build.xml| 24 2 files changed, 14 insertions(+), 14 deletions(-) diff --cc build.xml index 8ef8c67,ca06b41..2023e11 --- a/build.xml +++ b/build.xml @@@ -25,10 -25,10 +25,10 @@@ - + - - - http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=tree"/> + https://gitbox.apache.org/repos/asf/cassandra.git"/> + https://gitbox.apache.org/repos/asf/cassandra.git"/> + https://gitbox.apache.org/repos/asf?p=cassandra.git;a=tree"/> @@@ -104,10 -108,14 +104,10 @@@ - - - - - - - + + + - + - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15136) Incorrect error message in legacy reader
[ https://issues.apache.org/jira/browse/CASSANDRA-15136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16846129#comment-16846129 ] Per Otterström commented on CASSANDRA-15136:
---
Thanks for the patch. The issue is present in 3.0 and 3.11 as far as I can tell. Regarding the patch, you got the parentheses wrong at the end. You may also want to add an entry to CHANGES.txt.

> Incorrect error message in legacy reader
>
> Key: CASSANDRA-15136
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15136
> Project: Cassandra
> Issue Type: Bug
> Components: Observability/Logging
> Reporter: Vincent White
> Assignee: Vincent White
> Priority: Normal
>
> Just fixes the order in the exception message.
>
> ||3.11||
> |[Patch|https://github.com/vincewhite/cassandra/commit/5a62fdd7aa7463a10a1a0bb546c1322ab15eb9cf]|

-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org