[jira] [Commented] (CASSANDRA-5815) NPE from migration manager
[ https://issues.apache.org/jira/browse/CASSANDRA-5815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13802224#comment-13802224 ] Tyler Hobbs commented on CASSANDRA-5815: +1 NPE from migration manager -- Key: CASSANDRA-5815 URL: https://issues.apache.org/jira/browse/CASSANDRA-5815 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.1.12 Reporter: Vishy Kasar Assignee: Brandon Williams Priority: Minor Fix For: 1.2.12 Attachments: 5185.txt In one of our production clusters we see this error often. Looking through the source, Gossiper.instance.getEndpointStateForEndpoint(endpoint) is returning null for some end point. De we need any config change on our end to resolve this? In any case, cassandra should be updated to protect against this NPE. {noformat} ERROR [OptionalTasks:1] 2013-07-24 13:40:38,972 AbstractCassandraDaemon.java (line 132) Exception in thread Thread[OptionalTasks:1,5,main] java.lang.NullPointerException at org.apache.cassandra.service.MigrationManager$1.run(MigrationManager.java:134) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) {noformat} It turned out that the reason for NPE was we bootstrapped a node with the same token as another node. Cassandra should not throw an NPE here but log a meaningful error message. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (CASSANDRA-5815) NPE from migration manager
[ https://issues.apache.org/jira/browse/CASSANDRA-5815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13788228#comment-13788228 ] Chris Burroughs commented on CASSANDRA-5815: I'm seeing an NPE in migration manager in 1.2.9 and what I think is the same spot (line numbers changed slightly since July). This occurs on at least one node every time (about 10 attempts) I try to bootstrap with a 2 dc production cluster using the GPFS w/ reconnecting. {noformat} ERROR [OptionalTasks:1] 2013-10-07 08:06:05,658 CassandraDaemon.java (line 194) Exception in thread Thread[OptionalTasks:1,5,main] java.lang.NullPointerException at org.apache.cassandra.service.MigrationManager$1.run(MigrationManager.java:130) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) {noformat} I added a log message to confirm that Gossiper really really thinks it's not there (off of the 1.2.10 tag if that matters). I'm suspicious of this being a timing problem the reconnect dance, but I'm not sure how to prove or disprove that. {noformat} logger.warn([csb] Trying to get endpoint state for {} ; exists {}, new Object[] {endpoint, Gossiper.instance.isKnownEndpoint(endpoint)}); INFO [GossipTasks:1] 2013-10-07 11:19:10,565 Gossiper.java (line 803) InetAddress /208.49.103.36 is now DOWN INFO [GossipTasks:1] 2013-10-07 11:19:13,572 Gossiper.java (line 608) FatClient /208.49.103.36 has been silent for 3ms, removing from gossip INFO [HANDSHAKE-/208.49.103.36] 2013-10-07 11:19:13,863 OutboundTcpConnection.java (line 399) Handshaking version with /208.49.103.36 INFO [HANDSHAKE-/208.49.103.36] 2013-10-07 11:19:15,275 OutboundTcpConnection.java (line 399) Handshaking version with /208.49.103.36 WARN [OptionalTasks:1] 2013-10-07 11:19:36,696 MigrationManager.java (line 130) [csb] Trying to get endpoint state for /208.49.103.36 ; exists false ERROR [OptionalTasks:1] 2013-10-07 11:19:36,696 CassandraDaemon.java (line 193) Exception in thread Thread[OptionalTasks:1,5,main] java.lang.NullPointerException at org.apache.cassandra.service.MigrationManager$1.run(MigrationManager.java:131) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) {noformat} NPE from migration manager -- Key: CASSANDRA-5815 URL: https://issues.apache.org/jira/browse/CASSANDRA-5815 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.1.12 Reporter: Vishy Kasar Assignee: Brandon Williams Priority: Minor In one of our production clusters we see this error often. Looking through the source, Gossiper.instance.getEndpointStateForEndpoint(endpoint) is returning null for some end point. De we need any config change on our end to resolve this? In any case, cassandra should be updated to protect against this NPE. ERROR [OptionalTasks:1] 2013-07-24 13:40:38,972 AbstractCassandraDaemon.java (line 132) Exception in thread Thread[OptionalTasks:1,5,main] java.lang.NullPointerException at org.apache.cassandra.service.MigrationManager$1.run(MigrationManager.java:134) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98) at
[jira] [Commented] (CASSANDRA-5815) NPE from migration manager
[ https://issues.apache.org/jira/browse/CASSANDRA-5815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13788237#comment-13788237 ] Brandon Williams commented on CASSANDRA-5815: - It looks the same to me. The good news is the error is purely cosmetic at this point, there's nothing left to do if the gossiper has removed the node (not to mention it's a fat client) NPE from migration manager -- Key: CASSANDRA-5815 URL: https://issues.apache.org/jira/browse/CASSANDRA-5815 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.1.12 Reporter: Vishy Kasar Assignee: Brandon Williams Priority: Minor In one of our production clusters we see this error often. Looking through the source, Gossiper.instance.getEndpointStateForEndpoint(endpoint) is returning null for some end point. De we need any config change on our end to resolve this? In any case, cassandra should be updated to protect against this NPE. ERROR [OptionalTasks:1] 2013-07-24 13:40:38,972 AbstractCassandraDaemon.java (line 132) Exception in thread Thread[OptionalTasks:1,5,main] java.lang.NullPointerException at org.apache.cassandra.service.MigrationManager$1.run(MigrationManager.java:134) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) It turned out that the reason for NPE was we bootstrapped a node with the same token as another node. Cassandra should not throw an NPE here but log a meaningful error message. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (CASSANDRA-5815) NPE from migration manager
[ https://issues.apache.org/jira/browse/CASSANDRA-5815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13788284#comment-13788284 ] Chris Burroughs commented on CASSANDRA-5815: Whoops, missed the important part for the case I am seeing but might not be part of the original (bootstrapping with the same token would presumably fail anyway). The situation I am seeing post NPE is: * Bootstrapping node expects steams from NPE-node * NPE-node says it has no outstanding streams And thus bootstrap never completes. NPE from migration manager -- Key: CASSANDRA-5815 URL: https://issues.apache.org/jira/browse/CASSANDRA-5815 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.1.12 Reporter: Vishy Kasar Assignee: Brandon Williams Priority: Minor In one of our production clusters we see this error often. Looking through the source, Gossiper.instance.getEndpointStateForEndpoint(endpoint) is returning null for some end point. De we need any config change on our end to resolve this? In any case, cassandra should be updated to protect against this NPE. ERROR [OptionalTasks:1] 2013-07-24 13:40:38,972 AbstractCassandraDaemon.java (line 132) Exception in thread Thread[OptionalTasks:1,5,main] java.lang.NullPointerException at org.apache.cassandra.service.MigrationManager$1.run(MigrationManager.java:134) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) It turned out that the reason for NPE was we bootstrapped a node with the same token as another node. Cassandra should not throw an NPE here but log a meaningful error message. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (CASSANDRA-5815) NPE from migration manager
[ https://issues.apache.org/jira/browse/CASSANDRA-5815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13788296#comment-13788296 ] Brandon Williams commented on CASSANDRA-5815: - [~cburroughs] I think your problem is something else, since the bootstrapping node has not only been marked down, but it's been down long enough to get removed (which is the race between the gossiper and MM causing this NPE) I will note for myself though that the fat client removal should also wait until the node has been marked down before beginning the 30s countdown to removal. If the node has connected but the gossiper doesn't know about it, they haven't gossiped yet, so there's really nothing for MM to do yet anyway. NPE from migration manager -- Key: CASSANDRA-5815 URL: https://issues.apache.org/jira/browse/CASSANDRA-5815 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.1.12 Reporter: Vishy Kasar Assignee: Brandon Williams Priority: Minor In one of our production clusters we see this error often. Looking through the source, Gossiper.instance.getEndpointStateForEndpoint(endpoint) is returning null for some end point. De we need any config change on our end to resolve this? In any case, cassandra should be updated to protect against this NPE. ERROR [OptionalTasks:1] 2013-07-24 13:40:38,972 AbstractCassandraDaemon.java (line 132) Exception in thread Thread[OptionalTasks:1,5,main] java.lang.NullPointerException at org.apache.cassandra.service.MigrationManager$1.run(MigrationManager.java:134) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) It turned out that the reason for NPE was we bootstrapped a node with the same token as another node. Cassandra should not throw an NPE here but log a meaningful error message. -- This message was sent by Atlassian JIRA (v6.1#6144)