[jira] [Commented] (CASSANDRA-5815) NPE from migration manager

2013-10-22 Thread Tyler Hobbs (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13802224#comment-13802224
 ] 

Tyler Hobbs commented on CASSANDRA-5815:


+1

 NPE from migration manager
 --

 Key: CASSANDRA-5815
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5815
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.1.12
Reporter: Vishy Kasar
Assignee: Brandon Williams
Priority: Minor
 Fix For: 1.2.12

 Attachments: 5185.txt


 In one of our production clusters we see this error often. Looking through
 the source, Gossiper.instance.getEndpointStateForEndpoint(endpoint) is
 returning null for some endpoint. Do we need any config change on our end to
 resolve this? In any case, Cassandra should be updated to guard against
 this NPE.
 {noformat}
 ERROR [OptionalTasks:1] 2013-07-24 13:40:38,972 AbstractCassandraDaemon.java (line 132) Exception in thread Thread[OptionalTasks:1,5,main]
 java.lang.NullPointerException
     at org.apache.cassandra.service.MigrationManager$1.run(MigrationManager.java:134)
     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
     at java.lang.Thread.run(Thread.java:662)
 {noformat}
 It turned out that the NPE was caused by our bootstrapping a node with the
 same token as another node. Cassandra should not throw an NPE here; it should
 log a meaningful error message.
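 A minimal sketch of the guard requested above, assuming a simplified model: the class, the placeholder EndpointState type, and the endpointStates map (standing in for the gossiper's per-endpoint state) are all hypothetical and are not Cassandra's actual MigrationManager code. The task reads the state once, and if it is null it logs an actionable message and skips the work instead of dereferencing null.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch (not Cassandra's MigrationManager) of a scheduled
 * schema-check task that tolerates a missing gossip entry.
 */
public class MigrationTaskSketch {
    /** Placeholder for the gossip state of one endpoint (assumption). */
    static final class EndpointState {
        final String schemaVersion;
        EndpointState(String schemaVersion) { this.schemaVersion = schemaVersion; }
    }

    /** Placeholder for the gossiper's endpoint-state map (assumption). */
    static final Map<String, EndpointState> endpointStates = new ConcurrentHashMap<>();

    /** Runs the check for one endpoint and returns what it decided to do. */
    static String runFor(String endpoint) {
        EndpointState state = endpointStates.get(endpoint); // single read; may be null
        if (state == null) {
            // Log something actionable instead of throwing a NullPointerException.
            System.err.println("WARN no gossip state for " + endpoint
                    + " (removed from gossip, or never seen); skipping schema pull");
            return "skipped";
        }
        return "pull schema " + state.schemaVersion + " from " + endpoint;
    }

    public static void main(String[] args) {
        endpointStates.put("/10.0.0.1", new EndpointState("v1"));
        System.out.println(runFor("/10.0.0.1")); // pull schema v1 from /10.0.0.1
        System.out.println(runFor("/10.0.0.2")); // skipped
    }
}
```

 Returning early keeps the scheduled task alive for its next run; the log level and message text are illustrative choices only.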



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (CASSANDRA-5815) NPE from migration manager

2013-10-07 Thread Chris Burroughs (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13788228#comment-13788228
 ] 

Chris Burroughs commented on CASSANDRA-5815:


I'm seeing an NPE in MigrationManager in 1.2.9 at what I think is the same
spot (the line numbers have changed slightly since July). It occurs on at least
one node every time (about 10 attempts) I try to bootstrap into a two-DC
production cluster using GPFS (GossipingPropertyFileSnitch) with reconnection
enabled.

{noformat}
ERROR [OptionalTasks:1] 2013-10-07 08:06:05,658 CassandraDaemon.java (line 194) Exception in thread Thread[OptionalTasks:1,5,main]
java.lang.NullPointerException
    at org.apache.cassandra.service.MigrationManager$1.run(MigrationManager.java:130)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
{noformat}

I added a log message (built off the 1.2.10 tag, if that matters) to confirm
that the Gossiper really does think the endpoint is not there. I suspect this
is a timing problem in the reconnect dance, but I'm not sure how to prove or
disprove that.

{noformat}
logger.warn("[csb] Trying to get endpoint state for {} ; exists {}",
            new Object[] {endpoint, Gossiper.instance.isKnownEndpoint(endpoint)});

 INFO [GossipTasks:1] 2013-10-07 11:19:10,565 Gossiper.java (line 803) InetAddress /208.49.103.36 is now DOWN
 INFO [GossipTasks:1] 2013-10-07 11:19:13,572 Gossiper.java (line 608) FatClient /208.49.103.36 has been silent for 3ms, removing from gossip
 INFO [HANDSHAKE-/208.49.103.36] 2013-10-07 11:19:13,863 OutboundTcpConnection.java (line 399) Handshaking version with /208.49.103.36
 INFO [HANDSHAKE-/208.49.103.36] 2013-10-07 11:19:15,275 OutboundTcpConnection.java (line 399) Handshaking version with /208.49.103.36
 WARN [OptionalTasks:1] 2013-10-07 11:19:36,696 MigrationManager.java (line 130) [csb] Trying to get endpoint state for /208.49.103.36 ; exists false
ERROR [OptionalTasks:1] 2013-10-07 11:19:36,696 CassandraDaemon.java (line 193) Exception in thread Thread[OptionalTasks:1,5,main]
java.lang.NullPointerException
    at org.apache.cassandra.service.MigrationManager$1.run(MigrationManager.java:131)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
{noformat}
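The debug line shows the endpoint already gone from gossip by the time the task ran. The toy below (not Cassandra code; the class, map and hook are hypothetical) illustrates why a guard built from a separate "is it known?" check followed by a second lookup still leaves a window: another thread, here a simulated fat-client removal, can run between the two reads. Reading the state once into a local and null-checking that local closes the window.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Toy illustration (not Cassandra code) of a check-then-act race on a
 * shared endpoint-state map, versus a single read into a local.
 */
public class CheckThenActSketch {
    /** Hypothetical stand-in for the gossiper's endpoint-state map. */
    static final Map<String, String> states = new ConcurrentHashMap<>();

    /** Racy: two separate reads; the hook simulates the gossiper running in between. */
    static String racyLookup(String endpoint, Runnable betweenReads) {
        if (!states.containsKey(endpoint)) return "unknown";
        betweenReads.run();                 // e.g. fat-client removal happens here
        String state = states.get(endpoint);
        return state.toUpperCase();         // NPE if the entry vanished in between
    }

    /** Safer: read once into a local, then test that local for null. */
    static String singleReadLookup(String endpoint, Runnable afterRead) {
        String state = states.get(endpoint);
        afterRead.run();                    // later removal cannot change the local
        return state == null ? "unknown" : state.toUpperCase();
    }

    public static void main(String[] args) {
        String ep = "/208.49.103.36";
        Runnable removal = () -> states.remove(ep);

        states.put(ep, "normal");
        try {
            racyLookup(ep, removal);
        } catch (NullPointerException e) {
            System.out.println("racy lookup: NullPointerException");
        }

        states.put(ep, "normal");
        System.out.println("single read: " + singleReadLookup(ep, removal));
    }
}
```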


[jira] [Commented] (CASSANDRA-5815) NPE from migration manager

2013-10-07 Thread Brandon Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13788237#comment-13788237
 ] 

Brandon Williams commented on CASSANDRA-5815:

It looks the same to me. The good news is that the error is purely cosmetic at
this point: there's nothing left to do if the gossiper has already removed the
node (not to mention that it's a fat client).



[jira] [Commented] (CASSANDRA-5815) NPE from migration manager

2013-10-07 Thread Chris Burroughs (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13788284#comment-13788284
 ] 

Chris Burroughs commented on CASSANDRA-5815:


Whoops, I missed the important part. It applies to the case I am seeing but
might not be part of the original report (bootstrapping with the same token
would presumably fail anyway). The situation I see after the NPE is:
 * The bootstrapping node expects streams from the NPE node.
 * The NPE node says it has no outstanding streams.

And thus the bootstrap never completes.
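A toy model of that hang, assuming a simplified view of bootstrap (this is not Cassandra's streaming code; the class, methods and node names are hypothetical): the joining node finishes only when every source it expects has reported its stream complete. If a source keeps no record of an outstanding stream, it never reports, so the pending set never drains. Exposing the pending set at least shows which source is stuck.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Toy model (not Cassandra code): bootstrap completes only when every
 * expected source has reported its stream finished.
 */
public class BootstrapWaitSketch {
    /** Sources the bootstrapping node still expects streams from. */
    final Set<String> pendingSources = ConcurrentHashMap.newKeySet();

    BootstrapWaitSketch(Set<String> expected) { pendingSources.addAll(expected); }

    /** Called when a source reports that its stream to us has finished. */
    void onStreamComplete(String source) { pendingSources.remove(source); }

    boolean isComplete() { return pendingSources.isEmpty(); }

    public static void main(String[] args) {
        BootstrapWaitSketch joining = new BootstrapWaitSketch(Set.of("nodeA", "npeNode"));
        joining.onStreamComplete("nodeA");
        // "npeNode" believes it has no outstanding streams, so it never reports.
        System.out.println("complete=" + joining.isComplete()
                + ", waiting on " + joining.pendingSources);
    }
}
```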



[jira] [Commented] (CASSANDRA-5815) NPE from migration manager

2013-10-07 Thread Brandon Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13788296#comment-13788296
 ] 

Brandon Williams commented on CASSANDRA-5815:

[~cburroughs] I think your problem is something else, since the bootstrapping
node has not only been marked down, but has been down long enough to be
removed; that removal racing with MigrationManager (MM) is what causes this
NPE. I will note for myself, though, that fat client removal should also wait
until the node has been marked down before beginning the 30s countdown to
removal.

If the node has connected but the gossiper doesn't know about it, the two
haven't gossiped yet, so there's really nothing for MM to do anyway.
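The fat-client countdown change noted above can be illustrated with a toy comparison. In the earlier log, /208.49.103.36 is marked DOWN at 11:19:10 and removed from gossip at 11:19:13, about three seconds later. The sketch assumes (not verified against the 1.2 source) that the existing check measures silence from the node's last gossip update; the proposed check counts only time elapsed since the mark-down. Class and method names and the example timestamps are hypothetical; only the 30s figure comes from the comment.

```java
/**
 * Hypothetical sketch: start the fat-client removal countdown at
 * mark-down time rather than at the last gossip update.
 */
public class FatClientRemovalSketch {
    static final long FAT_CLIENT_TIMEOUT_MS = 30_000; // the 30s countdown

    /** Assumed current-style check: silent long enough since the last update. */
    static boolean removableBySilence(long lastUpdateMs, long nowMs) {
        return nowMs - lastUpdateMs > FAT_CLIENT_TIMEOUT_MS;
    }

    /** Proposed check: count only time since mark-down; -1 means still up. */
    static boolean removableSinceDown(long markedDownAtMs, long nowMs) {
        return markedDownAtMs >= 0 && nowMs - markedDownAtMs > FAT_CLIENT_TIMEOUT_MS;
    }

    public static void main(String[] args) {
        long lastUpdate = 0;      // last gossip heard from the node (illustrative)
        long markedDown = 27_000; // failure detector marks it down 27 s later
        long now = 30_500;        // removal check runs 3.5 s after the mark-down
        System.out.println("by silence: " + removableBySilence(lastUpdate, now));
        System.out.println("since down: " + removableSinceDown(markedDown, now));
    }
}
```

With these numbers the silence-based check removes the node 3.5 s after it was marked down, while the mark-down-based check keeps it for the full 30 s, which would narrow the window in which MM finds the endpoint missing.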
