[jira] [Commented] (CASSANDRA-9100) Gossip is inadequately tested

2015-05-11 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538653#comment-14538653
 ] 

Ariel Weisberg commented on CASSANDRA-9100:
---

More gossip-related pain? Maybe not super productive, but I am interested in 
how much we are investing in gossip-related issues.

 Gossip is inadequately tested
 -

 Key: CASSANDRA-9100
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9100
 Project: Cassandra
  Issue Type: Test
  Components: Core
Reporter: Ariel Weisberg

 We found a few unit tests, but nothing that exercises Gossip under 
 challenging conditions. Maybe consider a long test that hooks up some 
 gossipers over a fake network and then does fault injection on that fake 
 network: uni-directional and bi-directional partitions, delayed delivery, 
 out-of-order delivery (if that is something they can see in practice), and 
 connects/disconnects.
 Also play with bad clocks.
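
As a rough illustration of the "gossipers over a fake network" idea, here is a minimal Python sketch. Everything in it (FakeNetwork, ToyGossiper, run) is hypothetical and is not Cassandra code: the toy gossiper only pushes a heartbeat-version map to peers in round-robin order, whereas real gossip picks peers at random and exchanges far more state. The fake network supports one-way and two-way partitions plus random delay, which also produces out-of-order delivery. Bad clocks are not modelled.

```python
import heapq
import random


class FakeNetwork:
    """In-memory message bus with injectable faults (hypothetical test helper)."""

    def __init__(self, seed=0, max_delay=0):
        self.rng = random.Random(seed)  # seeded so a failing run can be replayed
        self.max_delay = max_delay      # extra ticks a message may be held back
        self.blocked = set()            # (src, dst) pairs whose traffic is dropped
        self.queue = []                 # heap of (deliver_at, seq, dst, payload)
        self.seq = 0
        self.now = 0

    def block(self, src, dst, both_ways=False):
        """Drop src -> dst traffic; both_ways=True makes the partition symmetric."""
        self.blocked.add((src, dst))
        if both_ways:
            self.blocked.add((dst, src))

    def heal(self):
        self.blocked.clear()

    def send(self, src, dst, payload):
        if (src, dst) in self.blocked:
            return  # partitioned: silently drop the message
        delay = self.rng.randint(0, self.max_delay)  # random delay also reorders
        self.seq += 1
        heapq.heappush(self.queue, (self.now + 1 + delay, self.seq, dst, payload))

    def tick(self, nodes):
        """Advance time by one step and deliver every message that is due."""
        self.now += 1
        while self.queue and self.queue[0][0] <= self.now:
            _, _, dst, payload = heapq.heappop(self.queue)
            nodes[dst].receive(payload)


class ToyGossiper:
    """Tracks the highest heartbeat version seen per node (toy model only)."""

    def __init__(self, name, peers, net):
        self.name, self.peers, self.net = name, peers, net
        self.versions = {name: 0}
        self.rounds = 0

    def gossip_round(self):
        self.versions[self.name] += 1
        # Round-robin peer choice keeps the sketch deterministic.
        peer = self.peers[self.rounds % len(self.peers)]
        self.rounds += 1
        self.net.send(self.name, peer, dict(self.versions))

    def receive(self, remote):
        for node, version in remote.items():
            # Keeping only newer versions makes stale or reordered messages harmless.
            if version > self.versions.get(node, -1):
                self.versions[node] = version


def run(rounds, nodes, net):
    for _ in range(rounds):
        for node in nodes.values():
            node.gossip_round()
        net.tick(nodes)
```

A test would build a few ToyGossiper instances on one FakeNetwork, call block() to isolate a node, run some rounds, assert that the isolated node's state did not spread, then heal() and assert that every node converges on the same membership view.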



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9100) Gossip is inadequately tested

2015-04-24 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14511858#comment-14511858
 ] 

Jason Brown commented on CASSANDRA-9100:


TL;DR I think some dtests/ccm are the way to go for now.

Last summer, I built a simulator for our gossip so I could understand it 
further and see where it starts to break down. It took me about 2.5 weeks just 
to pull apart the gossip components from the rest of the system so I could run 
them in isolation - meaning, have more than one Gossiper executing in a single 
JVM. The changes included a series of hacks that broke many other components, 
like MessagingService (but that was acceptable for the simulator), and I'm not 
sure the rest of Cassandra was totally legit with the hacks, either (except 
Gossiper, of course). I did have a workable simulator after the effort, but 
didn't have much time to invest in it beyond that (apart from maybe prep work 
for my various gossip talks).

This being said, I think it's an incredibly non-trivial effort to tease gossip 
out for testing due to all the singletons, as [~brandon.williams] mentioned. I 
think some good wins, however, could be gained by adding in some dtests - but 
then, the question is what to monitor for indications of success/failure? I'm 
not sure there's a fantastic answer here. The (limited) possibilities include 
nodetool output, log file scraping, and ... ? I'd be most inclined toward 
nodetool output, but we already scrape log files in dtests (I think), so that's 
not without precedent; but it also depends on what is being tested.
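
To make the nodetool option concrete, here is a minimal sketch of checking up/down state from `nodetool status` text. It assumes the usual layout in which each node row starts with a status letter (U/D) and a state letter (N/L/J/M) followed by the address; the exact columns vary between versions. The SAMPLE text is made up for illustration, and the helper names are hypothetical, not part of dtest or ccm.

```python
import re

# Node rows look like "UN  127.0.0.1  ...": status letter, state letter, address.
ROW = re.compile(r"^([UD])([NLJM])\s+(\S+)")

# Made-up sample for illustration only; real output has more detail per row.
SAMPLE = """\
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load      Tokens  Owns   Host ID  Rack
UN  127.0.0.1  47.66 KB  1       33.3%  id-1     rack1
DN  127.0.0.2  47.67 KB  1       33.3%  id-2     rack1
UN  127.0.0.3  47.67 KB  1       33.3%  id-3     rack1
"""


def parse_nodetool_status(text):
    """Return {address: (status, state)} for every node row found in text."""
    result = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            status, state, address = match.groups()
            result[address] = (status, state)
    return result


def down_nodes(text):
    """Addresses that this node's view of the ring reports as down."""
    parsed = parse_nodetool_status(text)
    return sorted(addr for addr, (status, _) in parsed.items() if status == "D")
```

A dtest could run nodetool against each node and compare the resulting views, since each node reports what its own gossiper currently believes.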

Thinking on it more, and, if it's even possible, it might be neat to script 
some iptables manipulation into dtests to block IPs/ports from communicating, 
then observe that gossip behaves as expected. Think of it as mini-Jepsen, and 
testing gossip in the face of network partitions seems like an apropos place 
for that kind of testing.
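
A minimal sketch of what the iptables scripting might look like, assuming nodes talk to each other on Cassandra's default storage port 7000 and each node has its own IP (as with ccm's loopback addresses). The functions only build argument lists and do not run anything; partition_rules and heal_rules are hypothetical names, not existing dtest helpers. Whether a single INPUT rule yields a clean one-way partition depends on how inter-node connections are established, so treat that part as an assumption to verify.

```python
STORAGE_PORT = 7000  # Cassandra's default inter-node port; adjust if reconfigured


def partition_rules(local_ip, peer_ip, port=STORAGE_PORT, one_way=False):
    """Build iptables argument lists that drop storage-port traffic between two nodes."""
    rules = [
        # Drop packets from the peer that target the local node's storage port.
        ["iptables", "-A", "INPUT", "-s", peer_ip, "-d", local_ip,
         "-p", "tcp", "--dport", str(port), "-j", "DROP"],
    ]
    if not one_way:
        # Also drop packets from the local node that target the peer's storage port.
        rules.append(
            ["iptables", "-A", "OUTPUT", "-s", local_ip, "-d", peer_ip,
             "-p", "tcp", "--dport", str(port), "-j", "DROP"])
    return rules


def heal_rules(rules):
    """Same rule specs with -A (append) swapped for -D (delete) to undo the partition."""
    return [["-D" if arg == "-A" else arg for arg in rule] for rule in rules]
```

In a real test each list would be passed to something like subprocess.check_call with root privileges, with the heal rules applied in a cleanup step so that a failed test does not leave the host partitioned.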




[jira] [Commented] (CASSANDRA-9100) Gossip is inadequately tested

2015-04-23 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14510112#comment-14510112
 ] 

Jonathan Ellis commented on CASSANDRA-9100:
---

[~jasobrown] is it worth trying to make gossip testable, before ripping it 
apart?



[jira] [Commented] (CASSANDRA-9100) Gossip is inadequately tested

2015-04-20 Thread Brandon Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14503268#comment-14503268
 ] 

Brandon Williams commented on CASSANDRA-9100:
-

Unfortunately, this is where singletons begin to bite us.  We can't just spin 
up a bunch of gossipers in a unit test; we have to spin up a bunch of JVMs in a 
dtest :(
