Re: Unit testing Master-Worker Message Passing

2014-10-16 Thread Josh Rosen
Hi Matt, I’m not sure whether those tests will actually find this specific issue. The tests that I linked to test Spark’s ZooKeeper-based multi-master mode, whereas it sounds like you’re seeing this issue in a regular standalone cluster. In those tests, the workers disconnect from the master

Re: Unit testing Master-Worker Message Passing

2014-10-15 Thread Matthew Cheah
When I do this, the Worker tries to get the Master actor by calling context.actorSelection(), and the RegisterWorker message gets sent to the dead letters mailbox instead of being picked up by expectMsg. I'm new to Akka and I've tried various ways of registering a mock

Re: Unit testing Master-Worker Message Passing

2014-10-15 Thread Chester Chen
You can call resolveOne() on the ActorSelection to check whether the actor is still there and whether the path is correct. The method returns a future, which you can wait on with a timeout. That way you know whether the actor is alive, is already dead, or the path is incorrect. Another way is to send an Identify message to
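
As an illustration of the two checks mentioned in this message, here is a minimal sketch. It assumes a recent Akka classic release (`system.terminate()` is `shutdown()` in older versions); `StubMaster`, `SelectionCheck`, the helper names, and the `/user/Master` path are hypothetical and are not taken from the Spark codebase.

```scala
import akka.actor.{Actor, ActorIdentity, ActorSystem, Identify, Props}
import akka.pattern.ask
import akka.util.Timeout
import scala.concurrent.Await
import scala.concurrent.duration._
import scala.util.Try

// Hypothetical stand-in for the Master actor; it ignores every message.
class StubMaster extends Actor {
  def receive: Receive = { case _ => () }
}

object SelectionCheck {
  implicit val timeout: Timeout = Timeout(2.seconds)

  // Check 1: resolveOne() returns a Future[ActorRef] that fails
  // (ActorNotFound) when no actor lives at the given path.
  def resolves(system: ActorSystem, path: String): Boolean =
    Try(Await.result(system.actorSelection(path).resolveOne(), timeout.duration)).isSuccess

  // Check 2: every actor answers Identify with an ActorIdentity reply;
  // a defined ref means a live actor was found at the path.
  def identifies(system: ActorSystem, path: String): Boolean =
    Try(Await.result(system.actorSelection(path) ? Identify(path), timeout.duration))
      .toOption
      .exists {
        case ActorIdentity(_, ref) => ref.isDefined
        case _                     => false
      }

  def main(args: Array[String]): Unit = {
    val system = ActorSystem("selection-check")
    try {
      system.actorOf(Props(new StubMaster), "Master")
      println(resolves(system, "/user/Master"))      // path with a live actor
      println(identifies(system, "/user/Master"))    // same path, via Identify
      println(resolves(system, "/user/NoSuchActor")) // path with no actor
    } finally system.terminate()
  }
}
```

In a test, either helper could be called before sending RegisterWorker to confirm that the path passed to actorSelection() actually points at the mock, which is one way to narrow down why a message ends up in dead letters.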

Re: Unit testing Master-Worker Message Passing

2014-10-15 Thread Matthew Cheah
On a higher level, I also want to ask why such unit testing has not been done in this codebase. If it's not common practice to test message passing, then I'm fine with leaving out the unit test; however, I'm curious as to why such testing was not done before.

Re: Unit testing Master-Worker Message Passing

2014-10-15 Thread Josh Rosen
There are some end-to-end integration tests of Master-Worker fault-tolerance in https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/FaultToleranceTest.scala I’ve actually been working to develop a more generalized Docker-based integration-testing framework

Re: Unit testing Master-Worker Message Passing

2014-10-15 Thread Matthew Cheah
Thanks Josh! These tests seem to cover the cases I'm looking for already =). What's interesting though is that we still ran into SPARK-3736 despite such integration tests being in place to catch it - specifically, the case when the master disconnects and reconnects, the workers should reconnect

Unit testing Master-Worker Message Passing

2014-10-14 Thread Matthew Cheah
Hi everyone, I’m adding some new message passing between the Master and Worker actors in order to address https://issues.apache.org/jira/browse/SPARK-3736 . I was wondering if these kinds of interactions are tested in the automated Jenkins test suite, and if so, where I could find some examples