Hi Thiago,

Let me address your questions one by one.

On Wed, Aug 22, 2012 at 1:01 AM, Thiago Negri <evoh...@gmail.com> wrote:
> Hello everyone. I'm taking my first steps in Cloud Haskell and got
> some unexpected behaviors.
>
> I used the code from Raspberry Pi in a Haskell Cloud [1] as a first
> example. Did try to switch the code to use Template Haskell with no
> luck, stick with the verbose style.

I have pasted a version of your code that uses Template Haskell at
http://hpaste.org/73520. Where did you get stuck?

> I changed some of the code, from ProcessId-based messaging to typed
> channel to receive the Pong; using "startSlave" to start the worker
> nodes; and changed the master node to loop forever sending pings to
> the worker nodes.
>
> The unexpected behaviors:
> - Dropping a worker node while the master is running makes the master
> node to crash.

There are two things going on here:

1. A bug in the SimpleLocalnet backend meant that if you dropped a worker
   node findSlaves might not return. I have fixed this and uploaded it to
   Hackage as version 0.2.0.5.

2. But even with this fix, you will still need to take into account that
   workers may disappear once they have been reported by findSlaves. spawn
   will actually throw an exception if the specified node is unreachable
   (it is debatable whether this is the right behaviour -- see below).

> - Master node do not see worker nodes started after the master process.

Yes, startMaster is merely a convenience function. I have modified the
documentation to specify more clearly what startMaster does:

-- | 'startMaster' finds all slaves /currently/ available on the local network,
-- redirects all log messages to itself, and then calls the specified process,
-- passing the list of slaves nodes.
--
-- Terminates when the specified process terminates. If you want to terminate
-- the slaves when the master terminates, you should manually call
-- 'terminateAllSlaves'.
--
-- If you start more slave nodes after having started the master node, you can
-- discover them with later calls to 'findSlaves', but be aware that you will
-- need to call 'redirectLogHere' to redirect their logs to the master node.
--
-- Note that you can use functionality of "SimpleLocalnet" directly (through
-- 'Backend'), instead of using 'startMaster'/'startSlave', if the master/slave
-- distinction does not suit your application.

Note that with these modifications there is still something slightly
unfortunate: if you delete a worker, and then restart it *at the same
port*, the master will not see it. There is a very good reason for this:
Cloud Haskell guarantees reliable ordered message passing, and we want a
clear semantics for this (unlike, say, in Erlang, where you might send
messages M1, M2 and M3 from P to Q, and Q might receive M1, M3 but not M2,
under certain circumstances). We (developers of Cloud Haskell, Simon
Peyton-Jones and some others) are still debating over what the best
approach is here; in the meantime, if you restart a worker node, just give
a different port number.

Let me know if you have any other questions, and feel free to open an issue
at https://github.com/haskell-distributed/distributed-process/issues?state=open
if you think you found a bug.

Edsko

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
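
P.S. On point 2 above (workers that disappear after findSlaves has
reported them), here is a minimal sketch of the "collect the failures
instead of crashing" pattern. It is written in plain IO so that it runs
with nothing but base; Node, Pid, spawnOnAll and fakeSpawn are hypothetical
names used only for illustration and are not part of distributed-process.
In a real master the same shape would presumably wrap spawn using the
exception-handling functions available for the Process monad.

```haskell
import Control.Exception (ErrorCall (..), SomeException, throwIO, try)
import Data.Either (partitionEithers)

-- Hypothetical stand-ins for illustration only; a real program would use
-- NodeId and ProcessId from distributed-process.
type Node = String
type Pid  = Int

-- | Run a spawn-like action on every node. A failure on one node is
-- recorded and does not stop the remaining nodes from being tried.
-- Note: catching SomeException is deliberately broad for this sketch.
spawnOnAll
  :: (Node -> IO Pid)
  -> [Node]
  -> IO ([(Node, SomeException)], [(Node, Pid)])
spawnOnAll spawnOn nodes = do
  results <- mapM attempt nodes
  return (partitionEithers results)
  where
    attempt node = do
      r <- try (spawnOn node)
      return $ case r of
        Left err  -> Left (node, err)
        Right pid -> Right (node, pid)

-- | Fake spawn for the demonstration: the node named "gone" is unreachable.
fakeSpawn :: Node -> IO Pid
fakeSpawn "gone" = throwIO (ErrorCall "node unreachable")
fakeSpawn node   = return (length node)

main :: IO ()
main = do
  (failed, started) <- spawnOnAll fakeSpawn ["alpha", "gone", "beta"]
  putStrLn ("started on:  " ++ show (map fst started))
  putStrLn ("unreachable: " ++ show (map fst failed))
```

The master can then keep pinging the nodes in the second list and simply
drop (or retry later) the ones in the first, rather than dying on the
first unreachable worker.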