Challenges implementing an "actor system" in Nim (long post!)

monster Sun, 29 Oct 2017 17:45:01 +0100

TL;DR: I want to create an “actor system” in Nim, and I need some feedback, in 
particular, on how to move data across threads.


After reading the docs, multiple posts on the forum, and trying to code a bit 
myself, I believe I am ready to present my long-term goal, and I’m asking for 
some brainstorming from the community. I actually wanted to wait until I had 
some basic prototype to show, but atm I don’t see a clear path forward, so I 
thought it was preferable to ask for ideas first, before I start investing my 
time into coding something that might be a dead end, or duplicate existing 
code.

At the highest level, I want to program a networked game, with a 
client/server-cluster setup. AFAIK, if you start with a simple 
client/single-server setup, and wait until everything is implemented and 
working fine before trying to add cluster support, the refactoring is usually 
so large that the idea gets abandoned (i.e. it’s cheaper to just buy a 
bigger server). By cluster support, I mean _one single seamless world_, 
dynamically split between multiple servers, with no single point of failure, 
not the usual one-“shard”-per-server design, which is a no-brainer. That is why 
I want to have scalable cluster support from the get-go. And that is also why I 
think _actors_ are the best design for this.

I’ve done most of my professional coding on the JVM, and I’m of the opinion 
that it is less than ideal for a soft-real-time game with high memory demands, 
mostly because of the GC, but also because you cannot do low-level memory 
manipulation, and because all objects are allocated on the heap. While there 
are workarounds for some of this, and improvements are coming in the future, I 
believe it was time for me to try something new. I also wanted to be more than 
just a “Java programmer”; as a student, I could code anything from Prolog to 
RISC Assembler, and now I feel “degenerated”, programming in only one language 
(well, except for some Python).

I literally spent years looking around, trying to decide which 
language/platform to use instead of the JVM. I settled on Nim because of the 
following features:

  1. Thread-local heap (aka no JVM-style 
stop-the-world-GC-better-go-have-a-coffee-now)
  2. Transpiling to C/C++ (I’d like to use UE4 as my game engine)
  3. Lisp-like power, with the macros (who wouldn’t want that?)
  4. It’s fast!
  5. I like the Python syntax anyway



I spent several years collaborating with someone else over the web on a Java 
actor system. In the end, I moved away for personal reasons, and he moved to 
Clojure, but that experience gave me “strong opinions” on how a good (IMO) 
actor system should be designed. The ideal actor system for me would have the 
following requirements:

  1. Statically typed actors (compile-time safety and faster runtime execution)
  2. Two-way messaging (when does anyone ever send a message and not care 
whether it arrives, or whether they will ever get an answer? IMHO, everyone 
wants two-way messaging, and the actor system should not force users to 
implement it themselves on top of one-way messaging).
  3. Cooperative multi-tasking within “actor teams” (highly “coupled” actors 
run together on the same thread, as a group, therefore saving the “context 
switch” when messaging each other)
  4. Blocking/IO/long-running tasks are “forbidden” within actors. This is a 
requirement for #3.
  5. Transactionality: If a message fails to be processed, the actor state is 
restored (note that this does not require the usual low-level STM 
implementation, because the state is accessed from a single thread only). 
Small-footprint actors can be implemented “functional style” (state is entirely 
replaced), while larger ones need “hand made” transactions (or some API that 
does it for me).
  6. Actor systems can communicate with each other over the network. Ideally, 
this should be transparent, since “sending actors” are designed to assume a 
possible failure of the “receiving actor” anyway.
  7. The possibility of transferring individual actors, or whole “actor teams”, 
across threads, and across the network.
  8. Batch processing of messages (presumably, thread global rather than per 
"actor team"); while processing a message batch, all outgoing messages are 
buffered, even if they belong to the same “team”, and sent only after the 
incoming batch is fully processed. This prevents “tight loops” within an “actor 
team”, and puts an upper bound on the processing time of a single batch, such 
that the cooperative multi-tasking is “fairer” between teams on the same 
thread, and synchronization between threads is reduced (one “synchronization” 
per batch, instead of per message).
  9. Extensible actors: the “actor” is the state, and the “events” are calls to 
procs, taking the state as parameter. So it should be possible to _extend_ an 
actor by creating new procs that work on the same state. One way to also add 
“new state” to an actor would be to model the _root_ of the actor state as a 
table, with “untyped” values (ptr?)
  10. Since asynchronous responses must be supported (the message handler is 
not required to create a reply within the call), all messages are 
asynchronous. The message handler receives the message and a “callback”. It can 
either execute the callback immediately, or keep it for later. One could achieve 
better performance by having separate support for synchronous and 
asynchronous messaging, but the drawback is that changing the implementation of 
a message handler from one to the other would require updating the callers too, 
causing changes to ripple all over the codebase.
  11. Messages are immutable. This should enable zero-copy messaging, when the 
message handler replies immediately/synchronously (hopefully, the most common 
case), and maybe also for all messages within the same “team”.
  12. Since this is mainly for a game, and real-time client/server games 
usually run at some fixed number of “cycles” per second (typically 20?), 
messages might be targeted at a specific future cycle, such that they cannot be 
executed until that cycle comes.
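To make requirements #2, #5, #8, #10 and #12 a bit more concrete, here is a rough, language-neutral sketch in Python (not Nim, and not any existing library). All names (`Actor`, `Envelope`, `Team`, `send`, `run_batch`) are hypothetical. Handlers receive the state, the message and a reply callback, and return the new state; the state is only replaced if the handler returns normally, messages sent during a batch are buffered until the next batch, and a message can be scheduled for a future cycle.

```python
import heapq
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Actor:
    state: Any  # treated as immutable and replaced wholesale ("functional style")


@dataclass(order=True)
class Envelope:
    cycle: int  # earliest cycle at which the message may run (#12)
    seq: int    # tie-breaker: FIFO order within a cycle
    target: Actor = field(compare=False)
    handler: Callable = field(compare=False)
    msg: Any = field(compare=False)
    reply: Callable = field(compare=False)  # callback(result, error) (#2, #10)


class Team:
    """Hypothetical "actor team": actors that share one thread."""

    def __init__(self):
        self.cycle = 0
        self.seq = 0
        self.pending = []  # heap of Envelopes ordered by (cycle, seq)
        self.outbox = []   # messages sent during the current batch (#8)

    def send(self, target, handler, msg, reply, at_cycle=None):
        self.seq += 1
        cycle = self.cycle if at_cycle is None else at_cycle
        # Buffered, never delivered directly: nothing sent now runs in this batch.
        self.outbox.append(Envelope(cycle, self.seq, target, handler, msg, reply))

    def run_batch(self):
        for env in self.outbox:  # deliver what the previous batch sent
            heapq.heappush(self.pending, env)
        self.outbox = []
        batch = []
        while self.pending and self.pending[0].cycle <= self.cycle:
            batch.append(heapq.heappop(self.pending))
        for env in batch:
            try:
                new_state = env.handler(env.target.state, env.msg, env.reply)
            except Exception as exc:  # failed: the old state is kept (#5)
                env.reply(None, exc)
                continue
            env.target.state = new_state  # commit only on success
        self.cycle += 1
        return len(batch)


# Usage: a counter actor that rejects negative increments.
log = []


def record(result, error):
    log.append((result, type(error).__name__ if error else None))


def add(state, amount, reply):
    if amount < 0:
        raise ValueError("negative amount")
    reply(state + amount, None)  # could also be stored and called later
    return state + amount


team = Team()
counter = Actor(state=0)
team.send(counter, add, 5, record)
team.send(counter, add, -1, record)
team.send(counter, add, 10, record, at_cycle=3)
for _ in range(4):
    team.run_batch()
print(counter.state, log)  # 15 [(5, None), (None, 'ValueError'), (15, None)]
```

For brevity, replies are delivered by calling the callback directly; in a system following #8 they would be buffered like any other outgoing message, and a handler that calls `reply()` and then fails would need to be guarded against.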



I think some of this is similar to the [C++ Actor 
Framework](http://actor-framework.org/).

I would have called this actor system “reactor(s)” (REquest/REply ACTORs), but 
that name is already taken in Nimble, so I’m still searching for a good name 
for this.

I think most of this can be done in a reasonable amount of time, using Nim, 
threads, and channels. (I’m not sure channels are the best way to model 
inter-thread communication in this case, in particular since I’m going to have 
an unspecified number of different message types to exchange, and channels are 
fixed-type, but that is really just an “optimization” problem that can be 
postponed to V2.)

But, there are a few things that are less than obvious (to me), atm, and this 
is why I’m posting this. Here is the list of the main problems I see atm:

  1. Need to implement “generic” communication through channels (using pointers 
to “shared” heap buffers maybe?)
  2. Enforce requirement #4 (no blocking) at compile-time.
  3. If some actor has a state in the megabytes, you can’t just clone the 
whole thing every time a change needs to be performed (requirement #5). So 
individual changes to the state need to be tracked somehow, and undone if 
needed. I might also piggy-back on that feature to record state changes on the 
server, which are then transported to the client. (Has anyone done that yet?)
  4. Requirement #6 means serializable messages.
  5. But within the same process, I don’t want to serialize the messages.
  6. Requirement #7 means serializable actors.
  7. But within the same process, I don’t want to serialize the actor state.



I’m currently assuming most messages are going to be “simple” structurally. 
Therefore I expect problem #1 can be solved with some kind of shared-heap 
“generic list”, where by generic I mean it can store any type “inline”. This 
might even be possible with zero-copy, using “type IDs”, and pre-allocating 
space for the whole message before writing to it.
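A rough sketch of such a “generic list”, in Python for brevity and with a hypothetical type registry: each entry is a small header (type ID and payload size) followed by the message fields written inline, and the space is reserved before the fields are written. A `bytearray` stands in for a shared-heap allocation; the type IDs and layouts are made up for illustration.

```python
import struct

# Hypothetical registry: type ID -> fixed binary layout of that message type.
LAYOUTS = {
    1: struct.Struct("<iff"),  # e.g. MoveTo(entity: int32, x: float32, y: float32)
    2: struct.Struct("<iq"),   # e.g. Damage(entity: int32, amount: int64)
}
HEADER = struct.Struct("<HH")  # (type_id: uint16, payload_size: uint16)


class MessageBuffer:
    """Flat buffer holding messages of different types inline, back to back."""

    def __init__(self, capacity):
        self.data = bytearray(capacity)  # stand-in for a shared-heap block
        self.used = 0

    def reserve(self, type_id):
        """Pre-allocate header + payload; return the payload offset."""
        layout = LAYOUTS[type_id]
        start = self.used
        end = start + HEADER.size + layout.size
        if end > len(self.data):
            raise MemoryError("message buffer full")
        HEADER.pack_into(self.data, start, type_id, layout.size)
        self.used = end
        return start + HEADER.size

    def append(self, type_id, *fields):
        # Fields are written straight into the reserved space, no temporary object.
        LAYOUTS[type_id].pack_into(self.data, self.reserve(type_id), *fields)

    def __iter__(self):
        pos = 0
        while pos < self.used:
            type_id, size = HEADER.unpack_from(self.data, pos)
            pos += HEADER.size
            yield type_id, LAYOUTS[type_id].unpack_from(self.data, pos)
            pos += size


buf = MessageBuffer(64)
buf.append(1, 42, 1.5, -2.0)
buf.append(2, 7, 1000)
print(buf.used, list(buf))  # 32 [(1, (42, 1.5, -2.0)), (2, (7, 1000))]
```

Reading uses `unpack_from` at an offset, so in a systems language the receiver could read the fields in place; reclaiming the buffer is a separate question.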

Problem #2 is probably a nice-to-have for V2.

Problem #3 is definitely going to be tricky, and I don’t think there is 
anything out there I can readily use to solve this. It’s like turning every 
actor state into a small in-memory embedded DB. I have spent time thinking 
about this, and I have some ideas, but idk if it’s going to be fast enough. If 
there IS anything that already solves that problem, I would like to know.
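One common technique for this is an undo log: instead of cloning the state, every write records the previous value, a failed message replays the log backwards, and a successful one clears it. A minimal Python sketch with hypothetical names (`TrackedState`, `process`); `commit()` also returns the touched fields, which is the kind of delta that could be sent to clients. It only tracks top-level fields; nested structures would need the same treatment at every level.

```python
class TrackedState:
    """Actor state that logs old values instead of being cloned per message."""

    _MISSING = object()  # marks "field did not exist before this write"

    def __init__(self, **fields):
        self._data = dict(fields)
        self._undo = []  # (key, old_value), oldest first

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self._undo.append((key, self._data.get(key, self._MISSING)))
        self._data[key] = value

    def rollback(self):
        # Undo newest-first, so repeated writes to one key restore the oldest value.
        while self._undo:
            key, old = self._undo.pop()
            if old is self._MISSING:
                del self._data[key]
            else:
                self._data[key] = old

    def commit(self):
        # The changed fields, e.g. for a server -> client state update.
        delta = {key: self._data[key] for key, _ in self._undo}
        self._undo.clear()
        return delta


def process(state, handler, msg):
    """Run one message as a transaction on `state`."""
    try:
        handler(state, msg)
    except Exception:
        state.rollback()
        raise
    return state.commit()


def buy_sword(state, price):
    state["gold"] = state["gold"] - price
    if state["gold"] < 0:
        raise ValueError("not enough gold")
    state["item"] = "sword"


player = TrackedState(hp=100, gold=5)
print(process(player, buy_sword, 3))  # {'gold': 2, 'item': 'sword'}
try:
    process(player, buy_sword, 10)
except ValueError:
    pass
print(player["gold"], player["item"])  # 2 sword
```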

For problems #4 and #6, I could just use MessagePack, but this might get tricky 
if I have to write the mapping manually. Presumably, a clever macro could 
generate the mapping automatically.
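In Nim, such a macro would walk the fields of the message type at compile time. As a rough analogue only, here is a Python class decorator (a hypothetical `serializable`) that derives the binary layout from the field declarations, so nothing is mapped by hand. It uses the standard `struct` module rather than MessagePack, and only handles fixed-size `int`/`float`/`bool` fields; strings and sequences would need length prefixes.

```python
import struct
from dataclasses import dataclass, fields

# Assumed mapping from field annotation to a fixed-size struct code.
_CODES = {int: "q", float: "d", bool: "?"}


def serializable(cls):
    """Turn `cls` into an immutable message with generated pack/unpack."""
    cls = dataclass(frozen=True)(cls)  # immutable messages (requirement #11)
    names = [f.name for f in fields(cls)]
    layout = struct.Struct("<" + "".join(_CODES[f.type] for f in fields(cls)))

    def pack(self):
        return layout.pack(*(getattr(self, name) for name in names))

    @classmethod
    def unpack(klass, data):
        return klass(*layout.unpack(data))

    cls.pack = pack
    cls.unpack = unpack
    return cls


@serializable
class MoveTo:
    entity: int
    x: float
    y: float


msg = MoveTo(42, 1.5, -2.0)
data = msg.pack()
print(len(data), MoveTo.unpack(data) == msg)  # 24 True
```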

Problem #5 is mostly related to batching in the cross-thread scenario. Within 
the same thread, I can allocate messages on the local heap, and have the GC 
take care of their lifetime. But across threads, I want to batch them into 
“buffers”, to reduce the number of copies and synchronizations. If all messages 
are answered “immediately” (synchronously), I can free the buffer after 
delivering the replies. But if any actor chooses to reply asynchronously, I 
would have to either keep the whole buffer until the last reply is sent and 
received, or require that the message be copied. If the message needs to be 
copied, both the receiver AND the sender need to use the copy (since the 
message is passed back to the sender together with the reply). If, OTOH, I 
decide not to use buffers, and allocate each message individually, the 
messaging becomes much simpler, at the cost of a much greater overhead per 
message (the individual shared-heap allocation). Maybe I should save buffering 
for V2?
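For the “keep the whole buffer until the last reply” option, the bookkeeping can be as small as a counter of outstanding replies per buffer. A minimal Python sketch with hypothetical names; `on_free` stands in for returning the block to the shared heap. The drawback remains: a single slow asynchronous reply keeps the whole batch alive.

```python
import threading


class BatchBuffer:
    """A cross-thread batch of messages, freed when its last reply is delivered."""

    def __init__(self, messages, on_free):
        if not messages:
            raise ValueError("empty batch")
        self.messages = list(messages)
        self._outstanding = len(self.messages)  # replies not yet delivered
        self._lock = threading.Lock()           # replies may arrive on any thread
        self._on_free = on_free

    def reply_delivered(self):
        """Call exactly once per message, after its reply reached the sender."""
        with self._lock:
            self._outstanding -= 1
            last = self._outstanding == 0
        if last:
            self.messages = None  # nothing may reference the batch any more
            self._on_free(self)


freed = []
batch = BatchBuffer(["a", "b", "c"], on_free=freed.append)
batch.reply_delivered()  # "a" answered synchronously
batch.reply_delivered()  # "b" answered synchronously
print(len(freed))        # 0: "c" is still waiting for an asynchronous reply
batch.reply_delivered()  # "c" answered later
print(len(freed))        # 1
```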

And the biggest problem of all, AFAIK, is #7. I can’t just transfer the state 
of a local-heap actor to another thread without copying, some of those actors 
(or actor teams) might grow into the multi-megabyte range, and I don’t even 
know how to do the copying with fixed-type channels, since each actor (team) 
might have a different type.

What I would like here, to solve #7, is to have multiple local heaps per 
thread, each independent of the others, and always accessed through an 
individual lock. That way, each “actor team” would have its own private heap, 
and the whole thing could be passed to another thread without copying; just 
tell the other thread to use the “team lock”. I found something similar in 
an old forum post: [Lightweight threading 
(Goroutines)](https://forum.nim-lang.org/t/736), but my use-case is somewhat 
different. Firstly, I’m not sure how goroutines work, because I have never used 
(nor ever intend to use) Go. Secondly, that post seems to be about running many 
short-lived “threads”, while what I will have is a few long-lived ones (aiming 
for 10-20 per real thread). The old forum post did not seem to reach a 
“solution”, but maybe things have changed since?
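To illustrate the ownership model only (this is not how Nim's allocator works), here is a Python sketch of a per-team arena: a block of memory with a bump-pointer allocator and a single lock. Whichever thread holds the lock may use the team's memory, so moving the team to another thread means handing over the lock, not copying the data. `TeamHeap` and its fields are hypothetical; whether Nim's thread-local allocator could be made to work like this is exactly the open question.

```python
import threading


class TeamHeap:
    """A private arena for one actor team, guarded by one lock."""

    def __init__(self, size):
        self.memory = bytearray(size)  # stand-in for a block of OS memory
        self.top = 0                   # bump pointer: next free byte
        self.root = None               # entry point to the team's state
        self.lock = threading.Lock()   # the "team lock": the owner thread holds it

    def alloc(self, n, align=8):
        """Allocate n bytes; the caller must hold self.lock."""
        start = (self.top + align - 1) & ~(align - 1)
        if start + n > len(self.memory):
            raise MemoryError("team heap exhausted")
        self.top = start + n
        return memoryview(self.memory)[start:start + n]


heap = TeamHeap(1024)
seen = []


def first_owner():
    with heap.lock:
        heap.root = heap.alloc(4)
        heap.root[:] = b"team"  # build state in the team's own memory


def second_owner():
    with heap.lock:  # "migration" = taking the lock; no copy of the data
        seen.append(bytes(heap.root))


for owner in (first_owner, second_owner):
    t = threading.Thread(target=owner)
    t.start()
    t.join()
print(seen)  # [b'team']
```

A bump allocator can only free the whole arena at once, which makes it cheap, but long-lived teams would need a real allocator inside each private heap.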

I believe problem #7 can be worked around if I manage memory “manually”. It 
will be a pain, but if destructor support gets to the point where one could do 
RAII like in C++, it would be doable. That is why I was particularly excited 
about the work being done in that direction. In the end, the thread-local heap 
is just the same physical memory as the shared heap, and the thread-local GC is 
there mainly to make its usage more convenient, not faster (or so I 
assume; GCs add some overhead, so how could they make memory faster?). What does 
make the thread-local heap faster is the thread-local _allocator_ (I haven’t 
seen any specific reference to the allocator yet, but this is the usual way of 
“making memory management faster”, so I assume that is what Nim does). If the 
thread-local allocator cannot be subverted to support multiple independent 
local heaps, I could try one of the other OSS allocators out there. But 
then I feel like I’ve given up on one of the most important parts of Nim: the 
built-in thread-local heap, the very feature that convinced me to try Nim in 
the first place.

Slightly off topic, but related to problem #3, is the problem of 
running a local simulation on the client (problem #8). More specifically, the 
server (cluster) will know about the entire state of the game world, but the 
client will only know about parts of it (since the entire world will not fit 
inside a single computer). Therefore, even with the best efforts, the client 
will sometimes produce different simulation results than the server. So the 
server has to regularly update the client with the state changes on the server. 
But the client is running the simulation locally, and therefore changing the 
state locally too. If the client wants to apply changes from the server, it 
must not only replace the values that changed on the client with the server's 
values, but also undo client changes to values that did not change on the 
server (and you don't want the server to send the _entire_ state, including 
the unchanged parts). It's kind of like running _two_ transactions on top of 
each other: the per-event transaction, and the all-the-client-changes 
transaction, the latter being rolled back before applying the server's 
transaction log.
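The outer “all the client changes” layer can be sketched with the same undo-log idea: the client remembers the last server-confirmed value of every key it has changed locally, rolls all of those back when an update arrives, then applies the server's delta. Python again, with a hypothetical `ClientState`; the inner per-event transaction is left out.

```python
class ClientState:
    """Client-side world state with a rollback log for local predictions."""

    _MISSING = object()  # marks "key did not exist before the local change"

    def __init__(self, confirmed):
        self.values = dict(confirmed)  # what the local simulation reads/writes
        self._confirmed = {}           # key -> value before the first local change

    def set_local(self, key, value):
        # Only the first local change since the last server update is recorded.
        if key not in self._confirmed:
            self._confirmed[key] = self.values.get(key, self._MISSING)
        self.values[key] = value

    def apply_server_delta(self, delta):
        # 1. Roll back the whole "client changes" transaction...
        for key, old in self._confirmed.items():
            if old is self._MISSING:
                self.values.pop(key, None)
            else:
                self.values[key] = old
        self._confirmed.clear()
        # 2. ...then apply only what actually changed on the server.
        self.values.update(delta)


client = ClientState({"x": 0, "y": 0})
client.set_local("x", 5)  # local prediction, the server will disagree
client.set_local("y", 3)  # local prediction, unchanged on the server
client.set_local("z", 1)  # a key the server never had
client.apply_server_delta({"x": 4})
print(client.values)  # {'x': 4, 'y': 0}
```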
