TL;DR: I want to create an “actor system” in Nim, and I need some feedback, in particular, on how to move data across threads.

After reading the docs, multiple posts on the forum, and trying to code a bit myself, I believe I am ready to present my long-term goal, and I’m asking for some brainstorming from the community. I actually wanted to wait with this until I had a basic prototype to show, but atm I don’t see a clear path forward, so I thought it preferable to ask for ideas first, before I start investing my time into coding something that might be a dead end, or that duplicates existing code.

At the highest level, I want to program a networked game, with a client/server-cluster setup. AFAIK, if you start with a simple client/single-server setup, and wait until everything is implemented and working fine before trying to add cluster support, the refactoring is probably so large that the idea is usually abandoned (i.e. it’s cheaper to just buy a bigger server). By cluster support, I mean _one single seamless world_, dynamically split between multiple servers, with no single point of failure, not the usual one-“shard”-per-server design, which is a no-brainer. That is why I want to have scalable cluster support from the get-go. And that is also why I think _actors_ are the best design for this.

I’ve done most of my professional coding on the JVM, and I’m of the opinion that it is less than ideal for a soft-real-time game with high memory demands, mostly because of the GC, but also because you cannot do low-level memory manipulation, and because all objects are allocated on the heap. While there are work-arounds for some of this, and improvements are coming in the future, I believe it was time for me to try something new. I also wanted to be more than just a “Java programmer”; as a student, I could code anything from Prolog to RISC Assembler, and now I feel “degenerated”, only programming in one language (well, except for some Python).

I literally spent years looking around, trying to decide which language/platform to use instead of the JVM. I settled for Nim because of the following features:

1. Thread-local heap (aka no JVM-style stop-the-world-GC-better-go-have-a-coffee-now)
2. Transpiling to C/C++ (I’d like to use UE4 as my game engine)
3. Lisp-like power, with the macros (who wouldn’t want that?)
4. It’s fast!
5. I like the Python syntax anyway

I’ve spent several years cooperating with someone else on the web over a Java actor system. In the end, I moved away due to personal reasons, and he moved to Clojure, but that experience gave me “strong opinions” on how a good (IMO) actor system should be designed. The ideal actor system for me would have the following requirements:

1. Statically typed actors (compile-time safety and faster runtime execution).
2. Two-way messaging (when does anyone ever send a message, and not care if it arrives, or if they will ever get an answer? IMHO, everyone wants two-way messaging, and the actor system should not force the user to implement that themselves on top of one-way messaging).
3. Cooperative multi-tasking within “actor teams” (highly “coupled” actors run together on the same thread, as a group, therefore saving the “context switch” when messaging each other).
4. Blocking/IO/long-running tasks are “forbidden” within actors. This is a requirement for #3.
5. Transactionality: if a message fails to be processed, the actor state is restored (note that this does not require the usual low-level STM implementation, because the state is accessed from a single thread only). Small-footprint actors can be implemented “functional style” (state is entirely replaced), while larger ones need “hand made” transactions (or some API that does it for me).
6. Actor systems can communicate with each other over the network. Ideally, this should be transparent, since “sending actors” are designed to assume a possible failure of the “receiving actor” anyway.
7. The possibility of transferring individual actors, or whole “actor teams”, across threads, and across the network.
8. Batch processing of messages (presumably thread-global rather than per “actor team”); while processing a message batch, all outgoing messages are buffered, even if they belong to the same “team”, and sent only after the incoming batch is fully processed. This prevents “tight loops” within an “actor team”, and puts an upper bound on the processing time of a single batch, such that the cooperative multi-tasking is “fairer” between teams on the same thread, and synchronization between threads is reduced (one “synchronization” per batch, instead of per message).
9. Extensible actors: the “actor” is the state, and the “events” are calls to procs, taking the state as parameter. So it should be possible to _extend_ an actor by creating new procs that work on the same state. One way to also add “new state” to an actor would be to model the _root_ of the actor state as a table, with “untyped” values (ptr?)
10. Since asynchronous responses must be supported (the message handler is not required to create a reply within the call), all messages are asynchronous. The message handler receives the message and a “callback”. It can either execute the callback immediately, or keep it for later. One can achieve better performance by having separate support for synchronous and asynchronous messaging, but the drawback is that changing the implementation of a message handler from one to the other requires updating the callers too, causing updates to ripple all over the codebase.
11. Messages are immutable. This should enable zero-copy messaging when the message handler replies immediately/synchronously (hopefully, the most common case), and maybe also for all messages within the same “team”.
12. Since this is mainly for a game, and real-time client/server games usually use some fixed number of “cycles” per second (20, typically?), messages might be targeted at a specific future cycle, such that they cannot be executed until that time comes.

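
To make requirements 8, 10 and 12 a bit more concrete, here is a minimal single-threaded sketch of how they might fit together. Every name in it (`Team`, `Message`, `Reply`, `post`, `runBatch`, `record`) is hypothetical and chosen for illustration only; it ignores threads, typing of messages, and teams with more than one actor.

```nim
type
  Reply = proc (answer: int)        # requirement 10: every handler gets a callback
  Message = object
    value: int
    dueCycle: int                   # requirement 12: earliest cycle for delivery
    reply: Reply
  Team = object
    inbox: seq[Message]             # messages waiting for the next batch
    outbox: seq[Message]            # sends buffered while a batch is running

proc post(team: var Team, msg: Message) =
  ## Requirement 8: a send only lands in the outbox.
  team.outbox.add msg

proc handle(team: var Team, msg: Message) =
  ## Hypothetical handler: replies at once, then messages its own team again.
  msg.reply(msg.value)
  if msg.value > 1:
    team.post(Message(value: msg.value - 1, dueCycle: msg.dueCycle,
                      reply: msg.reply))

proc runBatch(team: var Team, cycle: int): int =
  ## Processes one batch and returns how many messages were handled.
  var batch: seq[Message]
  swap(batch, team.inbox)
  for msg in batch:
    if msg.dueCycle > cycle:
      team.inbox.add msg            # not due yet: keep it for a later batch
    else:
      team.handle(msg)
      inc result
  team.inbox.add team.outbox        # buffered sends become visible only now
  team.outbox.setLen 0

var answers: seq[int]
proc record(answer: int) = answers.add answer

var team: Team
team.inbox.add Message(value: 3, dueCycle: 2, reply: record)

var handled: seq[int]
for cycle in 1 .. 3:
  handled.add team.runBatch(cycle)
echo handled, " ", answers
```

The handler keeps messaging its own team, yet each batch handles at most one message, because the follow-up only becomes visible after the batch ends. A handler that wanted to answer asynchronously would store `msg.reply` instead of calling it right away.
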

I think some of this is similar to the [C++ Actor Framework](http://actor-framework.org/). I would have called this actor system “reactor(s)” (REquest/REply ACTORs), but that name is already taken in Nimble, so I’m still searching for a good name for this.

I think most of this can be done in a reasonable amount of time, using Nim, threads, and channels. (I’m not sure if channels are the best way to model inter-thread communication in this case, in particular since I’m going to have an unspecified number of different messages to exchange, and channels are fixed-type, but it’s really just an “optimization” problem that can be postponed to V2.) But there are a few things that are less than obvious (to me) atm, and this is why I’m posting this. Here is the list of the main problems I see atm:

1. Need to implement “generic” communication through channels (using pointers to “shared” heap buffers maybe?)
2. Enforce requirement #4 (no blocking) at compile-time.
3. If some actor has a state into the megabytes, you can’t just clone the whole thing every time a change needs to be performed (requirement #5). So individual changes to the state need to be tracked somehow, and undone if needed. I might also piggy-back on that feature, to record state changes in the server, which are then transported to the client. (Anyone done that yet?)
4. Requirement #6 means serializable messages.
5. But within the same process, I don’t want to serialize the messages.
6. Requirement #7 means serializable actors.
7. But within the same process, I don’t want to serialize the actor state.

I’m currently assuming most messages are going to be “simple” structurally. Therefore I expect problem #1 can be solved with some kind of shared-heap “generic list”, where by generic I mean it can store any type “inline”. This might even be possible with zero-copy, using “type IDs”, and pre-allocating space for the whole message before writing to it.

Problem #2 is probably a nice-to-have for V2.

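
A rough sketch of the “generic list with type IDs” idea for problem #1 might look like the following. It is only an illustration under several assumptions: the message types `Move` and `Chat`, the `MsgKind` IDs, and the procs `put` and `take` are all made up here; a plain `seq[byte]` stands in for a shared-heap block; and payloads are copied in and out rather than written in place, so this is not yet zero-copy. It also only works for flat objects without `string`, `seq` or `ref` fields.

```nim
type
  MsgKind = enum mkMove, mkChat     # hypothetical type IDs
  Move = object
    x, y: float32
  Chat = object
    channel: int32
  MsgBuffer = object
    data: seq[byte]                 # stand-in for a shared-heap block

proc put[T](buf: var MsgBuffer, kind: MsgKind, msg: T) =
  ## Writes a one-byte type ID followed by the raw bytes of `msg`.
  var tmp = msg
  buf.data.add byte(ord(kind))
  let start = buf.data.len
  buf.data.setLen(start + sizeof(T))          # reserve the whole message first
  copyMem(addr buf.data[start], addr tmp, sizeof(T))

proc take[T](buf: var MsgBuffer, pos: var int, msg: var T) =
  ## Copies the payload that follows the type ID at `pos`, then advances `pos`.
  copyMem(addr msg, addr buf.data[pos + 1], sizeof(T))
  pos += 1 + sizeof(T)

var buf: MsgBuffer
buf.put(mkMove, Move(x: 1.5, y: -2.0))
buf.put(mkChat, Chat(channel: 7))

# The receiver walks the buffer and dispatches on the type ID.
var pos = 0
var moves: seq[Move]
var chats: seq[Chat]
while pos < buf.data.len:
  case MsgKind(buf.data[pos])
  of mkMove:
    var m: Move
    buf.take(pos, m)
    moves.add m
  of mkChat:
    var c: Chat
    buf.take(pos, c)
    chats.add c
```

With such a layout, a single fixed-type channel could carry whole buffers (or pointers to them), which sidesteps the “channels are fixed-type” concern at the cost of manual dispatch on the receiving side.
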

Problem #3 is definitely going to be tricky, and I don’t think there is anything out there I can readily use to solve this. It’s like turning every actor state into a small in-memory embedded DB. I have spent time thinking about this, and I have some ideas, but idk if it’s going to be fast enough. If there IS anything that already solves that problem, I would like to know.

For problems #4 and #6, I could just use MessagePack, but this might get tricky if I have to write the mapping manually. Presumably, a clever macro could generate the mapping automatically.

Problem #5 is mostly related to batching in the cross-thread scenario. Within the same thread, I can allocate messages using the local heap, and have the GC take care of their lifetime. But across threads, I want to batch them into “buffers”, to reduce the number of copies and synchronization. If all messages are answered “immediately” (synchronously), I can free the buffer after delivering the replies. But if any actor chooses to reply asynchronously, I would have to either keep the whole buffer until the last reply is sent and received, or require that the message be copied. If the message needs to be copied, both the receiver AND the sender need to use the copy (since the message is passed back to the sender together with the reply). If, OTOH, I decide to not use buffers, and allocate each message individually, the messaging becomes much simpler, at the cost of a much greater overhead per message (the individual shared-heap allocation). Maybe I should save buffering for V2?

And the biggest problem of all, AFAIK, is #7. I can’t just transfer the state of a local-heap actor to another thread without copying, and some of those actors (or actor teams) might grow into the multi-megabyte range, and I don’t even know how to do the copying with fixed-type channels, since each actor (team) might have a different type.


What I would like here, to solve #7, is to have multiple local heaps per thread, each independent of the others, and always accessed through an individual lock. Like that, each “actor team” would have its own private heap, and the whole thing could be passed to another thread without copying; just tell the other thread to use the “team lock”. I have found something similar in an old forum post: [Lightweight threading (Goroutines)](https://forum.nim-lang.org/t/736) but my use-case is somewhat different. Firstly, I’m not sure how goroutines work, because I never used (nor ever intend to use) Go. Secondly, this post seems to be about running many short-lived “threads”, while what I will have is few long-lived ones (aiming for 10-20 per real thread). The old forum post did not seem to have a “solution”, but maybe things changed?

I believe problem #7 can be worked around if I manage memory “manually”. It will be a pain, but if destructor support got to the point where one could do RAII like in C++, it would be doable. That is why I was particularly excited about the work being done in that direction. In the end, the thread-local heap is just the same physical memory as the shared heap, and the thread-local GC is there mainly to make its usage more convenient, but not faster (or so I assume; GCs add some overhead, so how could they make memory faster?). What does make the thread-local heap faster is the thread-local _allocator_ (I haven’t seen any specific reference to the allocator yet, but this is the usual way of “making memory management faster”, so I assume that is what Nim does). If the thread-local allocator cannot be subverted to support multiple independent local heaps, I could try to use one of the other OSS allocators out there. But then I feel like I’ve given up on one of the most important parts of Nim: the built-in thread-local heap, the very feature which convinced me to try Nim in the first place.

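
For the “manual memory” work-around, one possible shape is a private bump arena per team, carved out of the shared heap. The sketch below is an assumption-laden illustration, not a proposal for the final design: `TeamArena`, `Position`, `initArena`, `newIn` and `release` are invented names, the arena never frees individual objects, and both the per-team lock and the actual second thread are left out, so the “hand-over” is only simulated by copying the arena descriptor.

```nim
type
  TeamArena = object
    base: pointer          # one block taken from the shared heap
    size, used: int
  Position = object        # hypothetical piece of actor state
    x, y: int32

proc initArena(size: int): TeamArena =
  TeamArena(base: allocShared0(size), size: size, used: 0)

proc newIn[T](arena: var TeamArena, t: typedesc[T]): ptr T =
  ## Bump allocation with 8-byte alignment; returns nil when the arena is full.
  let start = (arena.used + 7) and not 7
  if start + sizeof(T) > arena.size:
    return nil
  result = cast[ptr T](cast[uint](arena.base) + uint(start))
  arena.used = start + sizeof(T)

proc release(arena: var TeamArena) =
  ## Frees the whole team state in one call.
  deallocShared(arena.base)
  arena = TeamArena()

var onThreadA = initArena(256)
let first = onThreadA.newIn(Position)
first.x = 3
first.y = 4
let second = onThreadA.newIn(Position)

# "Moving" the team: only the small descriptor is copied, not the state.
var onThreadB = onThreadA
onThreadA = TeamArena()

let moved = cast[ptr Position](onThreadB.base)
let sameBlock = moved == first
let movedX = moved.x
let secondOffset = cast[uint](second) - cast[uint](onThreadB.base)
onThreadB.release()
```

The obvious cost is that nothing inside the arena may be a GC-managed `ref`, `string` or `seq`, which is exactly the convenience that the thread-local heap normally provides.
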

Slightly off topic, but related to problem #3, is finally the problem of running a local simulation on the client (Problem #8). More specifically, the server (cluster) will know about the entire state of the game world, but the client will only know about parts of it (since the entire world will not fit inside a single computer). Therefore, even with the best efforts, the client will sometimes produce different simulation results than the server. So the server has to regularly update the client with the state changes on the server. But the client is running the simulation locally, and therefore changing the state locally too. If the client wants to apply changes from the server, it must not only be able to replace the values that changed on the client with server changes, but also undo changes on the client for values that did not change on the server (and you don't want the server to send the _entire_ state, including the unchanged parts). It's kind of like running _two_ transactions on top of each other: the per-event transaction, and the all-the-client-changes transaction, the latter being rolled back before applying the server transaction log.
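
The two stacked transactions could, for instance, share a single undo log: a failed event rolls back to the log position where it started, and a server update rolls back to position zero before being applied. The following is a minimal sketch of that idea, assuming a flat `Table[string, int]` as state; `ClientState`, `Change`, `predict`, `rollbackTo` and `applyServer` are hypothetical names, and a real actor state would need a richer change record.

```nim
import std/tables

type
  Change = object
    key: string
    hadOld: bool             # false when the key did not exist before the write
    old: int
  ClientState = object
    values: Table[string, int]
    log: seq[Change]         # undo log of every local (predicted) write

proc predict(s: var ClientState, key: string, value: int) =
  ## Local simulation write; records how to undo itself.
  s.log.add Change(key: key, hadOld: s.values.hasKey(key),
                   old: s.values.getOrDefault(key))
  s.values[key] = value

proc rollbackTo(s: var ClientState, mark: int) =
  ## Undoes all writes logged at or after `mark`, newest first.
  for i in countdown(s.log.high, mark):
    let c = s.log[i]
    if c.hadOld: s.values[c.key] = c.old
    else: s.values.del(c.key)
  s.log.setLen mark

proc applyServer(s: var ClientState, updates: openArray[(string, int)]) =
  ## Drops every local prediction, then applies only what the server changed.
  s.rollbackTo(0)
  for u in updates:
    s.values[u[0]] = u[1]

var client: ClientState
client.values["hp"] = 10               # last state confirmed by the server

client.predict("hp", 8)                # event 1 succeeds locally

let mark = client.log.len              # event 2 fails: per-event rollback
try:
  client.predict("ammo", 5)
  raise newException(ValueError, "event failed")
except ValueError:
  client.rollbackTo(mark)
let hpAfterFailedEvent = client.values["hp"]

client.predict("gold", 2)              # event 3 succeeds locally
client.applyServer([("hp", 9)])        # server only reports the hp change
```

Here the server never mentions `gold`, so the client-side write to it is undone without the server having to send the unchanged parts of the state.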