Re: Send data structures between threads?
Thanks mratsim!
Re: Send data structures between threads?
I don't understand it, but nice. I have added an issue on the Nim Cookbook for this to be added there.
Re: Send data structures between threads?
In case it helps, my shared memory, garbage collected data structure (and 64-byte aligned) is this:

```nim
const FORCE_ALIGN = 64

type
  BlasBufferArray[T] = object
    dataRef: ref[ptr T]
    data*: ptr UncheckedArray[T]
    len*: int

proc deallocBlasBufferArray[T](dataRef: ref[ptr T]) =
  if not dataRef[].isNil:
    deallocShared(dataRef[])
    dataRef[] = nil

proc newBlasBuffer[T](size: int): BlasBufferArray[T] =
  ## Create a heap array aligned with FORCE_ALIGN
  new(result.dataRef, deallocBlasBufferArray)

  # Allocate memory, we will move the pointer, if it does not fall at a modulo FORCE_ALIGN boundary
  let address = cast[ByteAddress](allocShared0(sizeof(T) * size + FORCE_ALIGN - 1))

  result.dataRef[] = cast[ptr T](address)
  result.len = size

  if (address and (FORCE_ALIGN - 1)) == 0:
    result.data = cast[ptr UncheckedArray[T]](address)
  else:
    let offset = FORCE_ALIGN - (address and (FORCE_ALIGN - 1))
    let data_start = cast[ptr UncheckedArray[T]](address +% offset)
    result.data = data_start
```

I use it for shared-memory parallelism via OpenMP [here](https://github.com/mratsim/Arraymancer/blob/master/src/tensor/fallback/blas_l3_gemm_data_structure.nim#L30-L34). To use it with the convenient array indexing syntax, `data` is a `ptr UncheckedArray[T]` instead of just `ptr T`. `dataRef` is just there to make the object GC-managed, I don't use it otherwise.
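The pointer adjustment in `newBlasBuffer` boils down to rounding an address up to the next multiple of `FORCE_ALIGN`. Here is a minimal sketch of just that arithmetic. The helper `alignedStart` is hypothetical (it is not part of the code above) and works on plain integers instead of real addresses:

```nim
const ForceAlign = 64  # must be a power of two for the bit mask below to work

proc alignedStart(address: int): int =
  ## Hypothetical helper: smallest multiple of ForceAlign that is >= address.
  let remainder = address and (ForceAlign - 1)  # same as address mod ForceAlign
  if remainder == 0:
    address                                     # already on a boundary
  else:
    address + (ForceAlign - remainder)          # skip ahead to the next boundary
```

The adjustment is never more than `ForceAlign - 1` bytes, which is why over-allocating by `FORCE_ALIGN - 1` in the original code is enough to keep the aligned region inside the allocation.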
Re: Send data structures between threads?
Hi @peheje. I'm really not the expert on this, as I'm a newbie at Nim myself, but I'm _pretty sure_ I read that everything you send over a Channel is _copied_. The refs are local to the thread, so if you send data containing refs, both the data and whatever the refs point to get cloned "magically". This is why I'm looking into making my own "shared heap" replacements for seq/array, and eventually sets and tables (although, of course, I would rather use an _existing_ "shared heap" implementation of those structures, if one is available in some nimble package).
Re: Send data structures between threads?
Anything new on this? I'm trying to implement a simple genetic algorithm where each thread shall have access to a shared `seq[seq[int]]`. It's easy to partition the data into chunks so the threads know which items to work on, but Nim seems to IMPLICITLY copy the data, whether I try channels, parallel statements, spawn statements or even "bare" threads. Am I doing something wrong? I thought that because the object is a ref object, only the ref (the address) would be copied?

Code here: [https://github.com/peheje/nim_genetic/blob/master/mutable_state.nim](https://github.com/peheje/nim_genetic/blob/master/mutable_state.nim)

I appreciate that language features like `parallel:` with `spawn` do some magic in the background implicitly, but I think being too creative might take many programmers by "unpleasant" surprise.
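One way to sidestep the copying, in the spirit of the shared-heap suggestions elsewhere in this thread, is to keep the data in memory obtained from `createShared` and hand each thread only a raw pointer plus its index range. The following is a rough sketch, not a drop-in fix for the linked code: it assumes threads are enabled (`--threads:on`, the default since Nim 2.0), it flattens the data into one `int` buffer instead of a `seq[seq[int]]`, and the names `Chunk` and `fillSquares` are made up for illustration.

```nim
type
  Chunk = object
    data: ptr UncheckedArray[int]  # points into the shared heap, so nothing is copied
    first, last: int               # inclusive index range owned by one worker

proc fillSquares(c: Chunk) {.thread.} =
  # Each worker writes only to its own range, so no locking is needed here.
  for i in c.first .. c.last:
    c.data[i] = i * i

const
  total = 8
  workers = 2

# Zero-initialised memory on the shared heap, invisible to the thread-local GCs.
let buf = cast[ptr UncheckedArray[int]](createShared(int, total))

var threads: array[workers, Thread[Chunk]]
for w in 0 ..< workers:
  let per = total div workers
  createThread(threads[w], fillSquares,
               Chunk(data: buf, first: w * per, last: (w + 1) * per - 1))
joinThreads(threads)

# Copy the results into an ordinary seq, then release the shared memory by hand.
var squares = newSeq[int](total)
for i in 0 ..< total:
  squares[i] = buf[i]
freeShared(cast[ptr int](buf))
```

The price is manual memory management: the shared buffer is not garbage collected, so forgetting `freeShared` leaks it, and using `buf` after freeing it is undefined behaviour.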
Re: Send data structures between threads?
@Araq This sounds like object mapping, in which data from a database is mapped into a set of objects. That's quite common, and makes working with data much easier.
Re: Send data structures between threads?
> So what do you do: do you tell these teams not to use refs? Or do you tell them to go ahead and use refs and garbage collection, and deal with the consequences?

To be honest I would fire teams who model business taxonomies with C++ classes (OMG), keeping all the data in RAM (OMG) and not using a (noSQL / SQL) database (OMG).

But in the context of Nim, I would likely live with refs, thread local GCs and when somebody queries my data, I would return a copy. Note that copying a query result doesn't mean to "keep full copies of the data around in RAM" which you sort of implied.
Re: Send data structures between threads?
I used the word "database" in the generic sense. Basically, go to any good shopping website (eg, wayfair, or amazon, or google shopping, for example), and see what that website "knows" about coffee tables. If you take a little time and study the website, I think you'll realize that it's a surprisingly large amount of knowledge. There's a whole expert system in there about coffee tables - and another expert system about bedspreads, and so forth.

At the very base of it all, is a taxonomy:

    Home Goods > Tools > Drills and Rotary Tools

Some product categories fit into the taxonomy in multiple places:

    Apparel > Shirts > Football Jerseys
    Sports > Sports Memorabilia > Football Jerseys

Then, for each category of products, you have data about which products in the category are most popular, data to drive a machine-learning classifier that classifies queries to determine if they belong in that category, etcetera, etcetera. Just take a little time studying one of these websites, and I think you'll come to appreciate how much is there.

So when I say a "database" here, I don't mean that it's SQL. I'm just saying it's complicated data. It's plain old C++ classes, loaded into RAM - but it's legitimately _lots_ of C++ classes, and it takes up _lots_ of RAM.

So imagine, now, that you've got a dozen teams: one is responsible for implementing a machine-learning classifier that differentiates queries about drills from queries about rotary hammers. Another is building a software system whose job is to estimate the popularity of products according to a dozen different metrics. Another is building a recommendation system that uses statistics to recommend "related products." And so forth.

Now, you know that at some point, you're going to need to integrate all this software into a web-server, which will probably be multithreaded. So what do you do: do you tell these teams not to use refs? Or do you tell them to go ahead and use refs and garbage collection, and deal with the consequences?
Re: Send data structures between threads?
@Araq By "implemented", I believe jyelon meant "design"; using existing database software (postgresql). Anyway, your reply still doesn't answer the question - how would he use json functions in one thread to act on data from another thread?
Re: Send data structures between threads?
> Now, I think: what if I had tried to implement this in Nim? The team that implemented the product category database would probably have written it in the "normal" Nim style, with refs and garbage collection.

So you write a "database" without concurrency in mind? No locking infrastructure? No multi version concurrency control? I find this hard to believe. Plus you haven't explained why your team had to create a database from scratch to begin with. You talk like it happens often and everywhere.

Also, usually when you have a database you don't pass pointers around (neither ref nor ptr) but IDs, so you can serialize things and can ensure old IDs are not recycled, solving the memory safety problem on a different level.

> I would urge the Nim designers to think of this as the real challenge for multithreading: if I already have a library, can I then take the data from that library and put it in shared memory?

I dunno about "shared memory" but I use all kinds of Nim libraries in my multithreaded programs.

> If the latter, then that's seriously limiting to how realistic it is to craft large multithreaded programs.

Pathos doesn't help to get your point across. Erlang too is based on thread local heaps and message passing copies. I'm not saying that I love this model too much, but it's definitely a valid design for version 1 of Nim.

> I can currently circumvent it with clever use of typecasts, but will future versions of the compiler be even more aggressive in trying to stop me? Will it always be feasible to use real shared memory in Nim?

It's a single typecast as far as I can see that you can wrap in a template. And yes, it will always be feasible to use real shared memory in Nim and we expect the situation to improve in the future, not to get worse. ;)
Re: Send data structures between threads?
When I think about how this plays out in a large multithreaded program, it worries me. For example, I used to work on a program that served queries pertaining to products for a shopping website. This program used to load tons of data at program initialization time: it would load up the database of product categories, it loaded tables of statistics about how often various products were purchased, it loaded a script that helped guide serving, and so forth. There was a lot of data, in hundreds of classes, and it took up gigabytes, so keeping multiple copies in RAM wouldn't have been an option.

The libraries that handled this data weren't necessarily written _for_ this multithreaded server. For example, the product category database was used in dozens of programs, most of which were single-threaded. The scripting language was just a scripting language - it was used in many programs, again, most of which were single-threaded.

Now, I think: what if I had tried to implement this in Nim? The team that implemented the product category database would probably have written it in the "normal" Nim style, with refs and garbage collection. Likewise for the scripting language. Then, the team that wrote the product server would have looked at these libraries and realized that they couldn't use them in a multithreaded server.

Of course, I can do true sharing using Boehm (ie, using established libraries in a multithreaded program), but the compiler is trying very hard to stop me from doing this. I can currently circumvent it with clever use of typecasts, but will future versions of the compiler be even more aggressive in trying to stop me? Will it always be feasible to use real shared memory in Nim?

I would urge the Nim designers to think of this as the real challenge for multithreading: if I already have a library, can I then take the data from that library and put it in shared memory? Or do I have to rewrite the library? If the latter, then that's seriously limiting to how realistic it is to craft large multithreaded programs.
Re: Send data structures between threads?
In those cases, do you really need to access the exact same memory, or will a copy do?
Re: Send data structures between threads?
So, this only sounds half-usable: yes, I can pass the json to some other thread, but then when it arrives at the other thread, I can't call library functions like "getFields" or even 'x == y' to examine the json (these functions take refs). So now I have a ptr to a json object that I can't pass to the json library. That's not really so helpful.
Re: Send data structures between threads?
> How do I avoid accidentally trashing the refcounts?

`protect` and `dispose` only give you a `pointer` that you can cast to `ptr JsonObj`, and so RCs are not affected.

> But as you mentioned before, I can force it with a cast. Is that part of the solution you're suggesting?

Well yes. But you really need to be careful, and even then it doesn't support multiple threads creating data and adding it to an existing data structure without copies, so Boehm may indeed be what you want.
Re: Send data structures between threads?
Say, I never did get an answer to either of the questions above. I'm still curious:

1. If the json is stored in a thread-local heap, then I let other threads reference the json, those other threads are going to tend to touch the refcounts. But the refcounts, presumably, aren't atomic ints. How do I avoid accidentally trashing the refcounts?
2. The compiler is trying very hard to prevent me from passing a ref to the json from the thread that created it to any other thread. (That's what gc-safe is all about). But as you mentioned before, I can force it with a cast. Is that part of the solution you're suggesting?
Re: Send data structures between threads?
I don't know what "protect" the json means. I looked in the manual, it doesn't mention a protect statement. I'm also not entirely sure about "dispose", the word "dispose" doesn't appear in the manual, but I know there's a dispose statement that ignores a return value. But I don't see where that comes in. I also don't know what it means to "wrap a dispose in a ref." Long story short: you lost me.

But there are two more things I don't understand:

1. If the json is stored in a thread-local heap, then I let other threads reference the json, those other threads are going to tend to touch the refcounts. But the refcounts, presumably, aren't atomic ints. How do I avoid accidentally trashing the refcounts?
2. The compiler is trying very hard to prevent me from passing a ref to the json from the thread that created it to any other thread. (That's what gc-safe is all about). But as you mentioned before, I can force it with a cast. Is that part of the solution you're suggesting?

Edit: I found a protect and a dispose in the system module. They're clearly intended for something having to do with referencing data across heaps, but I just haven't been able to intuit the details. Apparently, dispose doesn't do what I thought it did.
Re: Send data structures between threads?
Boehm is not a precise GC and thus "dirty".

> Without boehm, I have to clone the code for the json parser, then alter it to use createShared. My data structures won't be garbage collected. Is that really cleaner?

No, you only have to `protect` and `dispose` the Json. You can wrap the dispose in another ref with a finalizer and have 100% automatic memory management with thread local GCs. That still doesn't make it the most beautiful memory management design out there, but it's not too bad.
Re: Send data structures between threads?
Thanks for the casting idea. I really don't think you should think of boehm as "dirty." I mean, compared to what? Real multi-threaded programs need to store structured data in shared memory. Just to give an example, let's say that this structured data is json. With boehm, I can load the shared json using the existing json module, and the json will be garbage collected when it's no longer referenced. Without boehm, I have to clone the code for the json parser, then alter it to use createShared. My data structures won't be garbage collected. Is that really cleaner? Or maybe, you just feel it's dirty to use shared structured data at all. If that's the case, then I don't know what to say, other than: I've spoken to many pure functional programmers who felt it was dirty to use mutation, because you can really shoot yourself in the foot with mutation. In a way, they're not wrong. But I'm not giving up on imperative programming.
Re: Send data structures between threads?
We don't really want to have a Nim dialect per `--gc` option, so the frontend checking is not aware of `--gc:boehm`. You can `cast` GC safety into existence though and be as dirty as Boehm allows for. Sketch:

```nim
type EnforcedGcSafe = proc() {.gcsafe.}

proc myproc =
  discard "... access shared heap here..."

spawn cast[EnforcedGcSafe](myproc)
```
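For comparison, more recent Nim versions also offer a `{.cast(gcsafe).}` pragma block, which overrides the GC-safety analysis for a single block rather than for a whole proc type. This is a different mechanism from the proc-type cast sketched above, shown here only as an illustration. The global `lookup` and the proc `countEntries` are hypothetical names, and the example runs on a single thread, so it demonstrates the syntax only, not actual sharing between threads.

```nim
# Hypothetical global that lives in GC-managed memory.
var lookup = @["drill", "rotary hammer"]

proc countEntries(): int {.gcsafe.} =
  # With threads enabled, touching a GC'd global from a gcsafe proc is
  # normally rejected by the compiler. The cast tells it to trust us, so
  # keeping the access race-free becomes our responsibility.
  {.cast(gcsafe).}:
    result = lookup.len
```

As with the proc-type cast, this only silences the compiler. Whether the access is actually safe still depends on the memory manager in use and on how the threads synchronize.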