Re: Send data structures between threads?

2017-11-02 Thread peheje
Thanks mratsim!


Re: Send data structures between threads?

2017-10-29 Thread wizzardx
I don't understand it, but nice! I've added an issue on the Nim Cookbook so 
this gets added there.


Re: Send data structures between threads?

2017-10-26 Thread mratsim
In case it helps, here is my shared-memory, garbage-collected (and 64-byte 
aligned) data structure:


```nim
const FORCE_ALIGN = 64

type
  BlasBufferArray[T] = object
    dataRef: ref[ptr T]
    data*: ptr UncheckedArray[T]
    len*: int

proc deallocBlasBufferArray[T](dataRef: ref[ptr T]) =
  if not dataRef[].isNil:
    deallocShared(dataRef[])
    dataRef[] = nil

proc newBlasBuffer[T](size: int): BlasBufferArray[T] =
  ## Create a heap array aligned with FORCE_ALIGN
  new(result.dataRef, deallocBlasBufferArray)

  # Allocate memory, we will move the pointer, if it does not fall at a
  # modulo FORCE_ALIGN boundary
  let address = cast[ByteAddress](allocShared0(sizeof(T) * size + FORCE_ALIGN - 1))

  result.dataRef[] = cast[ptr T](address)
  result.len = size

  if (address and (FORCE_ALIGN - 1)) == 0:
    result.data = cast[ptr UncheckedArray[T]](address)
  else:
    let offset = FORCE_ALIGN - (address and (FORCE_ALIGN - 1))
    let data_start = cast[ptr UncheckedArray[T]](address +% offset)
    result.data = data_start
```


I use it for shared-memory parallelism via OpenMP 
[here](https://github.com/mratsim/Arraymancer/blob/master/src/tensor/fallback/blas_l3_gemm_data_structure.nim#L30-L34).

`data` is a `ptr UncheckedArray[T]` instead of just `ptr T` so that it can be 
used with the convenient array indexing syntax.

`dataRef` is only there to make the object GC-managed (its finalizer frees the 
shared memory); I don't use it otherwise.
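The pointer adjustment in `newBlasBuffer` is the usual "round up to the next multiple of a power of two" trick. Below is a minimal, self-contained sketch of just that arithmetic; `alignUp` and `ForceAlign` are hypothetical names, not part of Arraymancer. It also checks that over-allocating `FORCE_ALIGN - 1` extra bytes always leaves room for an aligned block:

```nim
const ForceAlign = 64  # alignment in bytes; must be a power of two

proc alignUp(address, alignment: int): int =
  ## Rounds `address` up to the next multiple of `alignment`, using the
  ## same `offset` arithmetic as `newBlasBuffer`.
  let misalignment = address and (alignment - 1)
  if misalignment == 0:
    result = address
  else:
    result = address + (alignment - misalignment)

when isMainModule:
  const n = 10
  let size = n * sizeof(float64)
  # Over-allocate by ForceAlign - 1 bytes so an aligned block always fits.
  let raw = allocShared0(size + ForceAlign - 1)
  let start = cast[int](raw)
  let aligned = alignUp(start, ForceAlign)
  doAssert aligned mod ForceAlign == 0
  doAssert aligned + size <= start + size + ForceAlign - 1
  let data = cast[ptr UncheckedArray[float64]](aligned)
  for i in 0 ..< n:
    data[i] = float64(i)   # array indexing works thanks to UncheckedArray
  doAssert data[n - 1] == 9.0
  # Free the original pointer, never the aligned one.
  deallocShared(raw)
```

`deallocShared` must receive the pointer that `allocShared0` returned, which is why `newBlasBuffer` keeps the unaligned address in `dataRef` and hands out the aligned one in `data`.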


Re: Send data structures between threads?

2017-10-26 Thread monster
Hi @peheje. I'm really not an expert on this, as I'm a newbie at Nim myself, 
but I'm _pretty sure_ I read that everything you send over a Channel is 
_copied_. Refs are local to a thread, so if you send data containing refs, both 
the data and whatever the refs point to get cloned "magically". This is why I'm 
looking into making my own "shared heap" replacements for seq/array, and 
eventually sets and tables (although, of course, I would rather use an 
_existing_ "shared heap" implementation of those structures, if one is 
available in some nimble package).


Re: Send data structures between threads?

2017-10-26 Thread peheje
Anything new on this? I'm trying to implement a simple genetic algorithm where 
each thread should have access to a shared `seq[seq[int]]`. It's easy to partition 
the data into chunks so the threads know which items to work on, but Nim seems to 
IMPLICITLY copy the data, whether I try channels, parallel statements, spawn 
statements or even "bare" threads. Am I doing something wrong?

I thought that because the object is a `ref object`, only the ref (the 
address) would be copied?

Code here: 
[https://github.com/peheje/nim_genetic/blob/master/mutable_state.nim](https://github.com/peheje/nim_genetic/blob/master/mutable_state.nim)

I appreciate that language features like `parallel:` with `spawn` do some magic 
implicitly in the background, but I think being too creative might take many 
programmers by "unpleasant" surprise.
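One way to share the data without any implicit copy is to keep it on the shared heap and hand each thread only a pointer plus the index range it owns. This is a minimal sketch, assuming the `seq[seq[int]]` can be flattened into a single `int` buffer. `Chunk`, `worker` and `doubleInParallel` are hypothetical names, and the code needs `--threads:on` (the default from Nim 2.0 on). Because each thread writes only to its own range, no lock is needed; work that crosses ranges would need one.

```nim
when defined(nimPreviewSlimSystem):
  import std/typedthreads   # createThread/joinThreads live here in slim-system mode

type
  Chunk = object
    data: ptr UncheckedArray[int]   # points into the shared heap; not deep-copied
    first, last: int                # this thread's half-open range [first, last)

proc worker(c: Chunk) {.thread.} =
  # Each thread touches only its own slice, so no lock is needed.
  for i in c.first ..< c.last:
    c.data[i] = c.data[i] * 2

proc doubleInParallel(n, nThreads: int): seq[int] =
  ## Doubles 0 ..< n in place on the shared heap, then copies the result out.
  let data = cast[ptr UncheckedArray[int]](allocShared0(n * sizeof(int)))
  for i in 0 ..< n:
    data[i] = i
  var threads = newSeq[Thread[Chunk]](nThreads)
  let chunkSize = n div nThreads
  for t in 0 ..< nThreads:
    let first = t * chunkSize
    let last = if t == nThreads - 1: n else: first + chunkSize
    # Only the small Chunk value is copied into the thread, not the buffer.
    createThread(threads[t], worker, Chunk(data: data, first: first, last: last))
  joinThreads(threads)
  result = newSeq[int](n)
  for i in 0 ..< n:
    result[i] = data[i]
  deallocShared(data)

when isMainModule:
  doAssert doubleInParallel(1000, 4)[999] == 1998
```

The trade-off is that the buffer is managed by hand (`allocShared0`/`deallocShared`) instead of by the GC.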


Re: Send data structures between threads?

2016-08-28 Thread Varriount
@Araq This sounds like object mapping, in which data from a database is mapped 
into a set of objects. That's quite common, and makes working with data much 
easier.


Re: Send data structures between threads?

2016-08-28 Thread Araq
> So what do you do: do you tell these teams not to use refs? Or do you tell 
> them to go ahead and use refs and garbage collection, and deal with the 
> consequences?

To be honest I would fire teams who model business taxonomies with C++ classes 
(OMG), keeping all the data in RAM (OMG) and not using a (noSQL / SQL) database 
(OMG).

But in the context of Nim, I would likely live with refs and thread-local GCs, 
and when somebody queries my data, I would return a copy. Note that copying a query 
result doesn't mean to "keep full copies of the data around in RAM", which you 
sort of implied.


Re: Send data structures between threads?

2016-08-28 Thread jyelon
I used the word "database" in the generic sense.

Basically, go to any good shopping website (e.g. Wayfair, Amazon or Google 
Shopping) and see what that website "knows" about coffee tables. 
If you take a little time and study the website, I think you'll realize that 
it's a surprisingly large amount of knowledge. There's a whole expert system in 
there about coffee tables - and another expert system about bedspreads, and so 
forth.

At the very base of it all, is a taxonomy:


```
Home Goods >
  Tools >
    Drills and Rotary Tools
```


Some product categories fit into the taxonomy in multiple places:


```
Apparel >
  Shirts >
    Football Jerseys

Sports >
  Sports Memorabilia >
    Football Jerseys
```


Then, for each category of products, you have data about which products in the 
category are most popular, data to drive a machine-learning classifier that 
classifies queries to determine if they belong in that category, etcetera, 
etcetera. Just take a little time studying one of these websites, and I think 
you'll come to appreciate how much is there.

So when I say a "database" here, I don't mean that it's SQL. I'm just saying 
it's complicated data. It's plain old C++ classes, loaded into RAM - but it's 
legitimately _lots_ of C++ classes, and it takes up _lots_ of RAM.

So imagine, now, that you've got a dozen teams: one is responsible for 
implementing a machine-learning classifier that differentiates queries about 
drills from queries about rotary hammers. Another is building a software system 
whose job is to estimate the popularity of products according to a dozen 
different metrics. Another is building a recommendation system that uses 
statistics to recommend "related products." And so forth.

Now, you know that at some point, you're going to need to integrate all this 
software into a web-server, which will probably be multithreaded.

So what do you do: do you tell these teams not to use refs? Or do you tell them 
to go ahead and use refs and garbage collection, and deal with the consequences?


Re: Send data structures between threads?

2016-08-27 Thread Varriount
@Araq By "implemented", I believe jyelon meant "designed", i.e. using existing 
database software (PostgreSQL).

Anyway, your reply still doesn't answer the question - how would he use json 
functions in one thread to act on data from another thread?


Re: Send data structures between threads?

2016-08-27 Thread Araq
> Now, I think: what if I had tried to implement this in Nim? The team that 
> implemented the product category database would probably have written it 
> in the "normal" Nim style, with refs and garbage collection.

So you write a "database" without concurrency in mind? No locking 
infrastructure? No multi-version concurrency control? I find this hard to 
believe. Plus you haven't explained why your team had to create a database from 
scratch to begin with. You talk like it happens often and everywhere. Also, 
when you have a database you usually don't pass pointers around (neither ref 
nor ptr) but IDs, so you can serialize things and can ensure old IDs are not 
recycled, solving the memory safety problem on a different level.
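The idea of passing IDs instead of refs or ptrs can be sketched with "generational" handles. Each slot carries a generation counter that is bumped when the slot is freed, so an old ID can never silently resolve to a recycled slot. This is a minimal single-threaded sketch. `Handle`, `HandleTable`, `store`, `release` and `tryGet` are hypothetical names, not an existing Nim API, and a store shared between threads would still need a lock around the table.

```nim
type
  Handle = object
    index: int        # slot position in the table
    generation: int   # must match the slot's generation to be valid

  Slot[T] = object
    value: T
    generation: int
    alive: bool

  HandleTable[T] = object
    slots: seq[Slot[T]]
    freeList: seq[int]   # indices of released slots, reused by store

proc isValid[T](t: HandleTable[T], h: Handle): bool =
  h.index >= 0 and h.index < t.slots.len and
    t.slots[h.index].alive and t.slots[h.index].generation == h.generation

proc store[T](t: var HandleTable[T], value: T): Handle =
  if t.freeList.len > 0:
    let i = t.freeList.pop()
    t.slots[i].value = value
    t.slots[i].alive = true
    result = Handle(index: i, generation: t.slots[i].generation)
  else:
    t.slots.add Slot[T](value: value, generation: 0, alive: true)
    result = Handle(index: t.slots.high, generation: 0)

proc release[T](t: var HandleTable[T], h: Handle) =
  if t.isValid(h):
    t.slots[h.index].alive = false
    inc t.slots[h.index].generation   # every outstanding handle becomes stale
    t.freeList.add h.index

proc tryGet[T](t: HandleTable[T], h: Handle, value: var T): bool =
  ## Copies the value out (no pointer escapes); false for stale handles.
  if t.isValid(h):
    value = t.slots[h.index].value
    result = true

when isMainModule:
  var t: HandleTable[string]
  let a = t.store("drill")
  t.release(a)
  let b = t.store("rotary hammer")   # reuses the slot with a new generation
  var v: string
  doAssert not t.tryGet(a, v)        # the stale ID is rejected
  doAssert t.tryGet(b, v) and v == "rotary hammer"
```

Because `tryGet` returns a copy, callers never hold a reference into the store. That also fits the "return a copy on query" approach.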

> I would urge the Nim designers to think of this as the real challenge for 
> multithreading: if I already have a library, can I then take the data from 
> that library and put it in shared memory?

I dunno about "shared memory" but I use all kinds of Nim libraries in my 
multithreaded programs.

> If the latter, then that's seriously limiting to how realistic it is to craft 
> large multithreaded programs.

Pathos doesn't help to get your point across. Erlang too is based on thread 
local heaps and message passing copies. I'm not saying that I love this model 
too much, but it's definitely a valid design for version 1 of Nim.

> I can currently circumvent it with clever use of typecasts, but will future 
> versions of the compiler be even more aggressive in trying to stop me? Will 
> it always be feasible to use real shared memory in Nim?

It's a single typecast as far as I can see that you can wrap in a template. And 
yes, it will always be feasible to use real shared memory in Nim and we expect 
the situation to improve in the future, not to get worse. ;) 
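A single typecast wrapped in a template could look like the sketch below. `assumeGcSafe`, `addName` and `runOnWorker` are hypothetical names, and `runOnWorker` merely stands in for APIs such as `spawn` or `createThread` that demand a `gcsafe` proc. The cast only silences the compiler's check. It does not make touching thread-local GC'd data from another thread safe, so it is only reasonable with a setup like the `--gc:boehm` one discussed in this thread.

```nim
# Hypothetical helper: reinterpret a nullary top-level proc as gcsafe.
# This silences the compiler's check only; it does NOT make the access safe.
template assumeGcSafe(p: proc () {.nimcall.}): untyped =
  cast[proc () {.nimcall, gcsafe.}](p)

var names: seq[string]   # GC'd global, so procs touching it are not gcsafe

proc addName() =
  names.add "drill"

# Stand-in for an API that, like spawn or createThread, demands a gcsafe proc.
proc runOnWorker(f: proc () {.nimcall, gcsafe.}) =
  f()

runOnWorker(assumeGcSafe(addName))   # runOnWorker(addName) would be rejected
```

The `nimcall` annotation matters: a bare proc type defaults to the closure calling convention, which has a different layout than a top-level proc.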


Re: Send data structures between threads?

2016-08-27 Thread jyelon
When I think about how this plays out in a large multithreaded program, it 
worries me.

For example, I used to work on a program that served queries pertaining to 
products for a shopping website. This program used to load tons of data at 
program initialization time: it would load up the database of product 
categories, it loaded tables of statistics about how often various products 
were purchased, it loaded a script that helped guide serving, and so forth. 
There was a lot of data, in hundreds of classes, and it took up gigabytes, so 
keeping multiple copies in RAM wouldn't have been an option.

The libraries that handled this data weren't necessarily written _for_ this 
multithreaded server. For example, the product category database was used in 
dozens of programs, most of which were single threaded. The scripting language 
was just a scripting language - it was used in many programs, again, most of 
which were single-threaded.

Now, I think: what if I had tried to implement this in Nim? The team that 
implemented the product category database would probably have written it 
in the "normal" Nim style, with refs and garbage collection. Likewise for the 
scripting language. Then, the team that wrote the product server would have 
looked at these libraries and realized that they couldn't use them in a 
multithreaded server.

Of course, I can do true sharing using Boehm (ie, using established libraries 
in a multithreaded program), but the compiler is trying very hard to stop me 
from doing this. I can currently circumvent it with clever use of typecasts, 
but will future versions of the compiler be even more aggressive in trying to 
stop me? Will it always be feasible to use real shared memory in Nim?

I would urge the Nim designers to think of this as the real challenge for 
multithreading: if I already have a library, can I then take the data from that 
library and put it in shared memory? Or do I have to rewrite the library? If 
the latter, then that's seriously limiting to how realistic it is to craft 
large multithreaded programs. 


Re: Send data structures between threads?

2016-08-27 Thread Varriount
In those cases, do you really need to access the exact same memory, or will a 
copy do?


Re: Send data structures between threads?

2016-08-26 Thread jyelon
So, this only sounds half-usable: yes, I can pass the json to some other 
thread, but then when it arrives at the other thread, I can't call library 
functions like "getFields" or even 'x == y' to examine the json (these 
functions take refs). So now I have a ptr to a json object that I can't pass to 
the json library. That's not really so helpful.


Re: Send data structures between threads?

2016-08-25 Thread Araq
> How do I avoid accidentally trashing the refcounts?

`protect` and `dispose` only give you a `pointer` that you can cast to `ptr 
JsonObj`, so reference counts are not affected.

> But as you mentioned before, I can force it with a cast. Is that part of the 
> solution you're suggesting?

Well yes. But you really need to be careful and even then it doesn't support 
multiple threads creating data and adding it to an existing datastructure 
without copies, so Boehm may indeed be what you want.


Re: Send data structures between threads?

2016-08-25 Thread jyelon
Say, I never did get an answer to either of the questions above. I'm still 
curious:

1\. If the json is stored in a thread-local heap, then I let other threads 
reference the json, those other threads are going to tend to touch the 
refcounts. But the refcounts, presumably, aren't atomic ints. How do I avoid 
accidentally trashing the refcounts?

2\. The compiler is trying very hard to prevent me from passing a ref to the 
json from the thread that created it to any other thread. (That's what gc-safe 
is all about). But as you mentioned before, I can force it with a cast. Is that 
part of the solution you're suggesting?


Re: Send data structures between threads?

2016-08-12 Thread jyelon
I don't know what it means to "protect" the json. I looked in the manual; it doesn't 
mention a protect statement. I'm also not entirely sure about "dispose", the 
word "dispose" doesn't appear in the manual, but I know there's a dispose 
statement that ignores a return value. But I don't see where that comes in. I 
also don't know what it means to "wrap a dispose in a ref." Long story short: 
you lost me.

But there are two more things I don't understand:

1\. If the json is stored in a thread-local heap, then I let other threads 
reference the json, those other threads are going to tend to touch the 
refcounts. But the refcounts, presumably, aren't atomic ints. How do I avoid 
accidentally trashing the refcounts?

2\. The compiler is trying very hard to prevent me from passing a ref to the 
json from the thread that created it to any other thread. (That's what gc-safe 
is all about). But as you mentioned before, I can force it with a cast. Is that 
part of the solution you're suggesting?

Edit: I found a protect and a dispose in the system module. They're clearly 
intended for something having to do with referencing data across heaps, but I 
just haven't been able to intuit the details. Apparently, dispose doesn't do 
what I thought it did. 


Re: Send data structures between threads?

2016-08-12 Thread Araq
Boehm is not a precise GC and thus "dirty".

> Without boehm, I have to clone the code for the json parser, then alter it to 
> use createShared. My data structures won't be garbage collected. Is that 
> really cleaner?

No, you only have to `protect` and `dispose` the Json. You can wrap the dispose 
in another ref with a finalizer and have 100% automatic memory management with 
thread local GCs. That still doesn't make it the most beautiful memory 
management design out there, but it's not too bad.


Re: Send data structures between threads?

2016-08-11 Thread jyelon
Thanks for the casting idea.

I really don't think you should think of boehm as "dirty." I mean, compared to 
what?

Real multi-threaded programs need to store structured data in shared memory. 
Just to give an example, let's say that this structured data is json. With 
boehm, I can load the shared json using the existing json module, and the json 
will be garbage collected when it's no longer referenced. Without boehm, I have 
to clone the code for the json parser, then alter it to use createShared. My 
data structures won't be garbage collected. Is that really cleaner?

Or maybe, you just feel it's dirty to use shared structured data at all. If 
that's the case, then I don't know what to say, other than: I've spoken to many 
pure functional programmers who felt it was dirty to use mutation, because you 
can really shoot yourself in the foot with mutation. In a way, they're not 
wrong. But I'm not giving up on imperative programming.


Re: Send data structures between threads?

2016-08-11 Thread Araq
We don't really want to have a Nim dialect per --gc option so the frontend 
checking is not aware of `--gc:boehm`. You can `cast` GC safety into existence 
though and be as dirty as Boehm allows for.

Sketch:


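```nim
import threadpool  # spawn and sync; compile with --threads:on

# nimcall: bare proc types default to the closure calling convention.
type EnforcedGcSafe = proc() {.nimcall, gcsafe.}

proc myproc =
  discard "... access shared heap here..."

spawn cast[EnforcedGcSafe](myproc)()
sync()
```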