@araq:

> > My preferred option is to replace the whole file with something that works
>
> Yes.
I haven't dropped this project, but have spent extra time stress testing and considering the `std/channels` library's use in its eventual application, especially in a new `std/threadpools` library. Current status is as follows:

1. The **serious data race problems have been eliminated**, as far as I can tell from extensive stress testing and code review. To do this, I changed the whole algorithm away from the original file, which was based on an implementation of a lazy list from the paper (the paper's implementation worked almost identically to the `MVar`-based `Chan`s used by Haskell): the original had been changed to use buffers, which made the double-wait-condition algorithm unworkable and was the cause of the data race.
2. I **kept the ability to move the channel freely between threads** as a reference-based (actually pointer-based) object that is tracked by atomic reference counting.
3. Since it is now a reference-based object, **it needs to be "new-upped" with the required buffer size** (defaults to infinite; a buffer size of one gives an unbuffered channel) and can't be initialized as a value struct like the original built-in version (a sketch follows below).
4. I **dropped the `open`/`close` procs**, which originally initialized the value-based object. They no longer have anything to do, and if we repurposed them to set a flag that stops sending/receiving, that would be dangerous when the channel could be referenced by multiple threads, as the API has no means of indicating that a send/receive failed outright (even the `try...` procs only indicate whether there is room/data).
5. I **eliminated the global memory pool, and the API items required to control it**, from which the channel objects and channel buffers were formerly allocated, as per your instruction. This was a premature optimization with no effect on the actual use of the library. It was likely effective in the paper the former code referenced, where allocating/deallocating a chain of lazy-list elements would take significant time, but my implementation uses either a fixed buffer or a series of generously sized buffers (for the "infinite" buffer), whose allocation takes insignificant time compared to normal operation. More on this at the end.
6. This **project is complete and I will post the PR** once I complete the documentation.

> > However, the need for allocation from a sized memory pool is frequent enough that I think that there should exist a `sys/mempool` library
>
> I think it's too early for that. We can always add it later and the initial channel implementation can start without such an optimization too.

**Your intuition on this was absolutely correct: the only thing the memory pool optimization did was speed up "new-ups" of channels and channel buffers, and that is already not the bottleneck**, running on the order of ten million per second for buffered channels. For inter-thread communication, the bottleneck is sending and receiving a single element through a non-buffered channel (which costs the same as a new `FlowVar[T]`), at roughly 150,000 per second; a memory pool would not help there, since that cost comes from the overhead of more frequent use of condition variables.
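Before moving on, here is a minimal sketch of what points 2 through 4 describe. The names (`Chan`, `newChan`, `send`, `recv`) and the fixed ring buffer are illustrative assumptions, not the PR code (the real implementation defaults to a growable "infinite" buffer), but it shows the shape of the design: a pointer-based object, tracked by an atomic reference count, that is "new-upped" with a buffer size and can be copied freely between threads:

```nim
# A minimal sketch, not the PR code. Compile with: nim c --threads:on --gc:orc
import std/[locks, atomics]

type
  ChanBuf[T] = object
    lock: Lock
    notEmpty, notFull: Cond
    rc: Atomic[int]            # atomic reference count shared by all copies
    cap, head, tail, len: int
    data: UncheckedArray[T]    # fixed ring buffer; the real design can grow

  Chan*[T] = object
    buf: ptr ChanBuf[T]        # plain pointer, so copies move freely across threads

proc newChan*[T](cap = 1): Chan[T] =
  ## "new up" a channel with the required buffer size
  result.buf = cast[ptr ChanBuf[T]](
    allocShared0(sizeof(ChanBuf[T]) + cap * sizeof(T)))
  result.buf.cap = cap
  result.buf.rc.store(1)
  initLock result.buf.lock
  initCond result.buf.notEmpty
  initCond result.buf.notFull

proc `=destroy`*[T](c: var Chan[T]) =
  # sketch: leftover buffered items of managed types are not destroyed here
  if c.buf != nil and c.buf.rc.fetchSub(1) == 1:  # last reference frees
    deinitCond c.buf.notEmpty
    deinitCond c.buf.notFull
    deinitLock c.buf.lock
    deallocShared c.buf

proc `=copy`*[T](dst: var Chan[T]; src: Chan[T]) =
  if dst.buf == src.buf: return
  if src.buf != nil: discard src.buf.rc.fetchAdd(1)
  `=destroy`(dst)
  dst.buf = src.buf

proc send*[T](c: Chan[T]; x: T) =
  withLock c.buf.lock:
    while c.buf.len == c.buf.cap:      # block while the buffer is full
      wait(c.buf.notFull, c.buf.lock)
    c.buf.data[c.buf.tail] = x
    c.buf.tail = (c.buf.tail + 1) mod c.buf.cap
    inc c.buf.len
  signal c.buf.notEmpty

proc recv*[T](c: Chan[T]): T =
  withLock c.buf.lock:
    while c.buf.len == 0:              # block while the buffer is empty
      wait(c.buf.notEmpty, c.buf.lock)
    result = c.buf.data[c.buf.head]
    c.buf.head = (c.buf.head + 1) mod c.buf.cap
    dec c.buf.len
  signal c.buf.notFull

when isMainModule:
  proc producer(c: Chan[int]) {.thread.} =
    for i in 1 .. 5: c.send i

  var ch = newChan[int](cap = 2)
  var t: Thread[Chan[int]]
  createThread(t, producer, ch)        # the copy bumps the atomic refcount
  for _ in 1 .. 5: echo ch.recv()
  joinThread t
```

Note there is no `open`/`close` in this shape: once the handle exists it is always usable, and only the last copy's destruction frees the shared buffer, which is exactly why a close flag would be unsafe as described in point 4.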
**So moving on to considerations about the design of the new `std/threadpools` library**, as follows:

1. Most **other languages have thread pools that are built into the language runtime**, either enabled by a flag such as our `--threads:on` or Haskell's `-threaded`, or always available, and then accessed as required by importing a library or two, rather than having those libraries automatically imported as we do currently. Other languages then let the thread pool live in global space, to be accessed as required. **It seems to me that we might consider not automatically importing any thread or channel libraries for version 2.0** and just importing them as required. **Having the thread pool implementation in the runtime when `--threads:on` is set would then make it available globally**, and it could be resized as necessary (or an infinite channel could be used as the pool of awaiting threads).
2. **An alternative is to have the thread pool library create a local thread pool**, which could even be passed around as necessary; as many thread pools as desired could be created for different purposes.
3. Another consideration is that currently **we neither catch exceptions that occur in a thread and make them available to the `spawn`ing thread, nor build in a means of cancelling threads** in the thread pool (co-operatively, of course, as the running thread needs to check for cancellation periodically).
4. Some contributors seem to like **the idea of `spawn`ed threads returning an `async` `Future[T]` that includes exception status**, but I don't really agree with this: using `async` means they should only be used with `--gc:orc`, as the `async` library has cyclic references (at least in order to avoid memory leaks). I propose instead that our new concept of `FlowVar[T]` can also pass back exception information; then it's easy enough for anyone who wants a `Future[T]` to just wrap the `FlowVar[T]` (with its exceptions), and we could even provide yet another library to make this easy for those who frequently require it (a sketch follows at the end of this comment).

One thing that should be made clear in the documentation is the proper use of `spawn` and/or `parallel`. The Internet (StackOverflow, etc.) is full of newbie questions wondering why, when they `spawn` a million tiny "two-plus-two" tasks across the thread pool (this also applies to Golang goroutines, even though Golang is about twenty times faster at spawning than even my new implementation), the resulting performance is no faster, or even slower, than not multi-threading at all. We really should explain that one needs to assign enough work per `spawn` to be worth the overhead, such as ten thousand "two-plus-two" calculations per task (see the batching sketch at the end); otherwise, they need to look into work stealing/sharing schemes such as @mratsim's Weave/Picasso project.

**I have written a Proof of Concept that shows that the new thread pool implementation works with my new `std/channels` implementation at about the same speed as the old built-in libraries, or maybe a little faster, but I would appreciate your consideration** of the above items so that my work fits your "vision"...
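To make point 4 above concrete, here is a hedged sketch of wrapping a `FlowVar[T]` that carries exception information into an `async` `Future[T]`. `ExcFlowVar` and `asFuture` are hypothetical names, not the proposed API, and a real adapter would complete the future from a callback when the result arrives rather than assuming, as this toy does, that the value is already ready:

```nim
# A sketch only; `ExcFlowVar` stands in for the proposed FlowVar-with-exceptions.
import std/asyncdispatch

type ExcFlowVar[T] = object
  ## hypothetical stand-in for a FlowVar[T] that also carries an exception
  value: T
  error: ref Exception        # nil when the spawned proc succeeded

proc asFuture[T](fv: ExcFlowVar[T]): Future[T] =
  ## wrap an already-completed FlowVar-like handle into a Future[T],
  ## preserving the exception status
  result = newFuture[T]("asFuture")
  if fv.error != nil:
    result.fail(fv.error)     # the exception travels into async land
  else:
    result.complete(fv.value)

when isMainModule:
  let ok = ExcFlowVar[int](value: 42)
  echo waitFor(ok.asFuture())                   # -> 42

  let bad = ExcFlowVar[int](error: (ref ValueError)(msg: "boom"))
  try:
    discard waitFor(bad.asFuture())
  except ValueError as e:
    echo "re-raised for the caller: ", e.msg    # -> boom
```

The design point is that the thread pool itself stays `async`-free; only users who want a `Future[T]` pay for the `async` machinery (and its `--gc:orc` requirement).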
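And here is the batching sketch for the granularity advice, illustrated with the current `std/threadpool` (the new library would be used the same way): each `spawn` gets ten thousand trivial calculations rather than one, so a million "two-plus-two"s cost only a hundred spawns:

```nim
# Compile with: nim c --threads:on batching.nim
import std/threadpool

proc sumChunk(lo, hi: int): int =
  ## enough work per task to amortize the spawn/channel overhead
  for _ in lo ..< hi:
    result += 2 + 2

when isMainModule:
  const total = 1_000_000   # a million trivial calculations...
  const chunk = 10_000      # ...but only 100 spawns, not 1_000_000
  var results: seq[FlowVar[int]]
  var lo = 0
  while lo < total:
    results.add(spawn sumChunk(lo, lo + chunk))
    lo += chunk
  var grand = 0
  for fv in results:
    grand += ^fv            # blocks until that task's result is ready
  echo grand                # -> 4000000
```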
