On 2018-07-16 05:24, Chris Angelico wrote:
On Mon, Jul 16, 2018 at 1:21 PM, Nathaniel Smith <n...@pobox.com> wrote:
On Sun, Jul 15, 2018 at 6:00 PM, Chris Angelico <ros...@gmail.com> wrote:
On Mon, Jul 16, 2018 at 10:31 AM, Nathaniel Smith <n...@pobox.com> wrote:
On Sun, Jul 8, 2018 at 11:27 AM, David Foster <davidf...@gmail.com> wrote:
* The Actor model can be used with some effort via the “multiprocessing”
module, but it doesn’t seem that streamlined and forces there to be a
separate OS process per line of execution, which is relatively expensive.

What do you mean by "the Actor model"? Just shared-nothing
concurrency? (My understanding is that in academia it means
shared-nothing + every thread/process/whatever gets an associated
queue + queues are globally addressable + queues have unbounded
buffering + every thread/process/whatever is implemented as a loop
that reads messages from its queue and responds to them, with no
internal concurrency. I don't know why this particular bundle of
features is considered special. Lots of people seem to use it in a
looser sense, though.)

Shared-nothing concurrency is, of course, the very easiest way to
parallelize. But let's suppose you're trying to create an online
multiplayer game. Since it's a popular genre at the moment, I'll go
for a battle royale game (think PUBG, H1Z1, Fortnite, etc). A hundred
people enter; one leaves. The game has to let those hundred people
interact, which means that all hundred people have to be connected to
the same server. And you have to process everyone's movements,
gunshots, projectiles, etc, etc, etc, fast enough to be able to run a
server "tick" enough times per second - I would say 32 ticks per
second is an absolute minimum, 64 is definitely better. So what
happens when the processing required takes more than one CPU core for
1/32 of a second? A shared-nothing model is either fundamentally
impossible, or a meaningless abstraction (if you interpret it to mean
"explicit queues/pipes for everything"). What would the "Actor" model
do here?

"Shared-nothing" is a bit of jargon that means there's no *implicit*
sharing; your threads can still communicate, the communication just
has to be explicit. I don't know exactly what algorithms your
hypothetical game needs, but they might be totally fine in a
shared-nothing approach. It's not just for embarrassingly parallel
problems.

Right, so basically it's the exact model that Python *already* has for
multiprocessing - once you go to separate processes, nothing is
implicitly shared, and everything has to be done with queues.
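
For concreteness, here's a minimal sketch of that style using nothing
but the stdlib (the function and message names are invented for the
example):

    import multiprocessing as mp

    def counter_actor(inbox, outbox):
        # Private state lives only in this process; the outside world
        # can only talk to it by putting messages on the inbox queue.
        count = 0
        for msg in iter(inbox.get, None):   # None is the shutdown sentinel
            if msg == "incr":
                count += 1
            elif msg == "get":
                outbox.put(count)

    if __name__ == "__main__":
        inbox, outbox = mp.Queue(), mp.Queue()
        worker = mp.Process(target=counter_actor, args=(inbox, outbox))
        worker.start()
        for _ in range(3):
            inbox.put("incr")
        inbox.put("get")
        print(outbox.get())   # -> 3
        inbox.put(None)       # ask the actor to stop
        worker.join()

Nothing is implicitly shared there; every interaction is an explicit
put/get on a queue.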

Ideally, I would like to be able to write my code as a set of
functions, then easily spin them off as separate threads, and have
them able to magically run across separate CPUs. Unicorns not being a
thing, I'm okay with warping my code a bit around the need for
parallelism, but I'm not sure how best to do that. Assume here that we
can't cheat by getting most of the processing work done with the GIL
released (eg in Numpy), and it actually does require Python-level
parallelism of CPU-heavy work.
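
To illustrate the kind of warping I can live with, here's roughly the
shape of it today with concurrent.futures (work() and the inputs are
placeholders): the same plain functions can be handed to a thread pool
or a process pool, but only the process pool actually spreads
CPU-bound Python code over multiple cores, at the cost of pickling
everything across the process boundary.

    from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

    def work(n):
        # Placeholder for some CPU-heavy, pure-Python computation.
        return sum(i * i for i in range(n))

    def run_all(inputs, use_processes):
        # Same calling code either way; only the executor class changes.
        pool_cls = ProcessPoolExecutor if use_processes else ThreadPoolExecutor
        with pool_cls() as pool:
            return list(pool.map(work, inputs))

    if __name__ == "__main__":
        print(run_all([10_000, 20_000, 30_000], use_processes=True))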

If you need shared-memory threads, on multiple cores, for CPU-bound
logic, where the logic is implemented in Python, then yeah, you
basically need a free-threaded implementation of Python. Jython is
such an implementation. PyPy could be if anyone were interested in
funding it [1], but apparently no-one is. Probably removing the GIL
from CPython is impossible. (I'd be happy to be proven wrong.) Sorry I
don't have anything better to report.

(This was a purely hypothetical example.)

There could be some interesting results from using the GIL only for
truly global objects, and then having other objects guarded by arena
locks. The trouble is that, in CPython, as soon as you reference any
read-only object from the globals, you need to raise its refcount.
ISTR someone mentioned something along the lines of
sys.eternalize(obj) to flag something as "never GC this thing, it no
longer has a refcount", which would then allow global objects to be
referenced in a truly read-only way (eg to call a function). Sadly,
I'm not expert enough to actually look into implementing it, but it
does seem like a very cool concept. It also fits into the "warping my
code a bit" category (eg eternalizing a small handful of key objects,
and paying the price of "well, now they can never be garbage
collected"), with the potential to then parallelize more easily.

Could you explicitly share an object in a similar way to how you explicitly open a file?

The shared object's refcount would be incremented and the sharing function would return a proxy to the shared object.

Refcounting in the thread/process would be done on the proxy.

When the proxy is closed or garbage-collected, the shared object's refcount would be decremented.

The shared object could be garbage-collected when its refcount drops to zero.
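
As a rough model of that lifecycle (pure Python, only to show the
intended semantics rather than how an interpreter would implement it;
all names here are invented):

    class SharedRef:
        # Stands in for the shared object plus the refcount the
        # interpreter would keep for it.
        def __init__(self, obj):
            self.obj = obj
            self.shares = 0

    class ShareProxy:
        # What the other thread/process holds; closing it releases the
        # share, much like closing a file releases the descriptor.
        def __init__(self, shared):
            self._shared = shared
            shared.shares += 1          # shared object's refcount incremented

        def get(self):
            return self._shared.obj

        def close(self):
            if self._shared is not None:
                self._shared.shares -= 1   # collectable once this hits zero
                self._shared = None

        def __enter__(self):
            return self

        def __exit__(self, *exc):
            self.close()

    shared = SharedRef([1, 2, 3])
    with ShareProxy(shared) as p:
        print(p.get(), shared.shares)   # [1, 2, 3] 1
    print(shared.shares)                # 0: the list could now be collected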

The good news is that there are many, many situations where you don't
actually need "shared-memory threads, on multiple cores, for CPU-bound
logic, where the logic is implemented in Python".

Oh absolutely. MOST of my parallelism requirements involve regular
Python threads, because they spend most of their time blocked on
something. That one is easy. The hassle comes when something MIGHT
need parallelism and might not, based on (say) how much data it has to
work with; for those kinds of programs, I would like to be able to
code it the simple way with minimal code overhead, but still be able
to split it over cores. And yes, I'm aware that it's never going to be
perfect, but the closer the better.
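
Something like this is about as much overhead as I'd want to pay
(process_chunk and the threshold are invented for the example): run
small jobs inline, and only fall back to processes, with all their
pickling overhead, when the data is big enough to be worth it.

    from concurrent.futures import ProcessPoolExecutor

    def process_chunk(chunk):
        # Placeholder for the real CPU-heavy, per-chunk work.
        return sum(x * x for x in chunk)

    def process_all(chunks, parallel_threshold=1000):
        if len(chunks) < parallel_threshold:
            # Small data: just do it in-process, no extra machinery.
            return [process_chunk(c) for c in chunks]
        # Big data: spread the chunks across cores.
        with ProcessPoolExecutor() as pool:
            return list(pool.map(process_chunk, chunks))

    if __name__ == "__main__":
        print(process_all([range(100)] * 50)[:3])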

