Re: Execute in a multiprocessing child dynamic code loaded by the parent process

2022-03-08 Thread Martin Di Paola

Then, you must put the initialization (dynamically loading the modules)
into the function executed in the foreign process.

You could wrap the payload function into a class instance to achieve this.
In the foreign process, you call the instance which first performs
the initialization and then executes the payload.


That's what I have in mind: loading the modules first, and then unpickling
and calling the real target function.
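
A minimal sketch of that idea (hedged: `loader()` stands in for the real
plugin loader and `objs[0].sayhi` for the real payload; neither is defined
here):

import pickle

class LoadThenCall:
    # Callable wrapper sent to the child: it carries the target already
    # pickled (while the plugin modules were loaded in the parent) and,
    # in the child, re-runs the loader before unpickling and calling it.
    def __init__(self, target):
        self.target_bytes = pickle.dumps(target)

    def __call__(self):
        loader()  # hypothetical: load/register the plugin modules again
        target = pickle.loads(self.target_bytes)
        return target()

# ch = multiprocessing.Process(target=LoadThenCall(objs[0].sayhi))
# ch.start()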
--
https://mail.python.org/mailman/listinfo/python-list


Re: Execute in a multiprocessing child dynamic code loaded by the parent process

2022-03-07 Thread Dieter Maurer
Martin Di Paola wrote at 2022-3-6 20:42 +:
>>Try to use `fork` as "start method" (instead of "spawn").
>
>Yes but no. Indeed with `fork` there is no need to pickle anything. In
>particular the child process will be a copy of the parent so it will
>have all the modules loaded, including the dynamic ones. Perfect.
>
>The problem is that `fork` is the default only in Linux. It works in
>MacOS but it may lead to crashes if the parent process is multithreaded
>(and mine is!), and `fork` does not work in Windows.

Then, you must put the initialization (dynamically loading the modules)
into the function executed in the foreign process.

You could wrap the payload function into a class instance to achieve this.
In the foreign process, you call the instance which first performs
the initialization and then executes the payload.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Execute in a multiprocessing child dynamic code loaded by the parent process

2022-03-07 Thread Martin Di Paola

I understand that yes, pickle.loads() imports any necessary modules, but
only if they can be found in sys.path (like in any "import" statement).

Dynamic code loaded from a plugin (which we presume is *not* in
sys.path) will not be loaded.

Quick check. Run the following in one console:

import multiprocessing
import multiprocessing.reduction
import pickle

data = pickle.dumps(multiprocessing.reduction.ForkingPickler)
print(data)   # copy the printed bytes literal


In a separate Python console, run the following, pasting the bytes printed
above as `data`:

import pickle
import sys

'multiprocessing' in sys.modules
False

pickle.loads(data)

'multiprocessing' in sys.modules
True

So the last check proves that pickle.loads() imports any necessary modules.
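
For contrast, a sketch of the failing case (assumed file name and function
name: a plugin at /tmp/plugins/foo.py defining sayhi(), outside sys.path):

import importlib.util
import pickle
import sys

# load the plugin dynamically, the way a plugin engine would
spec = importlib.util.spec_from_file_location("foo", "/tmp/plugins/foo.py")
mod = importlib.util.module_from_spec(spec)
sys.modules["foo"] = mod
spec.loader.exec_module(mod)

data = pickle.dumps(mod.sayhi)  # works: "foo" is in sys.modules *here*

# In a fresh interpreter (or a spawned child) that never ran the loader,
# pickle.loads(data) raises: ModuleNotFoundError: No module named 'foo'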

Martin.

On Mon, Mar 07, 2022 at 08:28:15AM +, Barry wrote:




On 7 Mar 2022, at 02:33, Martin Di Paola  wrote:

Yes, but I think that unpickling (pickle.loads()) does that plus
importing any module needed


Are you sure that unpickle will import code? I thought it did not do that.

Barry

--
https://mail.python.org/mailman/listinfo/python-list


Re: Execute in a multiprocessing child dynamic code loaded by the parent process

2022-03-07 Thread Barry



> On 7 Mar 2022, at 02:33, Martin Di Paola  wrote:
> 
> Yes, but I think that unpickling (pickle.loads()) does that plus
> importing any module needed

Are you sure that unpickle will import code? I thought it did not do that.

Barry
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Execute in a multiprocessing child dynamic code loaded by the parent process

2022-03-06 Thread Martin Di Paola





Yeup, that would be my first choice but the catch is that "sayhi" may
not be a function of the given module. It could be a static method of
some class or any other callable.


Ah, fair. Are you able to define it by a "path", where each step in
the path is a getattr() call?


Yes, but I think that unpickling (pickle.loads()) does that plus
importing any module needed in the path, which is handy because I can
preload the plugins (modules) before unpickling, while the path may also
contain other, more standard modules.

Something like "myplugin.re.match": unpickling should import the 're'
module automatically while it is loading the function "match".
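
If the lookup had to be done by hand, a sketch of such a path walk could
look like this (a hypothetical helper, not part of the engine):

import importlib

def resolve(path):
    # Import the leading module, then walk the rest of the dotted path
    # with getattr(); fall back to importing submodules along the way so
    # static methods, classes and nested attributes are all reachable.
    parts = path.split(".")
    obj = importlib.import_module(parts[0])
    for i, name in enumerate(parts[1:], start=1):
        try:
            obj = getattr(obj, name)
        except AttributeError:
            obj = importlib.import_module(".".join(parts[:i + 1]))
    return obj

# resolve("os.path.join") -> <function join>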


Fair. I guess, then, that the best thing to do is to preload the
modules, then unpickle. So, basically what you already have, but with
more caveats.


Yes, this will not be transparent for the user; I'm just trying to minimize
the changes needed.

And it will require some documentation for those caveats. And tests.

Thanks for the brainstorming!
Martin.
--
https://mail.python.org/mailman/listinfo/python-list


Re: Execute in a multiprocessing child dynamic code loaded by the parent process

2022-03-06 Thread Martin Di Paola





I'm not so sure about that. The author of the plugin knows they're
writing code that will be dynamically loaded, and can therefore
expect the kind of problem they're having. It could be argued that
it's their responsibility to ensure that all the needed code is loaded
into the subprocess.


Yes, but I always try to make my libs/programs as usable as possible.
"Ergonomic" would be the word.

In the case of the plugin-engine I'm trying to hide any side effect or
unexpected behaviour of the engine so the developer of the plugin
does not have to take that into account.

I agree that if the developer uses multiprocessing he/she needs to know
its implications. But if I can "smooth" any rough corner, I will try to
do it.

For example, the main project (developed by me) uses threads for
concurrency. It would be simpler to load the plugins and instantiate
them *once* and ask the plugin developers to take care of any
race condition (RC) within their implementation.

Because the plugins would be instantiated only *once* and shared across
threads, it is almost guaranteed that they would suffer from race
conditions and require some sort of locking.

This is quite risky: you may forget to protect something and end up
with an RC, and/or you may put the lock in the wrong place and the
whole thing will not actually run concurrently.

My decision back then was to instantiate each plugin N+1 times: once in
the main thread and then once per worker thread.

With this, no single plugin instance is shared, so there is no risk of
an RC and no need for locking. (Yes, I know, the developer just needs to
use a module-level variable or a class attribute, which *are* shared, to
get an RC again, but that is definitely not the default scenario.)

If sharing is required I provide an object that minimizes the locking
needed.

It was much more complex for me at the design and implementation level,
but I think that it is safer and requires less from the plugin
developer.
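
Roughly, the N+1 instantiation looks like this (a toy sketch with an inline
dummy plugin; the real classes come from the loader, not from this code):

import threading

class EchoPlugin:
    # stand-in for a dynamically loaded plugin class
    def process(self, job):
        print(threading.current_thread().name, job)

plugin_classes = [EchoPlugin]  # in reality: classes returned by the loader

main_instances = [cls() for cls in plugin_classes]  # main-thread copies

def worker(jobs):
    # fresh, private instances for this thread: nothing is shared, so the
    # plugin code needs no locking by default
    instances = [cls() for cls in plugin_classes]
    for job in jobs:
        for plugin in instances:
            plugin.process(job)

chunks = [[1, 2], [3, 4]]  # toy partition of the work
threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
for t in threads:
    t.start()
for t in threads:
    t.join()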

Reference: https://byexamples.github.io/byexample/contrib/concurrency-model
--
https://mail.python.org/mailman/listinfo/python-list


Re: Execute in a multiprocessing child dynamic code loaded by the parent process

2022-03-06 Thread Greg Ewing

On 7/03/22 9:36 am, Martin Di Paola wrote:

It *would* be my fault if multiprocessing.Process fails only because I'm
loading the code dynamically.


I'm not so sure about that. The author of the plugin knows they're
writing code that will be dynamically loaded, and can therefore
expect the kind of problem they're having. It could be argued that
it's their responsibility to ensure that all the needed code is loaded
into the subprocess.

--
Greg
--
https://mail.python.org/mailman/listinfo/python-list


Re: Execute in a multiprocessing child dynamic code loaded by the parent process

2022-03-06 Thread Chris Angelico
On Mon, 7 Mar 2022 at 07:37, Martin Di Paola  wrote:
>
>
>
> >
> >The way you've described it, it's a hack. Allow me to slightly redescribe it.
> >
> >modules = loader()
> >objs = init(modules)
> >
> >def invoke(mod, func):
> >    # I'm assuming that the loader is smart enough to not load
> >    # a module that's already loaded. Alternatively, load just the
> >    # module you need, if that's a possibility.
> >    loader()
> >    target = getattr(modules[mod], func)
> >    target()
> >
> >ch = multiprocessing.Process(target=invoke, args=("some_module", "sayhi"))
> >ch.start()
> >
>
> Yeup, that would be my first choice but the catch is that "sayhi" may
> not be a function of the given module. It could be a static method of
> some class or any other callable.

Ah, fair. Are you able to define it by a "path", where each step in
the path is a getattr() call?

The trouble is, arbitrary callables might not be available in a
reconstructed version of the module.

> Using multiprocessing.reduction was a practical decision: if the user
> wants to call something non-pickleable, it is not my fault, it is
> multiprocessing's fault.
>
> It *would* be my fault if multiprocessing.Process fails only because I'm
> loading the code dynamically.

Fair. I guess, then, that the best thing to do is to preload the
modules, then unpickle. So, basically what you already have, but with
more caveats.

> Do you have some in mind? Or maybe a project that I could read?

Not handy, but there are always many different ways to do things. For
instance, instead of saying "spawn a subprocess and call this
function", you could invert it, and have the function register itself
as the target. Then it's just "spawn a subprocess and load this
module", and that calls the registered invocation. It all depends on
what the rest of your project is doing. Mainly, though, I'm just not
ruling out the possibility of other options :)
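
A sketch of that inversion (hypothetical API, single file for brevity; in
practice the registry would live in the engine module and the plugin would
import it):

REGISTRY = {}

def subprocess_target(name):
    # decorator a plugin author uses to expose an entry point by name
    def decorator(func):
        REGISTRY[name] = func
        return func
    return decorator

@subprocess_target("sayhi")
def sayhi():
    print("hi from the child")

def run_registered(name):
    # the child re-imports/loads the plugin module, which re-populates
    # REGISTRY, and then looks the target up by name instead of by object
    REGISTRY[name]()

# multiprocessing.Process(target=run_registered, args=("sayhi",)).start()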

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Execute in a multiprocessing child dynamic code loaded by the parent process

2022-03-06 Thread Martin Di Paola

Try to use `fork` as "start method" (instead of "spawn").


Yes but no. Indeed with `fork` there is no need to pickle anything. In
particular the child process will be a copy of the parent so it will
have all the modules loaded, including the dynamic ones. Perfect.

The problem is that `fork` is the default only in Linux. It works in
MacOS but it may lead to crashes if the parent process is multithreaded
(and mine is!), and `fork` does not work in Windows.
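
For reference, the start method can be chosen explicitly where `fork` is
available (a sketch; on Windows only `spawn` exists):

import multiprocessing

def hello():
    print("hello from the child")

if __name__ == "__main__":
    # "fork" keeps the parent's loaded modules in the child (with the
    # macOS caveats above); it is simply not an option on Windows
    ctx = multiprocessing.get_context("fork")
    ch = ctx.Process(target=hello)
    ch.start()
    ch.join()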
--
https://mail.python.org/mailman/listinfo/python-list


Re: Execute in a multiprocessing child dynamic code loaded by the parent process

2022-03-06 Thread Martin Di Paola






The way you've described it, it's a hack. Allow me to slightly redescribe it.

modules = loader()
objs = init(modules)

def invoke(mod, func):
    # I'm assuming that the loader is smart enough to not load
    # a module that's already loaded. Alternatively, load just the
    # module you need, if that's a possibility.
    loader()
    target = getattr(modules[mod], func)
    target()

ch = multiprocessing.Process(target=invoke, args=("some_module", "sayhi"))
ch.start()



Yeup, that would be my first choice but the catch is that "sayhi" may
not be a function of the given module. It could be a static method of
some class or any other callable.

And doing the lookup by hand sounds complex.

The thing is that the use of multiprocessing is not something required by
me (by my plugin-engine); it was a decision of the developer of a particular
plugin, so I don't have any control over that.

Using multiprocessing.reduction was a practical decision: if the user
wants to call something non-pickleable, it is not my fault, it is
multiprocessing's fault.

It *would* be my fault if multiprocessing.Process fails only because I'm
loading the code dynamically.


[...] I won't say "the" correct way, as there are other valid
ways, but there's certainly nothing wrong with this idea.


Do you have some in mind? Or maybe a project that I could read?

Thanks!
Martin
--
https://mail.python.org/mailman/listinfo/python-list


Re: Execute in a multiprocessing child dynamic code loaded by the parent process

2022-03-06 Thread Dieter Maurer
Martin Di Paola wrote at 2022-3-6 12:42 +:
>Hi everyone. I implemented some time ago a small plugin engine to load code
>dynamically.
>
>So far it worked well, but a few days ago a user told me that he wasn't
>able to run a piece of code in parallel on MacOS.
>
>He was using multiprocessing.Process to run the code, and in MacOS the
>default start method for such a process is "spawn". My understanding
>is that Python spawns an independent Python server (the child) which
>receives what to execute (the target function) from the parent process.
>
>In pseudo code this would be like:
>
>modules = loader() # load the plugins (Python modules at the end)
>objs = init(modules) # initialize the plugins
>
># One of the plugins wants to execute part of its code in parallel
># In MacOS this fails
>ch = multiprocessing.Process(target=objs[0].sayhi)
>ch.start()
>
>The code fails with "ModuleNotFoundError: No module named 'foo'" (where
>'foo' is the name of the loaded plugin).
>
>This is because the parent program sends to the server (the child) what
>it needs to execute (objs[0].sayhi) using pickle as the serialization
>mechanism.
>
>Because Python does not really serialize code but only enough
>information to reload it, the serialization of "objs[0].sayhi" just
>points to its module, "foo".
>
>That module cannot be imported by the child process.
>
>So the question is, what would be the alternatives and workarounds?

Try to use `fork` as "start method" (instead of "spawn").
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Execute in a multiprocessing child dynamic code loaded by the parent process

2022-03-06 Thread Chris Angelico
On Sun, 6 Mar 2022 at 23:43, Martin Di Paola  wrote:
>
> Hi everyone. I implemented some time ago a small plugin engine to load code
> dynamically.
>
> So far it worked well, but a few days ago a user told me that he wasn't
> able to run a piece of code in parallel on MacOS.
>
> He was using multiprocessing.Process to run the code, and in MacOS the
> default start method for such a process is "spawn". My understanding
> is that Python spawns an independent Python server (the child) which
> receives what to execute (the target function) from the parent process.

> Because Python does not really serialize code but only enough
> information to reload it, the serialization of "objs[0].sayhi" just
> points to its module, "foo".
>

Hmm. This is a route that has some tricky hazards on it. Generally, in
Python code, we can assume that a module is itself, no matter what; it
won't be a perfect clone of itself, it will actually be the same
module.

If you want to support multiprocessing, I would recommend
disconnecting yourself from the concept of loaded modules, and instead
identify the target by its module name.
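
A sketch of what "identify the target by its module name" could look like
(assumed names; the child re-imports by name instead of receiving a live
object):

import importlib
import multiprocessing

def call_by_name(module_name, qualname, *args):
    # re-import in the child and walk the qualified name with getattr(),
    # instead of shipping the live object through pickle
    obj = importlib.import_module(module_name)
    for part in qualname.split("."):
        obj = getattr(obj, part)
    print(obj(*args))

if __name__ == "__main__":
    ch = multiprocessing.Process(target=call_by_name,
                                 args=("platform", "python_version"))
    ch.start()
    ch.join()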

> I came up with a hack: use a trampoline() function to load the plugins
> in the child before executing the target function.
>
> In pseudo code it is:
>
> modules = loader() # load the plugins (Python modules at the end)
> objs = init(modules) # initialize the plugins
>
> def trampoline(target_str):
>     loader() # load the plugins now that we are in the child process
>
>     # deserialize the target and call it
>     target = reduction.loads(target_str)
>     target()
>
> # Serialize the real target function, but call trampoline() in the
> # child. Because trampoline() can be accessed by the child, it will
> # not fail
> target_str = reduction.dumps(objs[0].sayhi)
> ch = multiprocessing.Process(target=trampoline, args=(target_str,))
> ch.start()
>
> The hack works but is this the correct way to do it?
>

The way you've described it, it's a hack. Allow me to slightly redescribe it.

modules = loader()
objs = init(modules)

def invoke(mod, func):
    # I'm assuming that the loader is smart enough to not load
    # a module that's already loaded. Alternatively, load just the
    # module you need, if that's a possibility.
    loader()
    target = getattr(modules[mod], func)
    target()

ch = multiprocessing.Process(target=invoke, args=("some_module", "sayhi"))
ch.start()


Written like this, it achieves the same goal, but looks a lot less
hacky, and as such, I would say that yes, this absolutely IS a correct
way to do it. (I won't say "the" correct way, as there are other valid
ways, but there's certainly nothing wrong with this idea.)

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Execute in a multiprocessing child dynamic code loaded by the parent process

2022-03-06 Thread Martin Di Paola

Hi everyone. I implemented some time ago a small plugin engine to load code
dynamically.

So far it worked well, but a few days ago a user told me that he wasn't
able to run a piece of code in parallel on MacOS.

He was using multiprocessing.Process to run the code, and in MacOS the
default start method for such a process is "spawn". My understanding
is that Python spawns an independent Python server (the child) which
receives what to execute (the target function) from the parent process.

In pseudo code this would be like:

modules = loader() # load the plugins (Python modules at the end)
objs = init(modules) # initialize the plugins

# One of the plugins wants to execute part of its code in parallel
# In MacOS this fails
ch = multiprocessing.Process(target=objs[0].sayhi)
ch.start()

The code fails with "ModuleNotFoundError: No module named 'foo'" (where
'foo' is the name of the loaded plugin).

This is because the parent program sends to the server (the child) what
it needs to execute (objs[0].sayhi) using pickle as the serialization
mechanism.

Because Python does not really serialize code but only enough
information to reload it, the serialization of "objs[0].sayhi" just
points to its module, "foo".

That module cannot be imported by the child process.

So the question is, what would be the alternatives and workarounds?

I came up with a hack: use a trampoline() function to load the plugins
in the child before executing the target function.

In pseudo code it is:

modules = loader() # load the plugins (Python modules at the end)
objs = init(modules) # initialize the plugins

def trampoline(target_str):
    loader() # load the plugins now that we are in the child process

    # deserialize the target and call it
    target = reduction.loads(target_str)
    target()

# Serialize the real target function, but call trampoline() in the
# child. Because trampoline() can be accessed by the child, it will
# not fail
target_str = reduction.dumps(objs[0].sayhi)
ch = multiprocessing.Process(target=trampoline, args=(target_str,))
ch.start()

The hack works but is this the correct way to do it?

The following gist has the minimal example code that triggers the issue
and its workaround:
https://gist.github.com/eldipa/d9b02875a13537e72fbce4cdb8e3f282

Thanks!
Martin.
--
https://mail.python.org/mailman/listinfo/python-list