Re: Execute in a multiprocessing child dynamic code loaded by the parent process
> Then, you must put the initialization (dynamically loading the modules)
> into the function executed in the foreign process. You could wrap the
> payload function into a class instance to achieve this. In the foreign
> process, you call the instance, which first performs the initialization
> and then executes the payload.

That's what I have in mind: loading the modules first, and then
unpickling and calling the real target function.

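For illustration, a minimal sketch of that wrapper idea. The class name
is made up, and loader() stands in for the plugin engine's loader from
the original post; this is a sketch, not the engine's actual code:

import pickle
import multiprocessing

def loader():
    # Placeholder for the plugin engine's loader: it must make the
    # plugin modules importable/loaded in the current process.
    pass

class InitThenCall:
    # Pickled and shipped to the child as the Process target; there it
    # first re-runs the initialization, then unpickles and calls the
    # real payload.
    def __init__(self, target_bytes):
        self.target_bytes = target_bytes  # payload pickled by the parent

    def __call__(self):
        loader()  # initialization happens in the child
        target = pickle.loads(self.target_bytes)  # now the module resolves
        target()

# Parent side (sketch):
# call = InitThenCall(pickle.dumps(objs[0].sayhi))
# multiprocessing.Process(target=call).start()

Pickling the payload in the parent works because the plugin modules
*are* importable there; only the unpickling has to wait until the child
has run the initialization.
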
Re: Execute in a multiprocessing child dynamic code loaded by the parent process
Martin Di Paola wrote at 2022-3-6 20:42 +:
>> Try to use `fork` as "start method" (instead of "spawn").
>
> Yes but no. Indeed, with `fork` there is no need to pickle anything. In
> particular the child process will be a copy of the parent, so it will
> have all the modules loaded, including the dynamic ones. Perfect.
>
> The problem is that `fork` is the default only on Linux. It works on
> MacOS but it may lead to crashes if the parent process is multithreaded
> (and mine is!), and `fork` does not work on Windows.

Then, you must put the initialization (dynamically loading the modules)
into the function executed in the foreign process. You could wrap the
payload function into a class instance to achieve this. In the foreign
process, you call the instance, which first performs the initialization
and then executes the payload.

Re: Execute in a multiprocessing child dynamic code loaded by the parent process
I understand that yes, pickle.loads() imports any necessary module, but
only if they can be found in sys.path (like with any "import"
statement). Dynamic code loaded from a plugin (which we presume is
*not* in sys.path) will not be loaded.

Quick check. Run the following in one console:

import multiprocessing
import multiprocessing.reduction
import pickle
data = pickle.dumps(multiprocessing.reduction.ForkingPickler)

In a separate Python console run the following, pasting in as `data`
the bytes printed by the first console:

import pickle
import sys
'multiprocessing' in sys.modules
False
pickle.loads(data)
'multiprocessing' in sys.modules
True

So the last check proves that pickle.loads() imports any necessary
module.

Martin.

On Mon, Mar 07, 2022 at 08:28:15AM +, Barry wrote:
>> On 7 Mar 2022, at 02:33, Martin Di Paola wrote:
>>
>> Yes but I think that unpickle (pickle.loads()) does that plus
>> importing any module needed
>
> Are you sure that unpickle will import code? I thought it did not do
> that.
>
> Barry

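To see why unpickling triggers an import at all: a pickle stores only a
module/attribute reference to a function or class, not its code. The
standard library's pickletools can show this (a small demonstration,
standard library only):

import pickle
import pickletools
import multiprocessing.reduction

data = pickle.dumps(multiprocessing.reduction.ForkingPickler)
pickletools.dis(data)
# The disassembly contains a (STACK_)GLOBAL opcode naming
# 'multiprocessing.reduction' / 'ForkingPickler'; pickle.loads()
# resolves that reference with a normal import, which is why
# 'multiprocessing' shows up in sys.modules afterwards.
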
Re: Execute in a multiprocessing child dynamic code loaded by the parent process
> On 7 Mar 2022, at 02:33, Martin Di Paola wrote:
>
> Yes but I think that unpickle (pickle.loads()) does that plus
> importing any module needed

Are you sure that unpickle will import code? I thought it did not do
that.

Barry

Re: Execute in a multiprocessing child dynamic code loaded by the parent process
>> Yeup, that would be my first choice but the catch is that "sayhi" may
>> not be a function of the given module. It could be a static method of
>> some class or any other callable.
>
> Ah, fair. Are you able to define it by a "path", where each step in
> the path is a getattr() call?

Yes, but I think that unpickle (pickle.loads()) does that, plus
importing any module needed in the path, which is handy because I can
preload the plugins (modules) before the unpickle, but the path may
contain other, more standard modules as well.

Something like "myplugin.re.match": unpickle should import the 're'
module automatically while it is loading the function "match".

> Fair. I guess, then, that the best thing to do is to preload the
> modules, then unpickle. So, basically what you already have, but with
> more caveats.

Yes, this will not be transparent for the user; I'm just trying to
minimize the changes needed. And it will require some documentation for
those caveats. And tests.

Thanks for the brainstorming!
Martin.

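For reference, walking such a "path" by hand is short, though it only
covers attribute access and not everything pickle can encode. A hedged
sketch (the helper name is made up; it does roughly what pickle does
with a module plus dotted attribute path):

import importlib

def resolve(path):
    # Resolve a dotted path like "myplugin.re.match" to a callable:
    # import the longest importable module prefix, then getattr()
    # through the remaining names.
    parts = path.split(".")
    for i in range(len(parts), 0, -1):
        try:
            obj = importlib.import_module(".".join(parts[:i]))
            break
        except ImportError:
            continue
    else:
        raise ImportError("no importable prefix in %r" % path)
    for attr in parts[i:]:
        obj = getattr(obj, attr)
    return obj

# resolve("myplugin.re.match") would import "myplugin" (assuming the
# loader has made it importable), then follow .re and .match.
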
Re: Execute in a multiprocessing child dynamic code loaded by the parent process
> I'm not so sure about that. The author of the plugin knows they're
> writing code that will be dynamically loaded, and can therefore
> expect the kind of problem they're having. It could be argued that
> it's their responsibility to ensure that all the needed code is
> loaded into the subprocess.

Yes, but I try to always make my libs/programs as usable as possible.
"Ergonomic" would be the word.

In the case of the plugin-engine, I'm trying to hide any side effect or
unexpected behaviour of the engine so the developer of the plugin does
not have to take that into account.

I agree that if the developer uses multiprocessing he/she needs to know
its implications. But if I can "smooth" any rough corner, I will try to
do it.

For example, the main project (developed by me) uses threads for
concurrency. It would be simpler to load the plugins and instantiate
them *once* and ask the plugin developers to take care of any race
condition (RC) within their implementation. But because the plugins
would be instantiated *once* and shared, it is almost guaranteed that
the plugins would suffer from race conditions and they would require
some sort of locking. This is quite risky: you may forget to protect
something and you will end up with a RC, and/or you may put the lock in
the wrong place and the whole thing will not work concurrently.

My decision back then was to instantiate each plugin N+1 times: once in
the main thread and then once per worker thread. With this, no single
plugin instance is shared, so there is no risk of RCs and no need for
locking.

(Yes, I know: the developer just needs to use a module variable or a
class attribute, which *are* shared, and they will get a RC, but it is
definitely not the default scenario.) If sharing is required I provide
an object that minimizes the locking needed.

It was much more complex for me at the design and implementation level,
but I think that it is safer and requires less from the plugin
developer.

Reference:
https://byexamples.github.io/byexample/contrib/concurrency-model

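A rough sketch of that N+1 instantiation scheme (illustrative names
only, not byexample's actual API): each worker thread lazily builds its
own, unshared plugin instances.

import threading

class PerThreadPlugins:
    # Each worker thread gets its own plugin instances, so no instance
    # is ever shared between threads and no locking is needed by
    # default.
    def __init__(self, plugin_classes):
        self.plugin_classes = plugin_classes
        self._local = threading.local()

    def instances(self):
        if not hasattr(self._local, "plugins"):
            # first access from this thread: build its private copies
            self._local.plugins = [cls() for cls in self.plugin_classes]
        return self._local.plugins
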
Re: Execute in a multiprocessing child dynamic code loaded by the parent process
On 7/03/22 9:36 am, Martin Di Paola wrote:
> It *would* be my fault if multiprocessing.Process fails only because
> I'm loading the code dynamically.

I'm not so sure about that. The author of the plugin knows they're
writing code that will be dynamically loaded, and can therefore expect
the kind of problem they're having. It could be argued that it's their
responsibility to ensure that all the needed code is loaded into the
subprocess.

--
Greg

Re: Execute in a multiprocessing child dynamic code loaded by the parent process
On Mon, 7 Mar 2022 at 07:37, Martin Di Paola wrote:
>> The way you've described it, it's a hack. Allow me to slightly
>> redescribe it.
>>
>> modules = loader()
>> objs = init(modules)
>>
>> def invoke(mod, func):
>>     # I'm assuming that the loader is smart enough to not load
>>     # a module that's already loaded. Alternatively, load just the
>>     # module you need, if that's a possibility.
>>     loader()
>>     target = getattr(modules[mod], func)
>>     target()
>>
>> ch = multiprocessing.Process(target=invoke, args=("some_module", "sayhi"))
>> ch.start()
>
> Yeup, that would be my first choice but the catch is that "sayhi" may
> not be a function of the given module. It could be a static method of
> some class or any other callable.

Ah, fair. Are you able to define it by a "path", where each step in the
path is a getattr() call? The trouble is, arbitrary callables might not
be available in a reconstructed version of the module.

> Using multiprocessing.reduction was a practical decision: if the user
> wants to call something non-pickleable, it is not my fault, it is
> multiprocessing's fault.
>
> It *would* be my fault if multiprocessing.Process fails only because
> I'm loading the code dynamically.

Fair. I guess, then, that the best thing to do is to preload the
modules, then unpickle. So, basically what you already have, but with
more caveats.

> Do you have some in mind? Or maybe a project that I could read?

Not handy, but there are always many different ways to do things. For
instance, instead of saying "spawn a subprocess and call this
function", you could invert it, and have the function register itself
as the target. Then it's just "spawn a subprocess and load this
module", and that calls the registered invocation. It all depends on
what the rest of your project is doing.

Mainly, though, I'm just not ruling out the possibility of other
options :)

ChrisA

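One way to read that registration idea, as a hypothetical sketch: the
plugin module registers its entry points at import time, so the child
only ever needs a module name and an entry name (both strings, both
trivially pickleable). All names here are made up:

import importlib
import multiprocessing

_registry = {}

def register(name):
    # Decorator used inside a plugin module: running the module
    # (i.e. importing it) is what fills the registry.
    def deco(func):
        _registry[name] = func
        return func
    return deco

def run_registered(module_name, entry_name):
    # In the child: import the plugin module, which registers its
    # entry points, then call the requested one. This assumes the
    # plugin loader has made the module importable in the child.
    importlib.import_module(module_name)
    _registry[entry_name]()

# Inside some_module.py (sketch):
# @register("sayhi")
# def sayhi():
#     print("hi")

# Parent side (sketch):
# ch = multiprocessing.Process(target=run_registered,
#                              args=("some_module", "sayhi"))
# ch.start()
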
Re: Execute in a multiprocessing child dynamic code loaded by the parent process
> Try to use `fork` as "start method" (instead of "spawn").

Yes but no. Indeed, with `fork` there is no need to pickle anything. In
particular the child process will be a copy of the parent, so it will
have all the modules loaded, including the dynamic ones. Perfect.

The problem is that `fork` is the default only on Linux. It works on
MacOS but it may lead to crashes if the parent process is multithreaded
(and mine is!), and `fork` does not work on Windows.

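For completeness, where `fork` is acceptable it can be requested
per-use via a multiprocessing context instead of globally, falling back
to the platform default where it is unavailable (standard
multiprocessing API; a sketch, with objs from the original post):

import multiprocessing

try:
    ctx = multiprocessing.get_context("fork")  # POSIX only
except ValueError:
    # "fork" is not a valid start method here (e.g. Windows); use the
    # platform default ("spawn" there, and on MacOS since Python 3.8).
    ctx = multiprocessing.get_context()

# ch = ctx.Process(target=objs[0].sayhi)
# ch.start()
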
Re: Execute in a multiprocessing child dynamic code loaded by the parent process
> The way you've described it, it's a hack. Allow me to slightly
> redescribe it.
>
> modules = loader()
> objs = init(modules)
>
> def invoke(mod, func):
>     # I'm assuming that the loader is smart enough to not load
>     # a module that's already loaded. Alternatively, load just the
>     # module you need, if that's a possibility.
>     loader()
>     target = getattr(modules[mod], func)
>     target()
>
> ch = multiprocessing.Process(target=invoke, args=("some_module", "sayhi"))
> ch.start()

Yeup, that would be my first choice but the catch is that "sayhi" may
not be a function of the given module. It could be a static method of
some class or any other callable. And doing the lookup by hand sounds
complex.

The thing is that the use of multiprocessing is not something required
by me (by my plugin-engine); it was a decision of the developer of a
particular plugin, so I don't have any control over that.

Using multiprocessing.reduction was a practical decision: if the user
wants to call something non-pickleable, it is not my fault, it is
multiprocessing's fault.

It *would* be my fault if multiprocessing.Process fails only because
I'm loading the code dynamically.

[...]

> I won't say "the" correct way, as there are other valid ways, but
> there's certainly nothing wrong with this idea.

Do you have some in mind? Or maybe a project that I could read?

Thanks!
Martin

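For concreteness, the reduction-based round trip looks roughly like
this. ForkingPickler is the pickler multiprocessing itself uses; its
dumps() classmethod exists in CPython's implementation, though it is
not formally documented, and plain pickle.loads() is enough for the
reading side:

import pickle
from multiprocessing.reduction import ForkingPickler

def dumps(obj):
    # ForkingPickler extends pickle with multiprocessing's reducers;
    # its output is ordinary pickle data.
    return bytes(ForkingPickler.dumps(obj))

def loads(data):
    return pickle.loads(data)
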
Re: Execute in a multiprocessing child dynamic code loaded by the parent process
Martin Di Paola wrote at 2022-3-6 12:42 +:
> Hi everyone. I implemented a while ago a small plugin engine to load
> code dynamically.
>
> So far it worked well but a few days ago a user told me that he
> wasn't able to run a piece of code in parallel on MacOS.
>
> He was using multiprocessing.Process to run the code and on MacOS the
> default start method for such a process is "spawn". My understanding
> is that Python spawns an independent Python server (the child) which
> receives what to execute (the target function) from the parent
> process.
>
> In pseudo code this would be like:
>
> modules = loader()   # load the plugins (Python modules in the end)
> objs = init(modules) # initialize the plugins
>
> # One of the plugins wants to execute part of its code in parallel
> # On MacOS this fails
> ch = multiprocessing.Process(target=objs[0].sayhi)
> ch.start()
>
> The code fails with "ModuleNotFoundError: No module named 'foo'"
> (where 'foo' is the name of the loaded plugin).
>
> This is because the parent program sends to the server (the child)
> what it needs to execute (objs[0].sayhi) using pickle as the
> serialization mechanism.
>
> Because Python does not really serialize code but only enough
> information to reload it, the serialization of "objs[0].sayhi" just
> points to its module, "foo", which cannot be imported by the child
> process.
>
> So the question is, what would be the alternatives and workarounds?

Try to use `fork` as "start method" (instead of "spawn").

Re: Execute in a multiprocessing child dynamic code loaded by the parent process
On Sun, 6 Mar 2022 at 23:43, Martin Di Paola wrote:
>
> Hi everyone. I implemented a while ago a small plugin engine to load
> code dynamically.
>
> So far it worked well but a few days ago a user told me that he
> wasn't able to run a piece of code in parallel on MacOS.
>
> He was using multiprocessing.Process to run the code and on MacOS the
> default start method for such a process is "spawn". My understanding
> is that Python spawns an independent Python server (the child) which
> receives what to execute (the target function) from the parent
> process.
>
> Because Python does not really serialize code but only enough
> information to reload it, the serialization of "objs[0].sayhi" just
> points to its module, "foo".

Hmm. This is a route that has some tricky hazards on it.

Generally, in Python code, we can assume that a module is itself, no
matter what; it won't be a perfect clone of itself, it will actually be
the same module. If you want to support multiprocessing, I would
recommend disconnecting yourself from the concept of loaded modules,
and instead identify the target by its module name.

> I came up with a hack: use a trampoline() function to load the
> plugins in the child before executing the target function.
>
> In pseudo code it is:
>
> modules = loader()   # load the plugins (Python modules in the end)
> objs = init(modules) # initialize the plugins
>
> def trampoline(target_str):
>     loader()  # load the plugins now that we are in the child process
>
>     # deserialize the target and call it
>     target = reduction.loads(target_str)
>     target()
>
> # Serialize the real target function, but call trampoline() in the
> # child. Because the target can be accessed by the child, it will
> # not fail.
> target_str = reduction.dumps(objs[0].sayhi)
> ch = multiprocessing.Process(target=trampoline, args=(target_str,))
> ch.start()
>
> The hack works but is this the correct way to do it?

The way you've described it, it's a hack. Allow me to slightly
redescribe it.

modules = loader()
objs = init(modules)

def invoke(mod, func):
    # I'm assuming that the loader is smart enough to not load
    # a module that's already loaded. Alternatively, load just the
    # module you need, if that's a possibility.
    loader()
    target = getattr(modules[mod], func)
    target()

ch = multiprocessing.Process(target=invoke, args=("some_module", "sayhi"))
ch.start()

Written like this, it achieves the same goal, but looks a lot less
hacky, and as such, I would say that yes, this absolutely IS a correct
way to do it. (I won't say "the" correct way, as there are other valid
ways, but there's certainly nothing wrong with this idea.)

ChrisA

Execute in a multiprocessing child dynamic code loaded by the parent process
Hi everyone. I implemented a while ago a small plugin engine to load
code dynamically.

So far it worked well but a few days ago a user told me that he wasn't
able to run a piece of code in parallel on MacOS.

He was using multiprocessing.Process to run the code and on MacOS the
default start method for such a process is "spawn". My understanding is
that Python spawns an independent Python server (the child) which
receives what to execute (the target function) from the parent process.

In pseudo code this would be like:

modules = loader()   # load the plugins (Python modules in the end)
objs = init(modules) # initialize the plugins

# One of the plugins wants to execute part of its code in parallel
# On MacOS this fails
ch = multiprocessing.Process(target=objs[0].sayhi)
ch.start()

The code fails with "ModuleNotFoundError: No module named 'foo'" (where
'foo' is the name of the loaded plugin).

This is because the parent program sends to the server (the child) what
it needs to execute (objs[0].sayhi) using pickle as the serialization
mechanism.

Because Python does not really serialize code but only enough
information to reload it, the serialization of "objs[0].sayhi" just
points to its module, "foo", which cannot be imported by the child
process.

So the question is, what would be the alternatives and workarounds?

I came up with a hack: use a trampoline() function to load the plugins
in the child before executing the target function.

In pseudo code it is:

modules = loader()   # load the plugins (Python modules in the end)
objs = init(modules) # initialize the plugins

def trampoline(target_str):
    loader()  # load the plugins now that we are in the child process

    # deserialize the target and call it
    target = reduction.loads(target_str)
    target()

# Serialize the real target function, but call trampoline() in the
# child. Because the target can be accessed by the child, it will
# not fail.
target_str = reduction.dumps(objs[0].sayhi)
ch = multiprocessing.Process(target=trampoline, args=(target_str,))
ch.start()

The hack works but is this the correct way to do it?

The following gist has the minimal example code that triggers the issue
and its workaround:
https://gist.github.com/eldipa/d9b02875a13537e72fbce4cdb8e3f282

Thanks!
Martin.