This is done by cloudpickle. They pickle global variables referred within the func together, and register it to the global imported modules.
On Wed, 20 Jul 2022 at 00:55, Li Jin <ice.xell...@gmail.com> wrote: > Hi, > > I have a question about how does "imports" get send to the python worker. > > For example, I have > > def foo(x): > return np.abs(x) > > If I run this code directly, it obviously failed (because np is undefined > on the driver process): > > sc.paralleilize([1, 2, 3]).map(foo).collect() > > However, if I add the import statement "import numpy as np" on the driver, > it works. So somehow driver is sending that "imports" to the worker when > executing foo on the worker but I cannot seem t o find the code that does > this - Can someone please send me a pointer? > > Thanks, > Li >