Aha I see. Thanks Hyukjin! On Tue, Jul 19, 2022 at 9:09 PM Hyukjin Kwon <gurwls...@gmail.com> wrote:
> This is done by cloudpickle. They pickle global variables referred within > the func together, and register it to the global imported modules. > > On Wed, 20 Jul 2022 at 00:55, Li Jin <ice.xell...@gmail.com> wrote: > >> Hi, >> >> I have a question about how does "imports" get send to the python worker. >> >> For example, I have >> >> def foo(x): >> return np.abs(x) >> >> If I run this code directly, it obviously failed (because np is undefined >> on the driver process): >> >> sc.paralleilize([1, 2, 3]).map(foo).collect() >> >> However, if I add the import statement "import numpy as np" on the >> driver, it works. So somehow driver is sending that "imports" to the worker >> when executing foo on the worker but I cannot seem t o find the code that >> does this - Can someone please send me a pointer? >> >> Thanks, >> Li >> >