Re: How does PySpark send "import" to the worker when executing Python UDFs?

2022-07-19 Thread Li Jin
Aha I see. Thanks Hyukjin!

On Tue, Jul 19, 2022 at 9:09 PM Hyukjin Kwon wrote:
> This is done by cloudpickle. It pickles the global variables referenced
> within the func together with the function, and registers the modules
> among them back into the globally imported modules on the worker.
>
> On Wed, 20 Jul 2022 at 00:55, Li Jin wrote:
>> Hi,
>>
>> I have ...

Re: How does PySpark send "import" to the worker when executing Python UDFs?

2022-07-19 Thread Hyukjin Kwon
This is done by cloudpickle. It pickles the global variables referenced within the func together with the function, and registers the modules among them back into the globally imported modules on the worker.

On Wed, 20 Jul 2022 at 00:55, Li Jin wrote:
> Hi,
>
> I have a question about how "imports" get sent to the Python worker.
>
> For example, I have ...
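
A minimal sketch of this capture mechanism, run outside Spark and assuming cloudpickle and numpy are installed locally; the module referenced as np is recorded by name and re-imported when the function is unpickled:

    import pickle
    import cloudpickle
    import numpy as np

    def foo(x):
        # np is a global referenced by foo; cloudpickle records it as a
        # reference to the numpy module, so the deserializing side
        # re-imports numpy by name before the call.
        return np.abs(x)

    payload = cloudpickle.dumps(foo)   # serialize the function plus its captured globals
    restored = pickle.loads(payload)   # roughly what a PySpark worker does with the received bytes
    print(restored(-3))                # 3

In PySpark the dumps happens on the driver when the UDF is shipped, and the loads happens in the worker process started by the executor.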

How does PySpark send "import" to the worker when executing Python UDFs?

2022-07-19 Thread Li Jin
Hi,

I have a question about how "imports" get sent to the Python worker.

For example, I have

    def foo(x):
        return np.abs(x)

If I run this code directly, it obviously fails (because np is undefined in the driver process):

    sc.parallelize([1, 2, 3]).map(foo).collect()

However, if I add ...
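
A sketch of the working variant, assuming the truncated message goes on to add a module-level "import numpy as np" and that a local SparkSession/SparkContext is used; with the import in place on the driver, cloudpickle captures the reference to numpy along with foo, and the workers re-import it before calling the function:

    from pyspark.sql import SparkSession
    import numpy as np

    spark = SparkSession.builder.master("local[2]").getOrCreate()
    sc = spark.sparkContext

    def foo(x):
        # np is resolved from the globals that cloudpickle ships with foo
        return np.abs(x)

    print(sc.parallelize([1, 2, 3]).map(foo).collect())   # [1, 2, 3]
    spark.stop()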