I've tried adding task.py to pyFiles during SparkContext creation and it
worked perfectly. Thanks for your help!
If you need some more information for further investigation, here's what
I've noticed. Without explicitly adding the file to the SparkContext, only
functions that are defined in the main module work; functions defined in
imported modules seem to fail on the workers.
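For reference, here's roughly what the working setup looks like now (the
master URL and app name below are placeholders, not my real config):

    from pyspark import SparkContext

    # Shipping task.py to the executors lets functions defined in it
    # be deserialized on the workers.
    sc = SparkContext("spark://master:7077", "runner",
                      pyFiles=["task.py"])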
Hi Andrei,
Could you please post the stderr logfile from the failed executor? You can
find this in the work subdirectory of the worker that had the failed
task. You'll need the executor ID to find the corresponding stderr file.
Thanks,
-Jey
Hi,
thanks for your replies. I'm out of the office now, so I will check it out
on Monday morning, but the guess about serialization/deserialization looks
plausible.
Thanks,
Andrei
I have two Python modules/scripts, task.py and runner.py. The first one
(task.py) is a small Spark job that works perfectly well by itself.
However, when called from runner.py with exactly the same arguments, it
fails with only a useless error message (both in the terminal and in the
worker logs).
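In case it helps, the layout looks roughly like this (simplified; the
function name and input path are illustrative, not my actual code):

    # task.py -- runs fine when executed directly
    from pyspark import SparkContext

    def tokenize(line):
        # per-record function, defined at task.py's module level
        return line.split()

    def main(master):
        sc = SparkContext(master, "task")
        print(sc.textFile("input.txt").flatMap(tokenize).count())
        sc.stop()

    if __name__ == "__main__":
        main("spark://master:7077")

    # runner.py -- same arguments, but the job dies on the workers
    import task
    task.main("spark://master:7077")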
I'll take a look at this tomorrow, but my initial hunch is that this
problem might be serialization/pickling-related: maybe the UDF is being
serialized differently when it's defined in a module that's not __main__.
To confirm this, try looking at the logs on the worker that ran the failed
task.
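You can see the difference outside of Spark, too. PySpark serializes
closures with cloudpickle, which pickles a function defined in __main__ by
value but a function from an importable module only by reference. A rough
illustration (assumes the cloudpickle package is importable; PySpark
bundles its own copy):

    import cloudpickle

    def f(x):
        return x + 1

    # Run as a script: f lives in __main__, so cloudpickle embeds its
    # code object in the pickle and any worker can rebuild it.
    # Imported as a module: f is pickled as a (module, name) reference,
    # and unpickling fails on a worker that can't import the module --
    # which is what shipping the file via pyFiles would fix.
    print(len(cloudpickle.dumps(f)))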