Re: PySpark script works itself, but fails when called from other script

2013-11-18 Thread Andrei
I've tried adding task.py to pyFiles during SparkContext creation and it worked perfectly. Thanks for your help! If you need more information for further investigation, here's what I've noticed: without explicitly adding the file to SparkContext, only functions that are defined in the main module run correctly on the workers.
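For reference, a minimal sketch of the fix (the master URL and file path are placeholders, not the actual cluster setup):

    from pyspark import SparkContext

    # Ship task.py to the executors so that functions defined in it
    # can be unpickled on the workers.
    sc = SparkContext("spark://master:7077", "runner",
                      pyFiles=["task.py"])

    # Equivalently, after the context exists:
    # sc.addPyFile("task.py")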

Re: PySpark script works itself, but fails when called from other script

2013-11-16 Thread Jey Kottalam
Hi Andrei, Could you please post the stderr logfile from the failed executor? You can find this in the work subdirectory of the worker that had the failed task. You'll need the executor id to find the corresponding stderr file. Thanks, -Jey
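For example, on a standalone worker the executor logs typically sit under the worker's work/ directory; a quick way to list them (the path below is an assumption about a default standalone deployment):

    import glob

    # Standalone workers keep per-executor logs under
    # work/<app-id>/<executor-id>/stderr relative to the Spark home.
    for path in sorted(glob.glob("/path/to/spark/work/app-*/*/stderr")):
        print(path)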

Re: PySpark script works itself, but fails when called from other script

2013-11-16 Thread Andrei
Hi, thanks for your replies. I'm out of the office now, so I will check it out on Monday morning, but the guess about serialization/deserialization looks plausible. Thanks, Andrei

PySpark script works itself, but fails when called from other script

2013-11-15 Thread Andrei
I have 2 Python modules/scripts - task.py and runner.py. The first one (task.py) is a small Spark job and works perfectly well on its own. However, when called from runner.py with exactly the same arguments, it fails with only a useless error message (both in the terminal and in the worker logs).
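A hypothetical reconstruction of the two-script setup (not the actual code, just the shape of it):

    # task.py -- a small self-contained Spark job
    from pyspark import SparkContext

    def inc(x):
        return x + 1

    def main(master):
        sc = SparkContext(master, "task")
        print(sc.parallelize(range(10)).map(inc).collect())

    if __name__ == "__main__":
        main("spark://master:7077")

    # runner.py -- the same job invoked from another module
    import task

    task.main("spark://master:7077")

Run directly, task.py's functions are defined in __main__; imported from runner.py, they belong to the module task.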

Re: PySpark script works itself, but fails when called from other script

2013-11-15 Thread Josh Rosen
I'll take a look at this tomorrow, but my initial hunch is that this problem might be serialization/pickling-related: maybe the UDF is being serialized differently when it's defined in a module that's not __main__. To confirm this, try looking at the logs on the worker that ran the failed task to see the actual exception.
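A quick way to see the difference being described (plain pickle shown for illustration; PySpark's serializer treats module-level functions similarly):

    import pickle
    import pickletools

    def f(x):
        return x + 1

    # A module-level function pickles as a *reference* ("module.name"),
    # not as bytecode; the GLOBAL/STACK_GLOBAL opcode in the output
    # carries the module name -- "__main__" when the script is run
    # directly, "task" when the function is imported from task.py.
    pickletools.dis(pickle.dumps(f))

Unpickling on a worker then requires importing that module, which fails unless task.py has been shipped to the workers (e.g. via pyFiles).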