How to execute a custom Python library on Spark

2014-11-25 Thread Chengi Liu
Hi,
  I have written a few data structures as classes, like the following.

So, here is my code structure:

project/
  foo/
    __init__.py
    foo.py
  bar/
    __init__.py
    bar.py          (bar imports foo via: from foo.foo import *)
  execute/
    execute.py      (execute imports bar via: from bar.bar import *)
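
For illustration, the files look roughly like this (the class names are just
placeholders, not my real code):

  # foo/foo.py
  class FooStructure(object):          # placeholder data structure
      def __init__(self, values):
          self.values = values

  # bar/bar.py
  from foo.foo import *                # bar builds on the classes in foo

  class BarStructure(FooStructure):    # placeholder data structure
      pass

  # execute/execute.py
  from bar.bar import *                # the driver script only imports bar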

Ultimately, I am executing execute.py as:

pyspark execute.py
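
(As far as I know, the same script can also be launched with spark-submit, e.g.:

  spark-submit execute/execute.py

but I am not sure which of the two is the right entry point for this.)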

And this works fine locally, but as soon as I submit it on the cluster, I see
a "module missing" error.
I tried to pass each and every file via the --py-files flag (foo.py, bar.py,
and other helper files).
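
Concretely, the submission looked roughly like this (paths abbreviated):

  pyspark --py-files foo/foo.py,bar/bar.py execute/execute.py

I am not sure whether the packages instead need to be shipped as a single zip,
something like:

  zip -r deps.zip foo bar
  spark-submit --py-files deps.zip execute/execute.py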

But even then it complains that the module is not found. So, the questions are:
when building a library that is supposed to execute on top of Spark, how should
the imports and the library be structured so that it works on the cluster?
And when should one use pyspark versus spark-submit to execute Python
scripts/modules?
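Or is the expected pattern to ship an archive from the driver with
sc.addPyFile and do the imports inside the functions that run on the
executors? Roughly (again with placeholder names):

  from pyspark import SparkContext

  sc = SparkContext(appName="execute")
  sc.addPyFile("deps.zip")               # hypothetical archive containing the foo and bar packages

  def build(x):
      # import on the executors, after the archive has been shipped
      from bar.bar import BarStructure   # placeholder class from bar.py
      return len(BarStructure([x]).values)

  counts = sc.parallelize(range(10)).map(build).collect()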
Bonus points if someone can point to an example library and show how to run it :)
Thanks


Re: How to execute a custom Python library on Spark

2014-11-25 Thread jay vyas
A quick thought on this: I think this is distro-dependent as well, right?
We ran into a similar issue in
https://issues.apache.org/jira/browse/BIGTOP-1546, where it looked like the
Python libraries might be overwritten on launch.

-- 
jay vyas