A quick thought on this: I think this is distro-dependent as well, right?
We ran into a similar issue in
https://issues.apache.org/jira/browse/BIGTOP-1546, where it looked like the
python libraries might be overwritten on launch.

On Tue, Nov 25, 2014 at 3:09 PM, Chengi Liu <chengi.liu...@gmail.com> wrote:

> Hi,
>   I have written few datastructures as classes like following..
>
> So, here is my code structure:
>
> project/foo/foo.py, __init__.py
>        /bar/bar.py, __init__.py        (bar.py imports foo as: from foo.foo import *)
>        /execute/execute.py             (execute.py imports bar as: from bar.bar import *)
>
> Ultimately I am executing execute.py as
>
> pyspark execute.py
>
> And this works fine locally, but as soon as I submit it to the cluster I see
> a "module missing" error.
> I tried to send each and every file using the --py-files flag (foo.py, bar.py)
> and the other helper files.
>
> But even then it complains that the module is not found. So, the question
> is: when building a library that is supposed to execute on top of
> Spark, how should the imports and the library be structured so that it works
> fine on Spark?
> When should one use pyspark and when should one use spark-submit to execute
> python scripts/modules?
> Bonus points if someone can point to an example library and how to run it :)
> Thanks
>
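
For what it's worth, here is a rough sketch of how I'd lay it out and ship it.
This is just a sketch against the layout you described, not tested against your
code; the zip name, the app name, and the toy job at the end are all made up:

# Keep foo/ and bar/ as real packages (each with an __init__.py), zip them
# from the project root, and ship the archive so every executor sees the
# same package layout as the driver:
#
#   cd project && zip -r deps.zip foo bar
#   spark-submit --py-files deps.zip execute/execute.py
#
# execute/execute.py (sketch; "deps.zip" and the toy job are illustrative):
from pyspark import SparkContext

from bar.bar import *   # resolves on the workers because deps.zip is on their sys.path

if __name__ == "__main__":
    sc = SparkContext(appName="execute")
    # sc.addPyFile("deps.zip")   # programmatic alternative to --py-files
    rdd = sc.parallelize(range(10))
    print(rdd.map(lambda x: x * 2).collect())
    sc.stop()

As for the second question: the pyspark shell is really for interactive use,
while spark-submit is what you want for running a script or module on the cluster.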



-- 
jay vyas
