Re: Submit custom python packages from current project

2016-02-19 Thread Eike von Seggern
Hello,

2016-02-16 11:03 GMT+01:00 Mohannad Ali:

> Hello Everyone,
>
> I have code inside my project organized in packages and modules; however, I
> keep getting the error "ImportError: No module named " when
> I run Spark on YARN.
>
> My directory structure is something like this:
>
> project/
>   package/
>     module.py
>     __init__.py
>   bin/
>   docs/
>   setup.py
>   main_script.py
>   requirements.txt
>   tests/
>     package/
>       module_test.py
>       __init__.py
>     __init__.py
>
>
> So when I pass `main_script.py` to spark-submit with master set to
> "yarn-client", the packages aren't found and I get the error above.
>
> With a code structure like this, adding everything as a py-file to the
> Spark context seems counterintuitive.
>
> I just want to organize my code as much as possible to make it more
> readable and maintainable. Is there a better way to achieve good code
> organization without running into such problems?
>

According to the docs[1], you should be able to zip your "project/" (or
"package/"?) directory and pass the zip file to spark-submit via --py-files.

Best,
Eike

[1]
  --py-files PY_FILES    Comma-separated list of .zip, .egg, or .py files
                         to place on the PYTHONPATH for Python apps.
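For concreteness, a minimal sketch of the zip step (the paths here are
hypothetical; adjust them to your own layout). The important detail is that
the archive root must contain "package/", so that `import package` works once
the zip lands on the PYTHONPATH:

```python
# Sketch: build package.zip whose root contains "package/", then ship it with
#   spark-submit --master yarn-client --py-files package.zip main_script.py
import os
import zipfile

def zip_package(pkg_dir, zip_path):
    """Zip pkg_dir, storing entries relative to its parent directory."""
    parent = os.path.dirname(os.path.abspath(pkg_dir))
    with zipfile.ZipFile(zip_path, "w") as zf:
        for root, _dirs, files in os.walk(pkg_dir):
            for name in files:
                full = os.path.join(root, name)
                # e.g. "package/module.py", never an absolute path
                zf.write(full, os.path.relpath(full, parent))

# e.g. zip_package("project/package", "package.zip")
```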


Re: Submit custom python packages from current project

2016-02-16 Thread Mohannad Ali
Hello Ramanathan,

Unfortunately I tried this already and it doesn't work.

Mo

On Tue, Feb 16, 2016 at 2:13 PM, Ramanathan R wrote:

> Have you tried setting PYTHONPATH?
> $ export PYTHONPATH="/path/to/project"
> $ spark-submit --master yarn-client /path/to/project/main_script.py
>
> Regards,
> Ram
>
>
> On 16 February 2016 at 15:33, Mohannad Ali wrote:
>
>> Hello Everyone,
>>
>> I have code inside my project organized in packages and modules; however,
>> I keep getting the error "ImportError: No module named "
>> when I run Spark on YARN.
>>
>> My directory structure is something like this:
>>
>> project/
>>   package/
>>     module.py
>>     __init__.py
>>   bin/
>>   docs/
>>   setup.py
>>   main_script.py
>>   requirements.txt
>>   tests/
>>     package/
>>       module_test.py
>>       __init__.py
>>     __init__.py
>>
>>
>> So when I pass `main_script.py` to spark-submit with master set to
>> "yarn-client", the packages aren't found and I get the error above.
>>
>> With a code structure like this, adding everything as a py-file to the
>> Spark context seems counterintuitive.
>>
>> I just want to organize my code as much as possible to make it more
>> readable and maintainable. Is there a better way to achieve good code
>> organization without running into such problems?
>>
>> Best Regards,
>> Mo
>>
>
>


Re: Submit custom python packages from current project

2016-02-16 Thread Ramanathan R
Have you tried setting PYTHONPATH?
$ export PYTHONPATH="/path/to/project"
$ spark-submit --master yarn-client /path/to/project/main_script.py

Regards,
Ram


On 16 February 2016 at 15:33, Mohannad Ali wrote:

> Hello Everyone,
>
> I have code inside my project organized in packages and modules; however, I
> keep getting the error "ImportError: No module named " when
> I run Spark on YARN.
>
> My directory structure is something like this:
>
> project/
>   package/
>     module.py
>     __init__.py
>   bin/
>   docs/
>   setup.py
>   main_script.py
>   requirements.txt
>   tests/
>     package/
>       module_test.py
>       __init__.py
>     __init__.py
>
>
> So when I pass `main_script.py` to spark-submit with master set to
> "yarn-client", the packages aren't found and I get the error above.
>
> With a code structure like this, adding everything as a py-file to the
> Spark context seems counterintuitive.
>
> I just want to organize my code as much as possible to make it more
> readable and maintainable. Is there a better way to achieve good code
> organization without running into such problems?
>
> Best Regards,
> Mo
>