Re: Submit custom python packages from current project
Hello,

2016-02-16 11:03 GMT+01:00 Mohannad Ali:
> Hello Everyone,
>
> I have code inside my project organized in packages and modules, however
> I keep getting the error "ImportError: No module named " when I run
> Spark on YARN.
>
> My directory structure is something like this:
>
> project/
>     package/
>         module.py
>         __init__.py
>     bin/
>     docs/
>     setup.py
>     main_script.py
>     requirements.txt
>     tests/
>         package/
>             module_test.py
>             __init__.py
>         __init__.py
>
> So when I pass `main_script.py` to spark-submit with master set to
> "yarn-client", the packages aren't found and I get the error above.
>
> With a code structure like this, adding everything as a pyfile to the
> Spark context seems counter-intuitive.
>
> I just want to organize my code as much as possible to make it more
> readable and maintainable. Is there a better way to achieve good code
> organization without running into such problems?

According to the docs [1], you should be able to zip your "project/" (or
"package/"?) directory and pass the zip file to spark-submit via
--py-files.

Best,
Eike

[1] --py-files PY_FILES   Comma-separated list of .zip, .egg, or .py files
    to place on the PYTHONPATH for Python apps.
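A quick way to sanity-check the zip layout Eike's pointer implies, before going near YARN, is to build the zip locally and import from it: since --py-files puts the zip itself on PYTHONPATH, the package directory must sit at the root of the archive. This is a minimal sketch; the package contents and `greet()` are made-up stand-ins for illustration, not Mohannad's actual code.

```python
import os
import sys
import tempfile
import zipfile

# Hypothetical stand-ins for project/package/{__init__.py, module.py}
root = tempfile.mkdtemp()
pkg = os.path.join(root, "project", "package")
os.makedirs(pkg)
with open(os.path.join(pkg, "__init__.py"), "w") as f:
    f.write("")
with open(os.path.join(pkg, "module.py"), "w") as f:
    f.write("def greet():\n    return 'hello'\n")

# Zip so that 'package/' sits at the ROOT of the archive: --py-files places
# the zip itself on PYTHONPATH, so the importable package must be top-level.
zip_path = os.path.join(root, "package.zip")
with zipfile.ZipFile(zip_path, "w") as zf:
    for name in ("__init__.py", "module.py"):
        zf.write(os.path.join(pkg, name), "package/" + name)

# Importing from the zip locally (via Python's built-in zipimport support)
# mirrors what the executors see after something like:
#   spark-submit --master yarn-client --py-files package.zip main_script.py
sys.path.insert(0, zip_path)
from package import module
print(module.greet())  # prints: hello
```

If the import fails here with the same "ImportError: No module named ...", the zip layout is wrong (e.g. the archive root is "project/" instead of "package/") and it will fail on the cluster too.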
Re: Submit custom python packages from current project
Hello Ramanathan,

Unfortunately I tried this already and it doesn't work.

Mo

On Tue, Feb 16, 2016 at 2:13 PM, Ramanathan R wrote:
> Have you tried setting PYTHONPATH?
>
> $ export PYTHONPATH="/path/to/project"
> $ spark-submit --master yarn-client /path/to/project/main_script.py
>
> Regards,
> Ram
Re: Submit custom python packages from current project
Have you tried setting PYTHONPATH?

$ export PYTHONPATH="/path/to/project"
$ spark-submit --master yarn-client /path/to/project/main_script.py

Regards,
Ram

On 16 February 2016 at 15:33, Mohannad Ali wrote:
> Hello Everyone,
>
> I have code inside my project organized in packages and modules, however
> I keep getting the error "ImportError: No module named " when I run
> Spark on YARN.
>
> My directory structure is something like this:
>
> project/
>     package/
>         module.py
>         __init__.py
>     bin/
>     docs/
>     setup.py
>     main_script.py
>     requirements.txt
>     tests/
>         package/
>             module_test.py
>             __init__.py
>         __init__.py
>
> So when I pass `main_script.py` to spark-submit with master set to
> "yarn-client", the packages aren't found and I get the error above.
>
> With a code structure like this, adding everything as a pyfile to the
> Spark context seems counter-intuitive.
>
> I just want to organize my code as much as possible to make it more
> readable and maintainable. Is there a better way to achieve good code
> organization without running into such problems?
>
> Best Regards,
> Mo