Re: Issue while installing dependencies Python Spark

2020-12-18 Patrick McCarthy
At the risk of repeating myself, this is what I was hoping to avoid when I suggested deploying a full, zipped, conda venv. What is your motivation for running an install process on the nodes and risking the process failing, instead of pushing a validated environment artifact and not having that
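A minimal sketch of the "push a validated environment artifact" route, loosely following the conda-pack pattern in Spark's Python package management docs. It assumes the env was packed beforehand with `conda pack` into a hypothetical `pyspark_env.tar.gz`, and that the cluster manager honours `--archives` (YARN does). The `#environment` suffix names the directory each node unpacks the archive into. The helper `conda_pack_submit` and all file names are illustrative only.

```python
from __future__ import annotations

import shlex


def conda_pack_submit(archive: str, app: str,
                      alias: str = "environment") -> tuple[list[str], dict[str, str]]:
    """Build a spark-submit command that ships a packed conda env to every node.

    Hypothetical helper: ARCHIVE is assumed to come from `conda pack`, and Spark
    is assumed to unpack it into a directory named ALIAS in each working dir.
    """
    cmd = ["spark-submit", "--archives", f"{archive}#{alias}", app]
    # Point the Python workers at the interpreter inside the unpacked archive.
    env = {"PYSPARK_PYTHON": f"./{alias}/bin/python"}
    return cmd, env


if __name__ == "__main__":
    cmd, env = conda_pack_submit("pyspark_env.tar.gz", "app.py")
    # Print a shell line you could paste; nothing is installed on the nodes.
    print(" ".join(f"{k}={v}" for k, v in env.items()), shlex.join(cmd))
```

Nothing is pip-installed on the cluster here: every node receives the same pre-built interpreter and site-packages, which is the point being made above. In client mode the driver still runs your local Python, so only the workers use the archived interpreter.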

Re: Issue while installing dependencies Python Spark

2020-12-18 Sachit Murarka
Hi Patrick/Users, I am exploring packaging the dependencies as wheel files for this, as it seems simple: https://bytes.grubhub.com/managing-dependencies-and-artifacts-in-pyspark-7641aa89ddb7 However, I am facing another issue: I am using pandas, which needs numpy, and numpy is giving an error! ImportError:
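Since the ImportError text is cut off here, one hedged way to narrow it down is to attempt the same imports on the executors and see what each one resolves to. `probe_imports` is a hypothetical helper, and the executor-side call below assumes an active `SparkContext` named `sc`.

```python
import importlib


def probe_imports(modules):
    """Try importing each module; report version and file, or the error text."""
    report = {}
    for name in modules:
        try:
            mod = importlib.import_module(name)
        except ImportError as exc:  # also catches ModuleNotFoundError
            report[name] = f"ImportError: {exc}"
        else:
            version = getattr(mod, "__version__", "unknown")
            location = getattr(mod, "__file__", "?")
            report[name] = f"{version} from {location}"
    return report


if __name__ == "__main__":
    # Locally, on the driver:
    print(probe_imports(["numpy", "pandas"]))
```

On the cluster, something like `sc.parallelize(range(4), 4).map(lambda _: probe_imports(["numpy", "pandas"])).collect()` would run the probe inside tasks. One possible cause worth checking: if numpy arrives inside a .zip on the Python path, its compiled extension modules cannot be loaded by Python's zip importer, so pure-Python packages may import while numpy fails.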

Re: Issue while installing dependencies Python Spark

2020-12-17 Artemis User
Wheel is used for package management and for setting up your virtual environment, not as a library package. To run spark-submit in a virtual env, use the --py-files option instead. Usage: --py-files PY_FILES Comma-separated list of .zip, .egg, or .py files to place on the
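A small sketch of assembling the `--py-files` value, assuming hypothetical local files `deps.zip` and `helpers.py`. The extension check only mirrors the file types named in the help text quoted above; it is a conservative guard, not something spark-submit itself performs.

```python
from pathlib import Path

# File types named in the --py-files help text.
ALLOWED_SUFFIXES = {".zip", ".egg", ".py"}


def py_files_arg(paths):
    """Join dependency paths into the comma-separated string --py-files expects."""
    bad = [str(p) for p in paths if Path(p).suffix not in ALLOWED_SUFFIXES]
    if bad:
        raise ValueError(f"--py-files expects .zip, .egg or .py files, got: {bad}")
    return ",".join(str(p) for p in paths)


if __name__ == "__main__":
    print(["spark-submit", "--py-files", py_files_arg(["deps.zip", "helpers.py"]), "app.py"])
```

From inside an already running application, `SparkContext.addPyFile(path)` is the programmatic counterpart for adding one such file at a time.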

Re: Issue while installing dependencies Python Spark

2020-12-17 Patrick McCarthy
I'm not very familiar with the environments on cloud clusters, but in general I'd be reluctant to lean on setuptools or other Python install mechanisms. In the worst case, you might encounter /usr/bin/pip not having permissions to install new packages, or, even if you do, a package might require
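To make the permission concern concrete, a quick check like the following (run on a node, or inside a task) reports whether the interpreter in use can even write to its own site-packages directory. It is a rough sketch: it ignores `pip install --user` and any other install locations.

```python
import os
import sysconfig


def site_packages_writable():
    """Return this interpreter's site-packages dir and whether it is writable."""
    purelib = sysconfig.get_paths()["purelib"]
    # os.access returns False if the directory is missing or not writable.
    return purelib, os.access(purelib, os.W_OK)


if __name__ == "__main__":
    path, ok = site_packages_writable()
    print(f"{path}: {'writable' if ok else 'NOT writable'}")
```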

Issue while installing dependencies Python Spark

2020-12-17 Sachit Murarka
Hi Users, I have a wheel file; while creating it, I listed the dependencies in the setup.py file. Now I have two virtual envs: one was already there, and I created the other just now. I have switched to the new virtual env, and I want Spark to download the dependencies during spark-submit using the wheel.
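For context on what the wheel actually carries: dependencies declared in setup.py's `install_requires` end up as `Requires-Dist` lines in the wheel's `*.dist-info/METADATA` file. They are declarations that pip resolves at install time; as far as I know spark-submit does not read them, so shipping the wheel alone will not pull in pandas or numpy. The sketch below lists what a wheel declares; `wheel_requirements` and the file name are hypothetical.

```python
import zipfile
from email.parser import Parser


def wheel_requirements(wheel_path):
    """Return the Requires-Dist entries recorded in a wheel's METADATA file."""
    with zipfile.ZipFile(wheel_path) as whl:
        # A wheel is a zip archive; its metadata lives in <name>.dist-info/METADATA.
        meta_name = next(
            (n for n in whl.namelist() if n.endswith(".dist-info/METADATA")), None
        )
        if meta_name is None:
            raise ValueError(f"no .dist-info/METADATA in {wheel_path}")
        # METADATA uses RFC 822-style headers, so the email parser can read it.
        metadata = Parser().parsestr(whl.read(meta_name).decode("utf-8"))
    return metadata.get_all("Requires-Dist") or []


if __name__ == "__main__":
    print(wheel_requirements("mypkg-0.1-py3-none-any.whl"))
```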