I'm not very familiar with the environments on cloud clusters, but in
general I'd be reluctant to lean on setuptools or other Python install
mechanisms at submit time. In the worst case, /usr/bin/pip might not have
permission to install new packages, or even if it does, a package might
require something you can't change, like a libc dependency.

Perhaps you can install the .whl and its dependencies into a virtualenv on
a local machine, and then, *after* the install has finished, package that
venv and ship it with the job?

If possible, I like conda for this approach over a vanilla venv because a
conda env can also carry many of the non-Python dependencies (compiled
shared libraries) if they're needed.
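
As a rough sketch of the packaging step, the snippet below zips an already
built environment directory so that the archive's top-level folder is the
environment's own name (entries like py37minimal/bin/python). The function
name `package_env` and the example paths are hypothetical. It uses only the
standard library and assumes the environment still works after being copied
to another machine, which is more likely for a conda env than for a plain
venv; a dedicated tool such as conda-pack may be a better fit.

```python
import shutil
from pathlib import Path


def package_env(env_dir: str, output_dir: str) -> str:
    """Zip env_dir so the archive's top-level folder is the env's own name.

    Hypothetical helper: it only archives the directory and does not try to
    make the environment relocatable.
    """
    env_path = Path(env_dir).resolve()
    if not env_path.is_dir():
        raise FileNotFoundError(f"no such environment directory: {env_path}")

    # make_archive appends ".zip" to base_name itself.
    base_name = str(Path(output_dir).resolve() / env_path.name)

    # root_dir is the parent and base_dir the env folder, so entries look
    # like "py37minimal/bin/python" rather than "bin/python".
    return shutil.make_archive(
        base_name,
        "zip",
        root_dir=str(env_path.parent),
        base_dir=env_path.name,
    )
```

Calling, say, `package_env("/path/to/envs/py37minimal", "/tmp/dist")` would
return the path of a py37minimal.zip that you could then upload to HDFS.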


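Another thing: I think there are several ways to do this, but I've had the
most success listing the .zip that contains my environment in
`spark.yarn.dist.archives` and then pointing PYSPARK_PYTHON at a relative
path inside the unpacked archive:

import os
from pyspark.sql import SparkSession

# Relative path: <alias after '#'>/<folder inside the zip>/bin/python
os.environ['PYSPARK_PYTHON'] = './py37minimal_env/py37minimal/bin/python'

# The part after '#' is the directory name the archive is unpacked under
dist_archives = 'hdfs:///user/pmccarthy/conda/py37minimal.zip#py37minimal_env'

spark = (SparkSession.builder
         # ...
         .config('spark.yarn.dist.archives', dist_archives)
         .getOrCreate())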
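
To make the link between the `#py37minimal_env` alias and the relative
interpreter path explicit, here is a small sketch that derives one from the
other. The helper name `executor_python_path` is made up for illustration,
and it assumes the zip has a single top-level folder with a Unix-style
bin/python inside it.

```python
def executor_python_path(dist_archive: str, env_dir: str) -> str:
    """Build the relative interpreter path for an archive given as 'uri#alias'.

    Hypothetical helper: env_dir is the top-level folder inside the zip.
    """
    _uri, sep, alias = dist_archive.partition("#")
    if not sep or not alias:
        raise ValueError("expected an archive of the form '<uri>#<alias>'")
    # The archive is unpacked under ./<alias> in the container's working dir.
    return f"./{alias}/{env_dir}/bin/python"


archive = "hdfs:///user/pmccarthy/conda/py37minimal.zip#py37minimal_env"
print(executor_python_path(archive, "py37minimal"))
# prints ./py37minimal_env/py37minimal/bin/python
```

If the alias or the folder name changes, the PYSPARK_PYTHON value has to
change with it, which is the usual cause of "python not found" failures
with this setup.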


On Thu, Dec 17, 2020 at 10:32 AM Sachit Murarka <connectsac...@gmail.com>
wrote:

> Hi Users
>
> I have a wheel file , while creating it I have mentioned dependencies in
> setup.py file.
> Now I have 2 virtual envs, 1 was already there . another one I created
> just now.
>
> I have switched to new virtual env, I want spark to download the
> dependencies while doing spark-submit using wheel.
>
> Could you please help me on this?
>
> It is not downloading dependencies , instead it is pointing to older
> version of  virtual env and proceeding with the execution of spark job.
>
> Please note I have tried setting the env variables also.
> Also I have tried following options as well in spark submit
>
> --conf spark.pyspark.virtualenv.enabled=true  --conf
> spark.pyspark.virtualenv.type=native --conf
> spark.pyspark.virtualenv.requirements=requirements.txt  --conf
> spark.pyspark.python= /path/to/venv/bin/python3 --conf
> spark.pyspark.driver.python=/path/to/venv/bin/python3
>
> This did not help too..
>
> Kind Regards,
> Sachit Murarka
>


-- 


*Patrick McCarthy*

Senior Data Scientist, Machine Learning Engineering

Dstillery

470 Park Ave South, 17th Floor, NYC 10016
