Re: [issue] not able to add external libs to pyspark job while using spark-submit

2021-11-24 Thread Mich Talebzadeh
I am not sure about that. However, with Kubernetes and a Docker image for PySpark, I build the packages into the image itself, as below in the Dockerfile:

    RUN pip install pyyaml numpy cx_Oracle

That will add those packages so that you can reference them in your py script:

    import yaml
    import cx_Oracle
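A minimal Dockerfile sketch of that approach; only the RUN line comes from the message above, while the base image tag and the USER lines are assumptions to make the sketch self-contained:

    # assumed PySpark-capable base image; substitute your own
    FROM apache/spark-py:v3.1.3
    USER root
    # bake the Python dependencies into the image itself
    RUN pip install pyyaml numpy cx_Oracle
    # drop back to the non-root user the official Spark images run as
    USER 185

Once built and pushed, every driver and executor pod started from this image has the packages available without any extra submit-time flags.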

RE: [issue] not able to add external libs to pyspark job while using spark-submit

2021-11-24 Thread Bode, Meikel, NMA-CFD
Can we add Python dependencies as we can do for mvn coordinates? So that we can run something like pip install or download from the PyPI index? From: Mich Talebzadeh Sent: Wednesday, 24 November 2021 18:28 Cc: user@spark.apache.org Subject: Re: [issue] not able to add external libs to pyspark job while
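For context, a minimal sketch of the contrast as it stands (the coordinate and file names are illustrative): JVM dependencies can be resolved from Maven coordinates at submit time, while Python dependencies have to be shipped explicitly.

    # JVM side: resolved from Maven coordinates by spark-submit itself
    spark-submit --packages org.apache.spark:spark-avro_2.12:3.0.1 app.py

    # Python side: no built-in pip resolution; ship the code yourself,
    # for example as a zip of packages on --py-files
    spark-submit --py-files dependencies.zip app.py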

Re: [issue] not able to add external libs to pyspark job while using spark-submit

2021-11-24 Thread Mich Talebzadeh
The easiest way to set this up is to create a dependencies.zip file. Assuming that you have a virtual environment already set up, with a directory called site-packages, go to that directory and create a minimal shell script, say package_and_zip_dependencies.sh, to do it for you
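A minimal sketch of such a script, assuming the virtualenv lives at a default location (adjust VENV_DIR to your setup):

    #!/usr/bin/env bash
    # package_and_zip_dependencies.sh: zip the contents of a virtualenv's
    # site-packages directory into dependencies.zip for spark-submit
    set -euo pipefail
    VENV_DIR="${1:-$HOME/venv}"   # assumed default location
    SITE_PACKAGES="$(find "$VENV_DIR/lib" -maxdepth 2 -type d -name site-packages | head -n 1)"
    OUT="$PWD/dependencies.zip"
    cd "$SITE_PACKAGES"
    # zip from inside site-packages so the archive root holds the
    # importable package directories themselves
    zip -rq "$OUT" .
    echo "created $OUT"

The archive can then be shipped with the job (the app name is a placeholder):

    spark-submit --py-files dependencies.zip your_app.py

Note that importing from a zip on --py-files works for pure-Python packages; packages with compiled C extensions generally cannot be imported this way.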

Re: [issue] not able to add external libs to pyspark job while using spark-submit

2021-11-24 Thread Atheer Alabdullatif
Hello Owen, Thank you for your prompt reply! We will check it out. Best, Atheer Alabdullatif From: Sean Owen Sent: Wednesday, November 24, 2021 5:06 PM To: Atheer Alabdullatif Cc: user@spark.apache.org; Data Engineering Subject: Re: [issue] not able to add

Re: [issue] not able to add external libs to pyspark job while using spark-submit

2021-11-24 Thread Sean Owen
That's not how you add a library. From the docs:
https://spark.apache.org/docs/latest/api/python/user_guide/python_packaging.html

On Wed, Nov 24, 2021 at 8:02 AM Atheer Alabdullatif wrote:
> Dear Spark team,
> hope my email finds you well
>
> I am using pyspark 3.0 and facing an issue with
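Among the mechanisms that page documents is shipping a packed virtual environment via --archives. A minimal sketch, assuming a virtualenv-based workflow (the environment name, package list, and app.py are illustrative):

    # build and pack a virtualenv holding the job's dependencies
    python -m venv pyspark_venv
    source pyspark_venv/bin/activate
    pip install pyyaml numpy venv-pack
    venv-pack -o pyspark_venv.tar.gz

    # ship the packed env; "environment" after the '#' is the unpack alias,
    # and the executors' Python interpreter is pointed inside it
    export PYSPARK_PYTHON=./environment/bin/python
    spark-submit --archives pyspark_venv.tar.gz#environment app.py

Unlike the --py-files zip approach, a packed environment carries a full interpreter tree, so packages with compiled extensions work as well.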