Just to clarify, are you referring to module dependencies in PySpark?
With Scala I can create an uber jar file, built with Maven or sbt and
inclusive of all bits and pieces, that works in a cluster, and submit it
with spark-submit.
What alternatives would you suggest for PySpark, a zip
This isn't going to help submitting to a remote cluster though. You need to
explicitly include dependencies in your submit.
On Fri, Jan 8, 2021 at 11:15 AM Mich Talebzadeh
wrote:
Hi Riccardo
These are the env variables at runtime:
PYTHONUNBUFFERED=1;*PYTHONPATH=*
Hi Sean,
sparkstuff.py is under packages/sparutils/sparkstuff.py as shown below
[image: image.png]
So within PyCharm, it is picked up OK. However, at terminal level, it is
not picked up.
This is a snapshot of PyCharm. The module I am trying to run is called
analyze_house_prices_GCP.py
I think Spark checks the PYTHONPATH env variable, so you need to provide that.
Of course, that works in local mode only.
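
A minimal sketch of that local-mode workaround, assuming the helper modules live in a directory such as the packages folder mentioned in this thread. The function name add_package_dir is hypothetical. It only mimics what an exported PYTHONPATH (or PyCharm's source-root setting) does for the local driver process; it does nothing for executors on a remote cluster.

```python
import sys
from pathlib import Path


def add_package_dir(package_dir):
    """Prepend package_dir to sys.path so its modules import in this process only.

    Hypothetical helper: it affects the local Python process (the driver in
    local mode) and is not shipped to any remote executor.
    """
    resolved = str(Path(package_dir).resolve())
    if resolved not in sys.path:
        # Put the directory first so it wins over same-named installed modules.
        sys.path.insert(0, resolved)
    return resolved
```

Calling add_package_dir with the directory that holds sparkstuff.py before the import statement should let the script find the module when it is started from a plain terminal, under the assumption that the directory path is correct relative to the working directory.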
On Fri, Jan 8, 2021, 5:28 PM Sean Owen wrote:
I don't see anywhere that you provide 'sparkstuff'? how would the Spark app
have this code otherwise?
On Fri, Jan 8, 2021 at 10:20 AM Mich Talebzadeh
wrote:
Thanks Riccardo.
I am well aware of the submission form.
However, my question relates to doing submission within PyCharm itself.
This is what I do at the PyCharm *terminal* to invoke the Python module:
spark-submit --jars ..\lib\spark-bigquery-with-dependencies_2.12-0.18.0.jar \
--packages
You need to provide your Python dependencies as well. See
http://spark.apache.org/docs/latest/submitting-applications.html, look for
--py-files
HTH
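
A rough sketch of one way to prepare something for --py-files, assuming the helper code is a regular Python package (a folder with an __init__.py). The function names zip_python_package and build_submit_command are hypothetical, and the script only builds the archive and the argument list; it does not run spark-submit.

```python
import zipfile
from pathlib import Path


def zip_python_package(package_dir, zip_path):
    """Zip every .py file under package_dir.

    The package folder name is kept at the top of the archive so that
    `import <package>.<module>` can resolve once the zip is on the path.
    """
    package_dir = Path(package_dir).resolve()
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for py_file in sorted(package_dir.rglob("*.py")):
            arcname = py_file.relative_to(package_dir.parent).as_posix()
            zf.write(py_file, arcname)
    return str(zip_path)


def build_submit_command(app_script, py_files_zip):
    """Return the spark-submit argument list; nothing is executed here."""
    return ["spark-submit", "--py-files", py_files_zip, app_script]
```

Any --jars or --packages options already used for the job would still need to be added to that list; they are left out here because the full values are not shown in this thread.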
On Fri, Jan 8, 2021 at 3:13 PM Mich Talebzadeh
wrote:
Hi,
I have a module in PyCharm which reads data stored in a BigQuery table and
does plotting.
At the command line on the terminal I need to add the jar file and the
package to make it work.
(venv) C:\Users\admin\PycharmProjects\pythonProject2\DS\src>spark-submit
--jars