Re: PyCharm, Running spark-submit calling jars and a package at run time

2021-01-08 Thread Mich Talebzadeh
Just to clarify, are you referring to module dependencies in PySpark? With Scala, I can build an uber JAR file with Maven or sbt that includes all the bits and pieces, works in a cluster, and can be passed to spark-submit as a single uber JAR. What alternatives would you suggest for PySpark, a zip
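The closest PySpark analogue to an uber JAR is usually a zip of the Python package, shipped with `--py-files`. The sketch below is a minimal illustration, assuming a regular package directory (one containing an `__init__.py`) such as the `packages/sparutils` folder mentioned later in the thread. The helper name `zip_package` is hypothetical. The key detail is that the package folder sits at the root of the archive, so `import sparutils.sparkstuff` resolves once the zip is on the Python path.

```python
import zipfile
from pathlib import Path


def zip_package(package_dir: str, zip_path: str) -> str:
    """Hypothetical helper: zip a Python package directory for --py-files.

    Assumes package_dir is a regular package (contains __init__.py).
    The package folder itself is stored at the archive root, e.g.
    'sparutils/sparkstuff.py', so dotted imports work from the zip.
    """
    pkg = Path(package_dir).resolve()
    parent = pkg.parent
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        # Only .py sources are included; __pycache__/*.pyc files are skipped.
        for path in sorted(pkg.rglob("*.py")):
            # Archive name is relative to the package's parent directory.
            zf.write(path, path.relative_to(parent).as_posix())
    return zip_path
```

Third-party dependencies installed with pip are not covered by this; the sketch only bundles your own modules.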

Re: PyCharm, Running spark-submit calling jars and a package at run time

2021-01-08 Thread Sean Owen
This isn't going to help when submitting to a remote cluster, though. You need to explicitly include the dependencies in your submit. On Fri, Jan 8, 2021 at 11:15 AM Mich Talebzadeh wrote: > Hi Riccardo > > This is the env variables at runtime > > PYTHONUNBUFFERED=1;*PYTHONPATH=* >

Re: PyCharm, Running spark-submit calling jars and a package at run time

2021-01-08 Thread Mich Talebzadeh
Hi Riccardo, these are the env variables at runtime: PYTHONUNBUFFERED=1;*PYTHONPATH=*

Re: PyCharm, Running spark-submit calling jars and a package at run time

2021-01-08 Thread Mich Talebzadeh
Hi Sean, sparkstuff.py is under packages/sparutils/sparkstuff.py, as shown below: [image: image.png] So within PyCharm, it is picked up OK. However, at the terminal level, it is not picked up. This is a snapshot of PyCharm. The module I am trying to run is called analyze_house_prices_GCP.py
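One plausible reason for "works in PyCharm, fails in the terminal" is that PyCharm run configurations can add content and source roots to `PYTHONPATH`, while a plain terminal does not. The sketch below is a hypothetical diagnostic, not anything from the thread: `can_import` checks whether a dotted module name resolves, optionally after temporarily prepending a directory (standing in for an IDE source root) to `sys.path`.

```python
import importlib.util
import sys
from typing import Optional


def can_import(module_name: str, extra_path: Optional[str] = None) -> bool:
    """Hypothetical diagnostic: report whether module_name can be found,
    optionally with extra_path temporarily prepended to sys.path."""
    added = False
    if extra_path and extra_path not in sys.path:
        sys.path.insert(0, extra_path)
        added = True
    try:
        # For a dotted name, find_spec imports the parent package first.
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package of a dotted name cannot be found.
        return False
    finally:
        if added:
            sys.path.remove(extra_path)
```

Run from the terminal, `can_import("sparutils.sparkstuff")` returning `False` while `can_import("sparutils.sparkstuff", "<path to packages>")` returns `True` would point at the path difference. Note that a successful check leaves the parent package cached in `sys.modules`.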

Re: PyCharm, Running spark-submit calling jars and a package at run time

2021-01-08 Thread Riccardo Ferrari
I think Spark checks the Python path (PYTHONPATH) environment variable, so you need to provide that. Of course, that works in local mode only. On Fri, Jan 8, 2021, 5:28 PM Sean Owen wrote: > I don't see anywhere that you provide 'sparkstuff'? how would the Spark > app have this code otherwise? > > On Fri, Jan 8, 2021 at
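Following the PYTHONPATH suggestion for local mode, one way to apply it outside PyCharm is to launch spark-submit with a child environment whose `PYTHONPATH` includes the `packages` directory. This is a minimal sketch; `env_with_pythonpath` is a hypothetical helper, and as noted in the thread it does nothing for executors on a remote cluster.

```python
import os
from typing import Dict, Optional


def env_with_pythonpath(extra_dir: str,
                        base_env: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Hypothetical helper: return a copy of the environment with extra_dir
    prepended to PYTHONPATH. The base environment is not modified."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("PYTHONPATH", "")
    # os.pathsep is ';' on Windows and ':' on POSIX systems.
    env["PYTHONPATH"] = extra_dir + (os.pathsep + existing if existing else "")
    return env
```

The result can be passed as `subprocess.run([...], env=env_with_pythonpath(packages_dir))`, where `packages_dir` is whatever local path holds the `sparutils` package.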

Re: PyCharm, Running spark-submit calling jars and a package at run time

2021-01-08 Thread Sean Owen
I don't see anywhere that you provide 'sparkstuff'. How would the Spark app have this code otherwise? On Fri, Jan 8, 2021 at 10:20 AM Mich Talebzadeh wrote: > Thanks Riccardo. > > I am well aware of the submission form > > However, my question relates to doing submission within PyCharm itself.

Re: PyCharm, Running spark-submit calling jars and a package at run time

2021-01-08 Thread Mich Talebzadeh
Thanks Riccardo. I am well aware of the submission form. However, my question relates to doing the submission within PyCharm itself. This is what I do at the PyCharm *terminal* to invoke the module: python spark-submit --jars ..\lib\spark-bigquery-with-dependencies_2.12-0.18.0.jar \ --packages

Re: PyCharm, Running spark-submit calling jars and a package at run time

2021-01-08 Thread Riccardo Ferrari
You need to provide your Python dependencies as well. See http://spark.apache.org/docs/latest/submitting-applications.html and look for --py-files. HTH On Fri, Jan 8, 2021 at 3:13 PM Mich Talebzadeh wrote: > Hi, > > I have a module in Pycharm which reads data stored in a Bigquery table and > does
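Putting the pieces of the thread together, a single spark-submit call can carry the JAR (`--jars`), the Maven package (`--packages`) and the zipped Python code (`--py-files`), with all options placed before the application file. The sketch below only builds the argument list; `build_spark_submit` is a hypothetical helper, and the package coordinate used in any example is a placeholder, since the real one is cut off in the thread.

```python
from typing import List, Sequence


def build_spark_submit(app: str,
                       jars: Sequence[str] = (),
                       packages: Sequence[str] = (),
                       py_files: Sequence[str] = (),
                       app_args: Sequence[str] = ()) -> List[str]:
    """Hypothetical helper: build a spark-submit argument list.

    Options must come before the application file; anything after
    the application file is passed to the application itself.
    """
    cmd = ["spark-submit"]
    if jars:
        cmd += ["--jars", ",".join(jars)]          # comma-separated JAR paths
    if packages:
        cmd += ["--packages", ",".join(packages)]  # Maven groupId:artifactId:version
    if py_files:
        cmd += ["--py-files", ",".join(py_files)]  # .zip/.egg/.py files for the Python path
    cmd.append(app)
    cmd += list(app_args)
    return cmd
```

From a PyCharm run script, the list can be handed to `subprocess.run(cmd, check=True)`. Passing a list rather than a single string also sidesteps the Windows `\` line-continuation issue visible in the terminal command.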

PyCharm, Running spark-submit calling jars and a package at run time

2021-01-08 Thread Mich Talebzadeh
Hi, I have a module in PyCharm which reads data stored in a BigQuery table and does plotting. At the command line on the terminal, I need to add the jar file and the package to make it work. (venv) C:\Users\admin\PycharmProjects\pythonProject2\DS\src>spark-submit --jars