Re: [EXTERNAL] Re: [EXTERNAL] Re: Spark-submit without access to HDFS

2023-12-11 Thread Eugene Miretsky
Hey Mich, Thanks for the detailed response. I get most of these options. However, what we are trying to do is avoid having to upload the source configs and pyspark.zip files to the cluster every time we execute the job using spark-submit. Here is the code that does it: https://github.com/apache

Re: [EXTERNAL] Re: Spark-submit without access to HDFS

2023-12-11 Thread Mich Talebzadeh
ster? >> >> On Fri, Nov 17, 2023 at 8:57 AM Mich Talebzadeh < >> mich.talebza...@gmail.com> wrote: >> >>> Hi, >>> >>> How are you submitting your spark job from your client? >>> >>> Your files can either be on HDFS or HCFS s

Re: [EXTERNAL] Re: Spark-submit without access to HDFS

2023-12-10 Thread Eugene Miretsky
ark job from your client? >> >> Your files can either be on HDFS or HCFS such as gs, s3 etc. >> >> With reference to --py-files hdfs://yarn-master-url hdfs://foo.py', I >> assume you want your >> >> spark-submit --verbose \ >>--de

Re: [EXTERNAL] Re: Spark-submit without access to HDFS

2023-12-10 Thread Eugene Miretsky
Nov 17, 2023 at 8:57 AM Mich Talebzadeh wrote: > Hi, > > How are you submitting your spark job from your client? > > Your files can either be on HDFS or HCFS such as gs, s3 etc. > > With reference to --py-files hdfs://yarn-master-url hdfs://foo.py', I > assume you want yo

Re: Spark-submit without access to HDFS

2023-11-17 Thread Mich Talebzadeh
Hi, How are you submitting your spark job from your client? Your files can either be on HDFS or HCFS such as gs, s3 etc. With reference to --py-files hdfs://yarn-master-url hdfs://foo.py', I assume you want your spark-submit --verbose \ --deploy-mode cluster
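
For reference, a minimal sketch of the cluster-mode pattern Mich describes, with dependencies pre-staged on HDFS (namenode address and paths are hypothetical placeholders):

```
# Sketch only: stage the zip and the job once, then reference them by URL at submit time.
spark-submit --verbose \
  --master yarn \
  --deploy-mode cluster \
  --py-files hdfs://namenode:8020/apps/deps/pyspark_deps.zip \
  hdfs://namenode:8020/apps/jobs/foo.py
```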

Re: Spark-submit without access to HDFS

2023-11-16 Thread Jörn Franke
would recommend against it though for various reasons, such as reliability). On 15.11.2023 at 22:33, Eugene Miretsky wrote: Hey All, We are running Pyspark spark-submit from a client outside the cluster. The client has network connectivity only to the Yarn Master, not the HDFS Datanodes. How can we

Re: Re: [EXTERNAL] Re: Spark-submit without access to HDFS

2023-11-15 Thread eab...@163.com
Hi Eugene, As the logs indicate, when executing spark-submit, Spark will package and upload spark/conf to HDFS, along with uploading spark/jars. These files are uploaded to HDFS unless you specify uploading them to another OSS. To do so, you'll need to modify the configuration in hdfs
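
One way to avoid the per-submit upload of spark/jars is to pre-stage them once and point the config at the staged copy; a hedged sketch using the documented spark.yarn.jars property (HDFS paths are illustrative):

```
# One-time staging; subsequent spark-submit runs skip re-uploading the Spark jars.
hdfs dfs -mkdir -p /spark/jars
hdfs dfs -put "$SPARK_HOME"/jars/* /spark/jars/

# Then, in spark-defaults.conf (or via --conf):
#   spark.yarn.jars  hdfs:///spark/jars/*
```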

Re: [EXTERNAL] Re: Spark-submit without access to HDFS

2023-11-15 Thread Eugene Miretsky
ioning properly. > It seems that the issue might be due to insufficient disk space. > > -- > eabour > > > *From:* Eugene Miretsky > *Date:* 2023-11-16 05:31 > *To:* user > *Subject:* Spark-submit without access to HDFS > Hey All, >

Re: Spark-submit without access to HDFS

2023-11-15 Thread eab...@163.com
to insufficient disk space. eabour From: Eugene Miretsky Date: 2023-11-16 05:31 To: user Subject: Spark-submit without access to HDFS Hey All, We are running Pyspark spark-submit from a client outside the cluster. The client has network connectivity only to the Yarn Master, not the HDFS

Spark-submit without access to HDFS

2023-11-15 Thread Eugene Miretsky
Hey All, We are running Pyspark spark-submit from a client outside the cluster. The client has network connectivity only to the Yarn Master, not the HDFS Datanodes. How can we submit the jobs? The idea would be to preload all the dependencies (job code, libraries, etc) to HDFS, and just submit

spark-submit: No "driver-" id printed in standalone mode

2023-03-08 Thread Travis Athougies
Hello, I'm trying to get Airflow to work with spark in cluster mode. I can successfully submit jobs via spark-submit and see them complete successfully. However, 'spark-submit' doesn't seem to print any "driver-" ID to the console. Clearly the drivers have an ID, as they are listed with one

Re: Running Spark on Kubernetes (GKE) - failing on spark-submit

2023-02-16 Thread Mich Talebzadeh
You can try this gsutil cp src/StructuredStream-on-gke.py gs://codes/ where you create a bucket on gcs called codes Then in you spark-submit do spark-submit --verbose \ --master k8s://https://$KUBERNETES_MASTER_IP:443 \ --deploy-mode cluster \ --name
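
Putting both steps together, a hedged end-to-end sketch (bucket, image tag, and app name are placeholders; assumes the container image bundles the GCS connector so gs:// URLs resolve):

```
# Stage the application file in the bucket, then reference it by gs:// URL at submit time.
gsutil cp src/StructuredStream-on-gke.py gs://codes/

spark-submit --verbose \
  --master k8s://https://$KUBERNETES_MASTER_IP:443 \
  --deploy-mode cluster \
  --name structuredstream-on-gke \
  --conf spark.kubernetes.container.image=pyspark-example:0.1 \
  gs://codes/StructuredStream-on-gke.py
```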

Re: Running Spark on Kubernetes (GKE) - failing on spark-submit

2023-02-15 Thread karan alang
On Wed, 15 Feb 2023 at 09:12, Mich Talebzadeh > wrote: > >> Your submit command >> >> spark-submit --master k8s://https://34.74.22.140:7077 --deploy-mode >> cluster --name pyspark-example --conf >> spark.kubernetes.co

Re: Running Spark on Kubernetes (GKE) - failing on spark-submit

2023-02-15 Thread Mich Talebzadeh
On Wed, 15 Feb 2023 at 09:12, Mich Talebzadeh wrote: > Your submit command > > spark-submit --master k8s://https://34.74.22.140:7077 --deploy-mode > cluster --nam

Re: Running Spark on Kubernetes (GKE) - failing on spark-submit

2023-02-15 Thread Mich Talebzadeh
Your submit command spark-submit --master k8s://https://34.74.22.140:7077 --deploy-mode cluster --name pyspark-example --conf spark.kubernetes.container.image=pyspark-example:0.1 --conf spark.kubernetes.file.upload.path=/myexample src/StructuredStream-on-gke.py pay attention to what it says

Re: Running Spark on Kubernetes (GKE) - failing on spark-submit

2023-02-14 Thread karan alang
myexample/StructuredStream-on-gke.py > > CMD ["pyspark"] > > ``` > Simple pyspark application : > ``` > > from pyspark.sql import SparkSession > spark = > SparkSession.builder.appName("StructuredStreaming-on-gke").getOrCreate() > > dat

Re: Running Spark on Kubernetes (GKE) - failing on spark-submit

2023-02-14 Thread Ye Xianjin
('k1', 123000), ('k2', 234000), ('k3', 456000)] df = spark.createDataFrame(data, ('id', 'salary')) df.show(5, False) ``` Spark-submit command : ``` spark-submit --master k8s://https://34.74.22.140:7077 --deploy-mode cluster --name pyspark-example --conf spark.kubernetes.container.image=

Re: Running Spark on Kubernetes (GKE) - failing on spark-submit

2023-02-14 Thread Khalid Mammadov
y /myexample/StructuredStream-on-gke.py > > CMD ["pyspark"] > > ``` > Simple pyspark application : > ``` > > from pyspark.sql import SparkSession > spark = > SparkSession.builder.appName("StructuredStreaming-on-gke").getOrCreate() >

Running Spark on Kubernetes (GKE) - failing on spark-submit

2023-02-13 Thread karan alang
'k3', 456000)] df = spark.createDataFrame(data, ('id', 'salary')) df.show(5, False) ``` Spark-submit command : ``` spark-submit --master k8s://https://34.74.22.140:7077 --deploy-mode cluster --name pyspark-example --conf spark.kubernetes.container.image=pyspark-example:0.1 --conf spark.kuberne

Fwd: Spark-submit doesn't load all app classes in the classpath

2023-01-28 Thread Soheil Pourbafrani
out why only after adding this parameter to the spark-submit command, the Hikari classes were loaded in the classpath? Thanks

Re: spark-submit fails in kubernetes 1.24.x cluster

2022-12-27 Thread Saurabh Gulati
Subject: [EXTERNAL] spark-submit fails in kubernetes 1.24.x cluster Hello, We are facing an issue with ingress during spark-submit with Kubernetes cluster 1.24.x

spark-submit fails in kubernetes 1.24.x cluster

2022-12-23 Thread Thimme Gowda TP (Nokia)
Hello, We are facing an issue with ingress during spark-submit with Kubernetes cluster 1.24.x. We are using Spark 3.3.0 to do spark-submit. # kubectl version WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short. Use --output=yaml|json

spark-submit on kubernetes

2022-06-21 Thread Michaela Bogiages
want to use spark-submit and submit the entire application. I also don’t want to use a yaml file pointing to the mainApplication that needs to be submitted. How do I set up a spark cluster in kubernetes which then can be accessed to run specific spark jobs? Would a SparkSession be used instead

Re: [EXTERNAL] Re: Unable to access Google buckets using spark-submit

2022-02-14 Thread Saurabh Gulati
Subject: [EXTERNAL] Re: Unable to access Google buckets using spark-submit Hi Gaurav, All, I'm doing a spark-submit from my local system to a GCP Dataproc c

Re: Unable to access Google buckets using spark-submit

2022-02-13 Thread karan alang
Hi Gaurav, All, I'm doing a spark-submit from my local system to a GCP Dataproc cluster .. This is more for dev/testing. I can run a -- 'gcloud dataproc jobs submit' command as well, which is what will be done in Production. Hope that clarifies. regds, Karan Alang On Sat, Feb 12, 2022 at 10:31

Re: Unable to access Google buckets using spark-submit

2022-02-13 Thread karan alang
owards. > > On Fri, Feb 11, 2022 at 11:58 PM Mich Talebzadeh < > mich.talebza...@gmail.com> wrote: > >> BTW I also answered you in stackoverflow: >> >> >> https://stackoverflow.com/questions/71088934/unable-to-access-google-buckets-using-spark-submi

Re: Unable to access Google buckets using spark-submit

2022-02-13 Thread karan alang
Thanks, Mich - will check this and update. regds, Karan Alang On Sat, Feb 12, 2022 at 1:57 AM Mich Talebzadeh wrote: > BTW I also answered you in stackoverflow: > > > https://stackoverflow.com/questions/71088934/unable-to-access-google-buckets-using-spark-submit > > HT

Re: Unable to access Google buckets using spark-submit

2022-02-13 Thread Mich Talebzadeh
Putting the GS access jar with Spark jars may technically resolve the issue of spark-submit but it is not a recommended practice to create a local copy of jar files. The approach that the thread owner adopted by putting the files in Google cloud bucket is correct. Indeed this is what he states

Re: Unable to access Google buckets using spark-submit

2022-02-12 Thread Gourav Sengupta
Hi, agree with Holden, have faced quite a few issues with FUSE. Also trying to understand "spark-submit from local". Are you submitting your SPARK jobs from a local laptop or in local mode from a GCP dataproc / system? If you are submitting the job from your lo

Re: Unable to access Google buckets using spark-submit

2022-02-12 Thread Holden Karau
88934/unable-to-access-google-buckets-using-spark-submit > > HTH

Re: Unable to access Google buckets using spark-submit

2022-02-12 Thread Mich Talebzadeh
BTW I also answered you in stackoverflow: https://stackoverflow.com/questions/71088934/unable-to-access-google-buckets-using-spark-submit HTH

Re: Unable to access Google buckets using spark-submit

2022-02-12 Thread Mich Talebzadeh
You are trying to access a Google storage bucket gs:// from your local host. It does not see it because spark-submit assumes that it is a local file system on the host which is not. You need to mount gs:// bucket as a local file system. You can use the tool called gcsfuse https
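
A minimal sketch of that gcsfuse approach (bucket name and mount point are placeholders; assumes gcsfuse is installed and credentials are configured):

```
# Mount the bucket so spark-submit sees an ordinary local filesystem path.
mkdir -p /mnt/gcs-codes
gcsfuse my-bucket /mnt/gcs-codes

spark-submit --master "local[*]" /mnt/gcs-codes/my_job.py
```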

Unable to access Google buckets using spark-submit

2022-02-11 Thread karan alang
Hello All, I'm trying to access gcp buckets while running spark-submit from local, and running into issues. I'm getting error : ``` 22/02/11 20:06:59 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable Exception in thread

Re: Self contained Spark application with local master without spark-submit

2022-01-19 Thread Паша
ail.com>: > Hello, > > I noticed I can run spark applications with a local master via sbt run > and also via the IDE. I'd like to run a single threaded worker > application as a self contained jar. > > What does sbt run employ that allows it to run a loc

Self contained Spark application with local master without spark-submit

2022-01-19 Thread Colin Williams
Hello, I noticed I can run spark applications with a local master via sbt run and also via the IDE. I'd like to run a single threaded worker application as a self contained jar. What does sbt run employ that allows it to run a local master? Can I build an uber jar and run without spark-submit

Re: [issue] not able to add external libs to pyspark job while using spark-submit

2021-11-24 Thread Mich Talebzadeh
coordinates? So that > we run something like pip install or download from the PyPI index? > > > > *From:* Mich Talebzadeh > *Sent:* Wednesday, 24 November 2021 18:28 > *Cc:* user@spark.apache.org > *Subject:* Re: [issue] not able to add external libs to pyspark job while > using s

RE: [issue] not able to add external libs to pyspark job while using spark-submit

2021-11-24 Thread Bode, Meikel, NMA-CFD
using spark-submit The easiest way to set this up is to create a dependencies.zip file. Assuming that you have a virtual environment already set up, where there is a directory called site-packages, go to that directory and just create a minimal shell script, say package_and_zip_dependencies.sh, to do

Re: [issue] not able to add external libs to pyspark job while using spark-submit

2021-11-24 Thread Mich Talebzadeh
src/Python-3.7.3/airflow_virtualenv/lib/python3.7/dependencies.zip" Then in spark-submit you can do this spark-submit --master yarn --deploy-mode client --driver-memory xG --executor-memory yG --num-executors m --executor-cores n --py-files $DEPENDENCIES --jars $HOME/jars/spark-sql-kafka-0-10
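
Condensed from the two messages above, a hedged sketch of the dependencies.zip workflow (Python version and paths are illustrative):

```
# Zip the virtualenv's site-packages once, then ship the zip on every submit.
cd "$VIRTUAL_ENV/lib/python3.7/site-packages"
zip -r ../dependencies.zip .

DEPENDENCIES="$VIRTUAL_ENV/lib/python3.7/dependencies.zip"
spark-submit --master yarn --deploy-mode client \
  --py-files "$DEPENDENCIES" \
  my_job.py
```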

Re: [issue] not able to add external libs to pyspark job while using spark-submit

2021-11-24 Thread Atheer Alabdullatif
external libs to pyspark job while using spark-submit That's not how you add a library. From th

Re: [issue] not able to add external libs to pyspark job while using spark-submit

2021-11-24 Thread Sean Owen
facing an issue with adding external library > [configparser] while running the job using [spark-submit] & [yarn] > > issue: > > > import configparser > ImportError: No module named configparser 21/11/24 08:54:38 INFO > util.ShutdownHookManager: Shutdown hook called > >

[issue] not able to add external libs to pyspark job while using spark-submit

2021-11-24 Thread Atheer Alabdullatif
Dear Spark team, hope my email finds you well. I am using pyspark 3.0 and facing an issue with adding external library [configparser] while running the job using [spark-submit] & [yarn] issue: import configparser ImportError: No module named configparser 21/11/24 08:54:38

Spark submit on openshift

2021-08-27 Thread Markus Gierich
Hi! I created a spark cluster on openshift using radanalytics.io I'm trying to execute the SparkPi sample using spark-submit --name sparkpi-2 \ --master spark://hans:7077 \ --deploy-mode cluster \ --class org.apache.spark.examples.SparkPi \ /opt/spark/examples/jars/spark-examples_2.11-2.4.5

Re: spark-submit not running on macbook pro

2021-08-19 Thread Artemis User
My question to the group: Does anyone have any luck with Apple's JDK when running Spark or other applications (performance-wise)? Is this the one with native libs for the M1 chipset? -- ND On 8/17/21 1:56 AM, karan alang wrote: Hello Experts, I'm trying to run spark-submit on my macbook pro

spark-submit not running on macbook pro

2021-08-16 Thread karan alang
Hello Experts, I'm trying to run spark-submit on my macbook pro (command line or using PyCharm), and it seems to be giving error -> Exception: Java gateway process exited before sending its port number. I've tried setting values to variable in the program (based on the recommendations by peo

Why PySpark with spark-submit throws error trying to untar --archives pyspark_venv.tar.gz

2021-07-26 Thread Mich Talebzadeh
Hi, Maybe someone can shed some light on this. Running Pyspark job in minikube. Because it is PySpark the following two conf parameters are used: spark-submit --verbose \ --master k8s://$K8S_SERVER \ --deploy-mode cluster \ --name pytest
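
For context, the usual --archives pattern from the PySpark packaging docs, sketched with hypothetical names (the minikube untar failure in this thread aside; assumes venv-pack produced the tarball):

```
# Pack the virtualenv, ship it, and run Python from the unpacked copy in each container.
venv-pack -o pyspark_venv.tar.gz

spark-submit --verbose \
  --master k8s://$K8S_SERVER \
  --deploy-mode cluster \
  --archives pyspark_venv.tar.gz#environment \
  --conf spark.pyspark.python=./environment/bin/python \
  my_app.py
```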

spark-submit --files causes spark context to fail in Kubernetes

2021-07-24 Thread Mich Talebzadeh
ollows: (shown in blue) export VOLUME_TYPE=hostPath export VOLUME_NAME=minikube-mount export SOURCE_DIR=/d4T/hduser/minikube export MOUNT_PATH=$SOURCE_DIR/mnt spark-submit --verbose \ --master k8s://$K8S_SERVER \ --deploy-mode cluster \ --name

Re: [EXTERNAL] Urgent Help - Py Spark submit error

2021-05-15 Thread Mich Talebzadeh
This is an interesting one. I have never tried to add --files ... spark-submit --master yarn --deploy-mode client --files /etc/hive/conf/hive-site.xml,/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml Rather, under $SPARK_HOME/conf, I create soft links to the needed XML files
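
The soft-link alternative Mich mentions, as a minimal sketch (the XML locations are common Hadoop-distribution defaults and may differ):

```
# Link the cluster XMLs into Spark's conf dir once, instead of repeating --files per submit.
cd "$SPARK_HOME/conf"
ln -s /etc/hive/conf/hive-site.xml hive-site.xml
ln -s /etc/hadoop/conf/core-site.xml core-site.xml
ln -s /etc/hadoop/conf/hdfs-site.xml hdfs-site.xml
```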

Re: [EXTERNAL] Urgent Help - Py Spark submit error

2021-05-15 Thread KhajaAsmath Mohammed
Thanks everyone. I was able to resolve this. Here is what I did. Just passed the conf file using the —files option. The mistake I made was reading the json conf file before creating the spark session. Reading it after creating the spark session helped. Thanks once again for your valuable suggestions

Re: [EXTERNAL] Urgent Help - Py Spark submit error

2021-05-15 Thread Sean Owen
If code running on the executors needs some local file like a config file, then it does have to be passed this way. That much is normal. On Sat, May 15, 2021 at 1:41 AM Gourav Sengupta wrote: > Hi, > > once again let's start with the requirement. Why are you trying to pass xml > and json files to
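
In cluster mode that typically looks like the following hedged sketch: ship the file with --files, then open it by bare name from the container's working directory after the SparkSession exists (app name is a placeholder):

```
# conf.json is copied into each container's working directory by YARN;
# the application then reads it as plain "conf.json" after session creation.
spark-submit --master yarn --deploy-mode cluster \
  --files /appl/common/ftp/conf.json \
  my_app.py
```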

Re: [EXTERNAL] Urgent Help - Py Spark submit error

2021-05-15 Thread Gourav Sengupta
Amit Joshi > > On Sat, May 15, 2021 at 5:14 AM KhajaAsmath Mohammed < > mdkhajaasm...@gmail.com> wrote: > >> Here is my updated spark submit, without any luck: >> >> spark-submit --master yarn --deploy-mode cluster --files >> /appl/common/ftp/conf.json,/etc/hive/c

Re: [EXTERNAL] Urgent Help - Py Spark submit error

2021-05-14 Thread Amit Joshi
Mohammed < mdkhajaasm...@gmail.com> wrote: > Here is my updated spark submit, without any luck: > > spark-submit --master yarn --deploy-mode cluster --files > /appl/common/ftp/conf.json,/etc/hive/conf/hive-site.xml,/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml > --

Re: [EXTERNAL] Urgent Help - Py Spark submit error

2021-05-14 Thread KhajaAsmath Mohammed
Here is my updated spark submit, without any luck: spark-submit --master yarn --deploy-mode cluster --files /appl/common/ftp/conf.json,/etc/hive/conf/hive-site.xml,/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml --num-executors 6 --executor-cores 3 --driver-cores 3 --driver-memory

Re: [EXTERNAL] Urgent Help - Py Spark submit error

2021-05-14 Thread KhajaAsmath Mohammed
>>> >>> >>> >>> >>> *From: *KhajaAsmath Mohammed >>> *Date: *Friday, May 14, 2021 at 4:50 PM >>> *To: *"user @spark" >>> *Subject: *[EXTERNAL] Urgent Help - Py Spark submit error >>> >>> >>> >>> /appl/common/ftp/conf.json >>

Re: [EXTERNAL] Urgent Help - Py Spark submit error

2021-05-14 Thread KhajaAsmath Mohammed
>> >> >> *From: *KhajaAsmath Mohammed >> *Date: *Friday, May 14, 2021 at 4:50 PM >> *To: *"user @spark" >> *Subject: *[EXTERNAL] Urgent Help - Py Spark submit error >> >> >> >> /appl/common/ftp/conf.json >

Re: [EXTERNAL] Urgent Help - Py Spark submit error

2021-05-14 Thread KhajaAsmath Mohammed
cal FS) > /appl/common/ftp/conf.json > > > > > > *From: *KhajaAsmath Mohammed > *Date: *Friday, May 14, 2021 at 4:50 PM > *To: *"user @spark" > *Subject: *[EXTERNAL] Urgent Help - Py Spark submit error > > > > /appl/common/ftp/conf.json >

Urgent Help - Py Spark submit error

2021-05-14 Thread KhajaAsmath Mohammed
Hi, I am having a weird situation where the below command works when the deploy mode is a client and fails if it is a cluster. spark-submit --master yarn --deploy-mode client --files /etc/hive/conf/hive-site.xml,/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml --driver-memory 70g

Re: Spark submit hbase issue

2021-04-14 Thread Mich Talebzadeh
On Wed, 14 Apr 2021 at 22:40, KhajaAsmath Mohammed wrote: > Hi, >

Spark submit hbase issue

2021-04-14 Thread KhajaAsmath Mohammed
Hi, Spark submit is connecting to localhost instead of the zookeeper mentioned in hbase-site.xml. The same program works in the IDE, where it picks up hbase-site.xml. What am I missing in spark submit? > > spark-submit --driver-class-path > C:\Users\mdkha\bitbucket\clx-spark

Overirde Jackson jar - Spark Submit

2021-04-14 Thread KhajaAsmath Mohammed
Hi, I am having similar issue as mentioned in the below link but was not able to resolve. any other solutions please? https://stackoverflow.com/questions/57329060/exclusion-of-dependency-of-spark-core-in-cdh Thanks, Asmath

Re: K8S spark-submit Loses Successful Driver Completion

2021-02-15 Thread Attila Zsolt Piros
Hi, I am not using Airflow but I assume your application is deployed in cluster mode and in this case the class you are looking for is *org.apache.spark.deploy.k8s.submit.Client* [1]. If we are talking about the first "spark-submit" used to start the application and not "spark-

K8S spark-submit Loses Successful Driver Completion

2021-01-22 Thread Marshall Markham
Hi, I am running Airflow + Spark + AKS (Azure K8s). Sporadically, when I have a spark job complete, my spark-submit process does not notice that the driver has succeeded and continues to track the job as running. Does anyone know how spark-submit process monitors driver processes on k8s? My

Re: PyCharm, Running spark-submit calling jars and a package at run time

2021-01-08 Thread Mich Talebzadeh
Just to clarify, are you referring to module dependencies in PySpark? With Scala I can create a Uber jar file inclusive of all bits and pieces built with maven or sbt that works in a cluster and submit to spark-submit as a uber jar file. what alternatives would you suggest for PySpark, a zip

Re: PyCharm, Running spark-submit calling jars and a package at run time

2021-01-08 Thread Sean Owen
up sparkstuff.py > > > (venv) C:\Users\admin\PycharmProjects\pythonProject2\DS\src>*where > sparkstuff.py* > > C:\Users\admin\PycharmProjects\packages\sparkutils\sparkstuff.py > > But in spark-submit within the code it does not > > (venv) C:\Users\admin\PycharmProjects\pytho

Re: PyCharm, Running spark-submit calling jars and a package at run time

2021-01-08 Thread Mich Talebzadeh
env) C:\Users\admin\PycharmProjects\pythonProject2\DS\src>*where sparkstuff.py* C:\Users\admin\PycharmProjects\packages\sparkutils\sparkstuff.py But in spark-submit within the code it does not (venv) C:\Users\admin\PycharmProjects\pythonProject2\DS\src>spark-submit --jars ..\spark

Re: PyCharm, Running spark-submit calling jars and a package at run time

2021-01-08 Thread Mich Talebzadeh
>> This is what I do at Pycharm *terminal* to invoke the module python >> >> spark-submit --jars >> ..\lib\spark-bigquery-with-dependencies_2.12-0.18.0.jar \ >> --packages com.github.samelamin:spark-bigquery_2.11:0.2.6 >> analyze_house_prices_GCP.py >>

Re: PyCharm, Running spark-submit calling jars and a package at run time

2021-01-08 Thread Riccardo Ferrari
, 2021 at 10:20 AM Mich Talebzadeh > wrote: > >> Thanks Riccardo. >> >> I am well aware of the submission form >> >> However, my question relates to doing submission within PyCharm itself. >> >> This is what I do at Pycharm *terminal* to invoke

Re: PyCharm, Running spark-submit calling jars and a package at run time

2021-01-08 Thread Sean Owen
PyCharm itself. > > This is what I do at Pycharm *terminal* to invoke the module python > > spark-submit --jars > ..\lib\spark-bigquery-with-dependencies_2.12-0.18.0.jar \ > --packages com.github.samelamin:spark-bigquery_2.11:0.2.6 > analyze_house_prices_GCP.py > > H

Re: PyCharm, Running spark-submit calling jars and a package at run time

2021-01-08 Thread Mich Talebzadeh
Thanks Riccardo. I am well aware of the submission form However, my question relates to doing submission within PyCharm itself. This is what I do at Pycharm *terminal* to invoke the module python spark-submit --jars ..\lib\spark-bigquery-with-dependencies_2.12-0.18.0.jar \ --packages

Re: PyCharm, Running spark-submit calling jars and a package at run time

2021-01-08 Thread Riccardo Ferrari
nd > does plotting. > > At the command line on the terminal I need to add the jar file and the > package to make it work. > > (venv) C:\Users\admin\PycharmProjects\pythonProject2\DS\src>spark-submit > --jars ..\lib\spark-bigquery-with-dependencies_2.12-0.18.0.jar &

PyCharm, Running spark-submit calling jars and a package at run time

2021-01-08 Thread Mich Talebzadeh
Hi, I have a module in Pycharm which reads data stored in a Bigquery table and does plotting. At the command line on the terminal I need to add the jar file and the package to make it work. (venv) C:\Users\admin\PycharmProjects\pythonProject2\DS\src>spark-submit --jars ..\lib\spark-bigqu

Re: Path of jars added to a Spark Job - spark-submit // // Override jars in spark submit

2020-11-12 Thread Dominique De Vito
oughout the Spark cluster*. I have spent a fair bit of time on this > and I recommend that you follow this procedure to make sure that the > spark-submit job runs ok. Use the spark.yarn.archive configuration option > and set that to the location of an archive (you create on HDFS) containing

Re: Path of jars added to a Spark Job - spark-submit // // Override jars in spark submit

2020-11-12 Thread Dominique De Vito
see some discrepancy with what the Spark doc says: "When using spark-submit, the application jar along with any jars included with the --jars option will be automatically transferred to the cluster. URLs supplied after --jars must be separated by commas. That list is included

Re: Path of jars added to a Spark Job - spark-submit // // Override jars in spark submit

2020-11-12 Thread Mich Talebzadeh
that you follow this procedure to make sure that the spark-submit job runs ok. Use the spark.yarn.archive configuration option and set that to the location of an archive (you create on HDFS) containing all the JARs in the $SPARK_HOME/jars/ folder, at the root level of the archive. For example: 1
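
A hedged sketch of that spark.yarn.archive procedure (archive name and HDFS path are illustrative):

```
# Build one uncompressed archive with the jars at its root, stage it on HDFS, reference it.
cd "$SPARK_HOME"
jar cv0f spark-libs.jar -C jars/ .
hdfs dfs -mkdir -p /spark-archive
hdfs dfs -put spark-libs.jar /spark-archive/

spark-submit --conf spark.yarn.archive=hdfs:///spark-archive/spark-libs.jar ...
```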

Re: Path of jars added to a Spark Job - spark-submit // // Override jars in spark submit

2020-11-12 Thread Russell Spitzer
, 2020 at 10:02 AM Dominique De Vito wrote: > Hi, > > I am using Spark 2.1 (BTW) on YARN. > > I am trying to upload JARs to the YARN cluster, and to use them to replace > on-site (already in-place) JARs. > > I am trying to do so through spark-submit. > > One helpful answ

Path of jars added to a Spark Job - spark-submit // // Override jars in spark submit

2020-11-12 Thread Dominique De Vito
Hi, I am using Spark 2.1 (BTW) on YARN. I am trying to upload JARs to the YARN cluster, and to use them to replace on-site (already in-place) JARs. I am trying to do so through spark-submit. One helpful answer https://stackoverflow.com/questions/37132559/add-jars-to-a-spark-job-spark-submit

Re: spark-submit parameters about two keytab files to yarn and kafka

2020-11-01 Thread kevin chen
ache=false serviceName="kafka" principal="kafka/x...@example.com"; }; *step 2:* spark-submit command : /usr/local/spark/bin/spark-submit \ --files ./kafka_client_jaas.conf,./kafka.service.keytab \ --driver-java-options "-Djava.security.auth.login.config
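
Rounding out the truncated command above, a hedged sketch of the two-keytab pattern: YARN authenticates via --principal/--keytab, while Kafka authenticates through the JAAS config shipped with --files (principals, paths, and app name are placeholders):

```
# The JAAS file and Kafka keytab are distributed to containers; relative paths
# resolve inside each container's working directory.
/usr/local/spark/bin/spark-submit \
  --files ./kafka_client_jaas.conf,./kafka.service.keytab \
  --driver-java-options "-Djava.security.auth.login.config=./kafka_client_jaas.conf" \
  --conf spark.executor.extraJavaOptions=-Djava.security.auth.login.config=./kafka_client_jaas.conf \
  --principal "$YARN_PRINCIPAL" \
  --keytab ./yarn.service.keytab \
  --master yarn --deploy-mode cluster \
  my_streaming_app.py
```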

Re: spark-submit parameters about two keytab files to yarn and kafka

2020-10-28 Thread Gabor Somogyi
, and they have > different kerberos authentication. > > We have two keytab files, for YARN and Kafka. > > And my question is how to add parameters to the spark-submit command for > this situation? > > Thanks. > > > ---

spark-submit parameters about two keytab files to yarn and kafka

2020-10-28 Thread big data
Hi, We want to submit a spark streaming job to YARN and consume a Kafka topic. YARN and Kafka are in two different clusters, and they have different kerberos authentication. We have two keytab files, for YARN and Kafka. And my question is how to add parameters to the spark-submit command

Re: Why spark-submit works with package not with jar

2020-10-21 Thread Wim Van Leuven
We actually zipped the full conda environments during our build and ship those On Wed, 21 Oct 2020 at 20:25, Mich Talebzadeh wrote: > How about PySpark? What process can that go through to not depend on > external repo access in production > > > LinkedIn * >

Re: Why spark-submit works with package not with jar

2020-10-21 Thread Mich Talebzadeh
How about PySpark? What process can that go through to not depend on external repo access in production?

Re: Why spark-submit works with package not with jar

2020-10-21 Thread Sean Owen
Yes, it's reasonable to build an uber-jar in development, using Maven/Ivy to resolve dependencies (and of course excluding 'provided' dependencies like Spark), and push that to production. That gives you a static artifact to run that does not depend on external repo access in production. On Wed,
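
A hedged sketch of that workflow (assumes a Maven project with Spark dependencies marked provided and the shade plugin configured; class and jar names are placeholders):

```
# Resolve and bundle dependencies at build time; production then needs no repo access.
mvn -DskipTests clean package    # produces target/my-app-uber.jar via the shade plugin

spark-submit --master yarn --deploy-mode cluster \
  --class com.example.Main \
  target/my-app-uber.jar
```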

Re: Why spark-submit works with package not with jar

2020-10-21 Thread Wim Van Leuven
ach to sort this >>> out by just using jars and working out their dependencies in ~/.ivy2/jars >>> directory using grep -lRi :) >>> >>> >>> This now works with just using jars (new added ones in grey) after >>> resolving the dependenci

Re: Why spark-submit works with package not with jar

2020-10-21 Thread Mich Talebzadeh
>> Anyway as Nicola suggested I used the trench war approach to sort this >> out by just using jars and working out their dependencies in ~/.ivy2/jars >> directory using grep -lRi :) >> >> >> This now works with just using jars (new added ones in grey) after >&

Re: Why spark-submit works with package not with jar

2020-10-20 Thread Wim Van Leuven
st using jars (new added ones in grey) after > resolving the dependencies > > > ${SPARK_HOME}/bin/spark-submit \ > > --master yarn \ > > --deploy-mode client \ > > --conf spark.executor.memoryOverhead=3000 \

Re: Why spark-submit works with package not with jar

2020-10-20 Thread Mich Talebzadeh
${SPARK_HOME}/bin/spark-submit \ --master yarn \ --deploy-mode client \ --conf spark.executor.memoryOverhead=3000 \ --class org.apache.spark.repl.Main \ --name "my own Spark shell on Yarn" "$@" \

Re: Why spark-submit works with package not with jar

2020-10-20 Thread Sean Owen
Rather, let --packages (via Ivy) worry about them, because they tell Ivy what they need. There's no 100% guarantee that conflicting dependencies are resolved in a way that works in every single case, which you run into sometimes when using incompatible libraries, but yes this is the point of

Re: Why spark-submit works with package not with jar

2020-10-20 Thread Mich Talebzadeh
va-1.0.5.jar >>> com.google.apis_google-api-services-storage-v1-rev135-1.24.1.jar >>> com.google.http-client_google-http-client-1.24.1.jar >>> org.apache.commons_commons-compress-1.4.1.jar >>> com.google.auto.value_auto-value-annotations-1.6.2.jar >>> com.google.http-cli

Re: Why spark-submit works with package not with jar

2020-10-20 Thread Mich Talebzadeh
s-storage-v1-rev135-1.24.1.jar >> com.google.http-client_google-http-client-1.24.1.jar >> org.apache.commons_commons-compress-1.4.1.jar >> com.google.auto.value_auto-value-annotations-1.6.2.jar >> com.google.http-client_google-http-client-jackson2-1.24.1.jar >> org.apache.httpcomponents_httpclient

Re: Why spark-submit works with package not with jar

2020-10-20 Thread Sean Owen
taoss_bigquery-connector-0.13.4-hadoop2.jar > com.google.j2objc_j2objc-annotations-1.1.jar > org.apache.httpcomponents_httpcore-4.0.1.jar > > I don't think I need to add all of these to spark-submit --jars list. Is > there a way I can find out which dependency is missing > > This

Re: Why spark-submit works with package not with jar

2020-10-20 Thread Nicolas Paris
taoss_bigquery-connector-0.13.4-hadoop2.jar > com.google.j2objc_j2objc-annotations-1.1.jar > org.apache.httpcomponents_httpcore-4.0.1.jar > > I don't think I need to add all of these to spark-submit --jars list. Is > there a way I can find out which dependen

Re: Why spark-submit works with package not with jar

2020-10-20 Thread Mich Talebzadeh
4.1.jar org.apache.httpcomponents_httpclient-4.0.1.jar com.google.cloud.bigdataoss_bigquery-connector-0.13.4-hadoop2.jar com.google.j2objc_j2objc-annotations-1.1.jar org.apache.httpcomponents_httpcore-4.0.1.jar I don't think I need to add all of these to spark-submit --jars list. Is there a way I

Re: Why spark-submit works with package not with jar

2020-10-20 Thread Nicolas Paris
ilding Uber jar file (to evict the unwanted ones). >> These used to work a year and half ago using Google Dataproc compute >> engines (comes with Spark preloaded) and I could create an Uber jar file. >> >> Unfortunately this has become problematic now so tried to use

Re: Why spark-submit works with package not with jar

2020-10-20 Thread ayan guha
s has become problematic now so tried to use spark-submit > instead as follows: > > ${SPARK_HOME}/bin/spark-submit \ > --master yarn \ > --deploy-mode client \ > --conf spark.executor.memoryOverhead=3000 \ >

Re: Why spark-submit works with package not with jar

2020-10-20 Thread Mich Talebzadeh
) and I could create an Uber jar file. Unfortunately this has become problematic now so tried to use spark-submit instead as follows: ${SPARK_HOME}/bin/spark-submit \ --master yarn \ --deploy-mode client \ --conf spark.executor.memoryOverhead=3000

Re: Why spark-submit works with package not with jar

2020-10-20 Thread Sean Owen
io that I use in Spark submit as follows: > > spark-submit --driver-class-path /home/hduser/jars/ddhybrid.jar --jars > /home/hduser/jars/spark-bigquery-latest.jar,/home/hduser/jars/ddhybrid.jar, > */home/hduser/jars/spark-bigquery_2.11-0.2.6.jar* > > As you can see the

Re: Why spark-submit works with package not with jar

2020-10-20 Thread Russell Spitzer
--jars adds only that jar; --packages adds the jar and its dependencies listed in Maven. On Tue, Oct 20, 2020 at 10:50 AM Mich Talebzadeh wrote: > Hi, > > I have a scenario that I use in Spark submit as follows: > > spark-submit --driver-class-path /home/hduser/jars/ddhybrid.jar
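
The distinction in sketch form (jar path and coordinates are taken from the thread; trailing options elided):

```
# --jars ships exactly the listed files; transitive dependencies are up to you.
spark-submit --jars /home/hduser/jars/ddhybrid.jar ...

# --packages resolves the coordinate AND its transitive dependencies via Ivy.
spark-submit --packages com.github.samelamin:spark-bigquery_2.11:0.2.6 ...
```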

Why spark-submit works with package not with jar

2020-10-20 Thread Mich Talebzadeh
Hi, I have a scenario that I use in Spark submit as follows: spark-submit --driver-class-path /home/hduser/jars/ddhybrid.jar --jars /home/hduser/jars/spark-bigquery-latest.jar,/home/hduser/jars/ddhybrid.jar, */home/hduser/jars/spark-bigquery_2.11-0.2.6.jar* As you can see the jar files needed

Spark Submit processes hanging & leaking memory

2020-09-22 Thread ER
Exactly 3 spark submit processes are hanging from the first 3 jobs that were submitted to the standalone cluster using client mode. Example from the client: root 1517 0.3 4.7 8412728 1532876 ? Sl 18:49 0:38 /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -cp /usr/local/spark/conf/:/usr/local/spark

Unable to run bash script when using spark-submit in cluster mode.

2020-07-23 Thread Nasrulla Khan Haris
Hi Spark Users, I am trying to execute a bash script from my spark app. I can run the below command without issues from spark-shell; however, when I use it in the spark app and submit with spark-submit, the container is not able to find the directories. val result = "export LD_LIBRARY_PATH=/ bin

RE: Unable to run bash script when using spark-submit in cluster mode.

2020-07-23 Thread Nasrulla Khan Haris
Are local paths not exposed in containers? Thanks, Nasrulla From: Nasrulla Khan Haris Sent: Thursday, July 23, 2020 6:13 PM To: user@spark.apache.org Subject: Unable to run bash script when using spark-submit in cluster mode. Importance: High Hi Spark Users, I am trying to execute bash
