Re: Where do the executors get my app jar from?

2020-08-13 Thread Rishi Raj Tandon
Hi All, what will happen if the jar is available on the local network, accessible to the Driver but not to the executors? Is there any good study resource where the deployment of external jars is explained nicely? Regards, Rishi On Fri, Aug 14, 2020 at 11:15 AM Henoc wrote: > If you are

Re: Where do the executors get my app jar from?

2020-08-13 Thread Henoc
If you are running Spark on YARN, the spark-submit utility will download the jar from S3 and copy it to HDFS in the distributed cache. The driver shares this location with the YARN NodeManagers via the ContainerLaunchContext. NodeManagers localize the jar and place it on the container classpath before they
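
For illustration, a minimal PySpark sketch of pointing a job at an extra jar in S3 (the bucket path and app name are hypothetical, and reading from s3a assumes the hadoop-aws connector is available):

    from pyspark.sql import SparkSession

    # "spark.jars" entries (hypothetical S3 path) are fetched once and
    # distributed to executors -- via the YARN distributed cache in yarn
    # mode, or via the driver's file server otherwise.
    spark = (
        SparkSession.builder
        .appName("jar-distribution-demo")
        .config("spark.jars", "s3a://my-bucket/libs/my-dep.jar")
        .getOrCreate()
    )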

Re: Where do the executors get my app jar from?

2020-08-13 Thread Russell Spitzer
Looking back at the code, all --jars args and such run through https://github.com/apache/spark/blob/7f275ee5978e00ac514e25f5ef1d4e3331f8031b/core/src/main/scala/org/apache/spark/SparkContext.scala#L493-L500 which calls

Re: Where do the executors get my app jar from?

2020-08-13 Thread Russell Spitzer
The driver hosts a file server which the executors download the jar from. On Thu, Aug 13, 2020, 5:33 PM James Yu wrote: > Hi, > > When I spark submit a Spark app with my app jar located in S3, obviously > the Driver will download the jar from the s3 location. What is not clear > to me is:
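
A small PySpark sketch of that fetch path using addFile, which is served by the same driver file server that distributes jars (the file path is hypothetical):

    from pyspark import SparkFiles
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("file-server-demo").getOrCreate()
    sc = spark.sparkContext

    # The driver registers the file with its internal file server...
    sc.addFile("/tmp/lookup.txt")  # hypothetical local path on the driver

    def read_on_executor(_):
        # ...and each executor downloads it from the driver on first access.
        with open(SparkFiles.get("lookup.txt")) as f:
            return f.readline().strip()

    print(sc.parallelize([1]).map(read_on_executor).collect())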

Re: Kafka spark structure streaming out of memory issue

2020-08-13 Thread Srinivas V
It depends on how much memory is available and how much data you are processing. Please provide the data size and cluster details so we can help. On Fri, Aug 14, 2020 at 12:54 AM km.santanu wrote: > Hi, I am using stateless Kafka structured streaming. I have enabled a watermark of 1 hour. After long

Re: help on use case - spark parquet processing

2020-08-13 Thread Amit Sharma
Can you keep an Option field in your case class? Thanks, Amit On Thu, Aug 13, 2020 at 12:47 PM manjay kumar wrote: > Hi, I have a use case where I need to merge three datasets and build one wherever data is available. And my dataset is a complex object. Customer - name -

Where do the executors get my app jar from?

2020-08-13 Thread James Yu
Hi, when I spark-submit a Spark app with my app jar located in S3, obviously the Driver will download the jar from the S3 location. What is not clear to me is: where do the Executors get the jar from? The same S3 location, somehow from the Driver, or do they not need the jar at all? Thanks

Kafka spark structure streaming out of memory issue

2020-08-13 Thread km.santanu
Hi, I am using stateless Kafka structured streaming. I have enabled a watermark of 1 hour. After running for about 2 hours, my job terminates automatically. Checkpointing has been enabled. I am computing an average over the input data. Can you please suggest how to avoid the out-of-memory error?
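
Note that an average over a stream is a stateful aggregation, so bounding the state is the key concern here. A minimal sketch of the setup described, with hypothetical broker, topic, and checkpoint path:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import avg, col, window

    spark = SparkSession.builder.appName("kafka-avg-demo").getOrCreate()

    events = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
        .option("subscribe", "readings")                   # hypothetical topic
        .load()
        .select(
            col("timestamp"),
            col("value").cast("string").cast("double").alias("reading"),
        )
    )

    # The 1-hour watermark lets Spark drop window state older than the
    # watermark; an unbounded aggregation keeps state forever and can
    # exhaust executor memory on a long-running job.
    avg_per_window = (
        events.withWatermark("timestamp", "1 hour")
        .groupBy(window(col("timestamp"), "10 minutes"))
        .agg(avg("reading").alias("avg_reading"))
    )

    query = (
        avg_per_window.writeStream
        .outputMode("append")
        .format("console")
        .option("checkpointLocation", "/tmp/ckpt")  # hypothetical path
        .start()
    )
    query.awaitTermination()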

help on use case - spark parquet processing

2020-08-13 Thread manjay kumar
Hi, I have a use case where I need to merge three datasets and build one wherever data is available. My dataset is a complex object: Customer (name: String, accounts: List[Account]); Account (type: String, addresses: List[Address]); Address (name: String) --- and it goes on. These file
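
Amit's Option-field suggestion above amounts to making every scalar field nullable so a merged record can be built from whichever dataset has the data. The thread's case classes are Scala, so this is only an analogous Python sketch of the nested schema, with hypothetical field names:

    from dataclasses import dataclass, field
    from typing import List, Optional

    # Every scalar field is Optional, so a partially populated record
    # from any of the three datasets can still be constructed and merged.
    @dataclass
    class Address:
        name: Optional[str] = None

    @dataclass
    class Account:
        type: Optional[str] = None
        addresses: List[Address] = field(default_factory=list)

    @dataclass
    class Customer:
        name: Optional[str] = None
        accounts: List[Account] = field(default_factory=list)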

Re: How can I use pyspark to upsert one row without replacing entire table

2020-08-13 Thread Siavash Namvar
That's a kind of solution, Ed. Can you elaborate on how I can do this on the Spark side? Or do I need to update the table configuration in the DB? Siavash On Wed, Aug 12, 2020 at 5:55 PM ed elliott wrote: > You’ll need to do an insert and use a trigger on the table to change it into an upsert; also make
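
On the Spark side, ed's approach needs nothing more than a plain JDBC append; the trigger lives entirely in the database. A minimal sketch, with hypothetical connection details and toy data:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("jdbc-upsert-demo").getOrCreate()
    df = spark.createDataFrame([(1, "alice")], ["id", "name"])  # toy rows

    # Plain JDBC append; a trigger defined on target_table in the database
    # (not in Spark) rewrites each INSERT into an upsert.
    (df.write
       .format("jdbc")
       .option("url", "jdbc:postgresql://db-host:5432/mydb")
       .option("dbtable", "target_table")
       .option("user", "spark")
       .option("password", "secret")
       .mode("append")
       .save())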

Spark3 on k8S reading encrypted data from HDFS with KMS in HA

2020-08-13 Thread Michel Sumbul
Hi guys, has anyone tried Spark 3 on k8s reading data from HDFS encrypted with KMS in HA mode (with Kerberos)? I have a wordcount job running with Spark 3, reading data on HDFS (Hadoop 3.1), everything secured with Kerberos. Everything works fine if the data folder is not encrypted (Spark on k8s). If
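
For reference, a sketch of the submission-side settings such a job typically carries; the principal, keytab, hosts, and paths are all hypothetical, and the semicolon-separated hosts follow the Hadoop convention for a KMS HA provider URI:

    from pyspark.sql import SparkSession

    # Sketch only: all principals, paths, and hosts are hypothetical.
    spark = (
        SparkSession.builder
        .config("spark.kerberos.principal", "user@EXAMPLE.COM")
        .config("spark.kerberos.keytab", "/etc/security/user.keytab")
        # Hadoop KMS in HA mode: multiple hosts separated by semicolons
        .config("spark.hadoop.hadoop.security.key.provider.path",
                "kms://https@kms1.example.com;kms2.example.com:9600/kms")
        .getOrCreate()
    )
    spark.read.text("hdfs:///encrypted/zone/data").show()  # hypothetical path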

Spark ShutdownHook through python jobs.

2020-08-13 Thread Shriraj Bhardwaj
Hi, we have Spark jobs written entirely in Python, similar to the repo https://github.com/AlexIoannides/pyspark-example-project. We are using spark-submit to submit the application in local mode, but we want to send metrics when the job ends (on SIGTERM as well). To do so, we need something similar to
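
In plain Python, atexit plus a SIGTERM handler is the usual substitute for the JVM-side ShutdownHookManager; a minimal sketch, with send_metrics as a hypothetical stand-in for the real reporting call:

    import atexit
    import signal
    import sys

    def send_metrics():
        # Hypothetical stand-in for the real metrics emitter.
        print("emitting final job metrics")

    # Runs on normal interpreter shutdown, including sys.exit().
    atexit.register(send_metrics)

    def _handle_sigterm(signum, frame):
        # Default SIGTERM kills the process without running atexit hooks;
        # raising SystemExit lets them fire before the process exits.
        sys.exit(143)

    signal.signal(signal.SIGTERM, _handle_sigterm)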