Kill spark executor when spark runs specific stage

2018-07-04 Thread Serega Sheypak
Hi, I'm running Spark on YARN. My code is very simple. I want to kill one executor when "data.repartition(10)" is executed. How can I do it in an easy way? val data = sc.sequenceFile[NullWritable, BytesWritable](inputPath) .map { case (key, value) => Data.fromBytes(value) } process =
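One possible approach (a sketch, not from this thread): register a SparkListener that records executor IDs as they come up, then call SparkContext.killExecutor (a @DeveloperApi, effective on YARN backends) when a stage whose name mentions the repartition call site is submitted. Here sc is an existing SparkContext, and the "repartition" match relies on stage names carrying the call site of the transformation.

    import org.apache.spark.scheduler.{SparkListener, SparkListenerExecutorAdded, SparkListenerStageSubmitted}
    import scala.collection.mutable

    // Track executor IDs as they register with the driver.
    val liveExecutors = mutable.Set[String]()

    sc.addSparkListener(new SparkListener {
      override def onExecutorAdded(added: SparkListenerExecutorAdded): Unit =
        liveExecutors.synchronized { liveExecutors += added.executorId }

      override def onStageSubmitted(submitted: SparkListenerStageSubmitted): Unit = {
        // Stage names usually include the call site, e.g. "repartition at MyJob.scala:42".
        if (submitted.stageInfo.name.contains("repartition")) {
          liveExecutors.synchronized(liveExecutors.headOption).foreach { id =>
            sc.killExecutor(id)   // DeveloperApi; asks the cluster manager to kill that executor
          }
        }
      }
    })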

Re: Run Python User Defined Functions / code in Spark with Scala Codebase

2018-07-04 Thread Prem Sure
try .pipe(.py) on RDD. Thanks, Prem On Wed, Jul 4, 2018 at 7:59 PM, Chetan Khatri wrote: > Can someone please suggest, thanks > > On Tue 3 Jul, 2018, 5:28 PM Chetan Khatri, > wrote: > >> Hello Dear Spark User / Dev, >> >> I would like to pass a Python user-defined function to a Spark Job
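For context, RDD.pipe streams each partition's elements through an external process, one element per line on stdin, and reads the process's stdout back as an RDD[String]. A minimal sketch (udf.py is a hypothetical script that must be available on every executor, e.g. shipped via --files; sc is an existing SparkContext):

    import org.apache.spark.rdd.RDD

    val input: RDD[String] = sc.parallelize(Seq("a", "b", "c"))
    // Each element is written to the script's stdin; each line the script prints becomes an output element.
    val piped: RDD[String] = input.pipe("python udf.py")
    piped.collect().foreach(println)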

Re: Run Python User Defined Functions / code in Spark with Scala Codebase

2018-07-04 Thread Chetan Khatri
Can someone please suggest, thanks. On Tue 3 Jul, 2018, 5:28 PM Chetan Khatri, wrote: > Hello Dear Spark User / Dev, > > I would like to pass a Python user-defined function to a Spark Job developed > using Scala, and the return value of that function would be returned to the DF / > Dataset API. > > Can
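Since the question asks for the result to come back through the DF / Dataset API, one hedged sketch building on the pipe() suggestion above: convert the piped RDD[String] back into a typed column. Here spark is an existing SparkSession, and /data/input.txt and scores.py are hypothetical names (scores.py would print one numeric string per input line).

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().getOrCreate()
    import spark.implicits._

    val scored = spark.sparkContext
      .textFile("/data/input.txt")        // hypothetical input path
      .pipe("python scores.py")           // hypothetical Python script on each executor
      .map(_.toDouble)
      .toDF("score")                      // back into the DataFrame / Dataset world

    scored.show()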

Re: Inferring Data driven Spark parameters

2018-07-04 Thread Mich Talebzadeh
Hi Aakash, For clarification, are you running this in YARN client mode or standalone? How much total YARN memory is available? From my experience, for a bigger cluster I found the following incremental settings useful (CDH 5.9, YARN client), so you can scale yours: [1] - 576GB --num-executors 24
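The full list of settings is truncated above, but for reference, resource sizing like this is normally passed either as spark-submit flags or as SparkConf entries. An illustrative sketch with placeholder values (not the poster's actual numbers):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("sizing-example")
      .config("spark.executor.instances", "24")   // equivalent of --num-executors 24
      .config("spark.executor.cores", "4")        // equivalent of --executor-cores 4
      .config("spark.executor.memory", "16g")     // equivalent of --executor-memory 16g
      .config("spark.driver.memory", "8g")        // equivalent of --driver-memory 8g
      .getOrCreate()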

Re: Inferring Data driven Spark parameters

2018-07-04 Thread Prem Sure
Can you share which API your jobs use: just core RDDs, or SQL, or DStreams, etc.? Refer to the recommendations at https://spark.apache.org/docs/2.3.0/configuration.html for detailed configurations. Thanks, Prem On Wed, Jul 4, 2018 at 12:34 PM, Aakash Basu wrote: > I do not want to change

Re: [Spark Streaming MEMORY_ONLY] Understanding Dataflow

2018-07-04 Thread Prem Sure
Hoping the below helps clear some of this up: executors don't have a way to share data among themselves, except for sharing accumulators via the driver's support. It's all based on data locality or remoteness; tasks/stages are defined to perform the work, which may result in a shuffle. On Wed, Jul 4, 2018 at
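To make the accumulator point concrete, a small sketch (sc is an existing SparkContext): executors can only add to an accumulator, and the merged value is readable on the driver once the action finishes.

    val errorCount = sc.longAccumulator("errorCount")
    sc.parallelize(1 to 100).foreach { i =>
      if (i % 10 == 0) errorCount.add(1)   // executor side: write-only
    }
    println(errorCount.value)              // driver side: read the merged total (10 here)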

[Spark Streaming MEMORY_ONLY] Understanding Dataflow

2018-07-04 Thread thomas lavocat
Hello, I have a question on Spark Dataflow. If I understand correctly, all received data is sent from the executor to the driver of the application prior to task creation. Then the task embedding the data transits from the driver to the executor in order to be processed. As executor cannot

Re: Inferring Data driven Spark parameters

2018-07-04 Thread Aakash Basu
I do not want to change executor/driver cores/memory on the fly in a single Spark job; all I want is to make them cluster-specific. So, I want to have a formula with which, depending on the size of the cluster and the driver and executor details, I can find out the values for them before submitting those details in
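One common sizing heuristic (an assumption on my part, not a formula given in this thread) reserves a core and 1 GB of memory per node for the OS and NodeManager, caps executor cores at 5, leaves one executor slot for the YARN ApplicationMaster, and deducts roughly 7% of executor memory for overhead. A sketch:

    // Derive --num-executors / --executor-cores / --executor-memory from cluster size.
    case class ClusterSpec(nodes: Int, coresPerNode: Int, memPerNodeGb: Int)

    def suggestSettings(c: ClusterSpec): (Int, Int, Int) = {
      val usableCores      = c.coresPerNode - 1                  // 1 core per node for OS/daemons
      val executorCores    = math.min(5, usableCores)            // cap at 5 for HDFS throughput
      val executorsPerNode = usableCores / executorCores
      val numExecutors     = c.nodes * executorsPerNode - 1      // leave one slot for the YARN AM
      val rawMemGb         = (c.memPerNodeGb - 1) / executorsPerNode  // 1 GB per node for the OS
      val executorMemGb    = (rawMemGb * 0.93).toInt             // ~7% reserved for memory overhead
      (numExecutors, executorCores, executorMemGb)
    }

    // e.g. suggestSettings(ClusterSpec(nodes = 10, coresPerNode = 16, memPerNodeGb = 64))
    //   => (29, 5, 19)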