Kill spark executor when spark runs specific stage

2018-07-04 Thread Serega Sheypak
Hi, I'm running Spark on YARN. My code is very simple. I want to kill one executor when "data.repartition(10)" is executed. How can I do it in an easy way? val data = sc.sequenceFile[NullWritable, BytesWritable](inputPath) .map { case (key, value) => Data.fromBytes(value) } process =
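One possible approach (a sketch, not from this thread): register a SparkListener that records executor IDs as they come up, then call SparkContext.killExecutor (a @DeveloperApi, effective on YARN backends) when a stage whose name mentions the repartition call site is submitted. Here sc is an existing SparkContext, and the "repartition" match relies on stage names carrying the call site of the transformation.

    import org.apache.spark.scheduler.{SparkListener, SparkListenerExecutorAdded, SparkListenerStageSubmitted}
    import scala.collection.mutable

    // Track executor IDs as they register with the driver.
    val liveExecutors = mutable.Set[String]()

    sc.addSparkListener(new SparkListener {
      override def onExecutorAdded(added: SparkListenerExecutorAdded): Unit =
        liveExecutors.synchronized { liveExecutors += added.executorId }

      override def onStageSubmitted(submitted: SparkListenerStageSubmitted): Unit = {
        // Stage names usually include the call site, e.g. "repartition at MyJob.scala:42".
        if (submitted.stageInfo.name.contains("repartition")) {
          liveExecutors.synchronized(liveExecutors.headOption).foreach { id =>
            sc.killExecutor(id)   // DeveloperApi; asks the cluster manager to kill that executor
          }
        }
      }
    })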

Re: Run Python User Defined Functions / code in Spark with Scala Codebase

2018-07-04 Thread Prem Sure
try .pipe(.py) on RDD. Thanks, Prem On Wed, Jul 4, 2018 at 7:59 PM, Chetan Khatri wrote: > Can someone please suggest, thanks > > On Tue 3 Jul, 2018, 5:28 PM Chetan Khatri, > wrote: > >> Hello Dear Spark User / Dev, >> >> I would like to pass a Python user-defined function to a Spark Job
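For context, RDD.pipe streams each partition's elements through an external process, one element per line on stdin, and reads the process's stdout back as an RDD[String]. A minimal sketch (udf.py is a hypothetical script that must be available on every executor, e.g. shipped via --files; sc is an existing SparkContext):

    import org.apache.spark.rdd.RDD

    val input: RDD[String] = sc.parallelize(Seq("a", "b", "c"))
    // Each element is written to the script's stdin; each line the script prints becomes an output element.
    val piped: RDD[String] = input.pipe("python udf.py")
    piped.collect().foreach(println)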

Re: Run Python User Defined Functions / code in Spark with Scala Codebase

2018-07-04 Thread Chetan Khatri
Can someone please suggest, thanks. On Tue 3 Jul, 2018, 5:28 PM Chetan Khatri, wrote: > Hello Dear Spark User / Dev, > > I would like to pass a Python user-defined function to a Spark Job developed > using Scala, and the return value of that function would be returned to the DF / > Dataset API. > > Can
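Since the question asks for the result to come back through the DF / Dataset API, one hedged sketch building on the pipe() suggestion above: convert the piped RDD[String] back into a typed column. Here spark is an existing SparkSession, and /data/input.txt and scores.py are hypothetical names (scores.py would print one numeric string per input line).

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().getOrCreate()
    import spark.implicits._

    val scored = spark.sparkContext
      .textFile("/data/input.txt")        // hypothetical input path
      .pipe("python scores.py")           // hypothetical Python script on each executor
      .map(_.toDouble)
      .toDF("score")                      // back into the DataFrame / Dataset world

    scored.show()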

Re: Inferring Data driven Spark parameters

2018-07-04 Thread Mich Talebzadeh
Hi Aakash, For clarification, are you running this in YARN client mode or standalone? How much total YARN memory is available? From my experience, for a bigger cluster I found the following incremental settings useful (CDH 5.9, YARN client), so you can scale yours: [1] - 576GB --num-executors 24
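The full list of settings is truncated above, but for reference, resource sizing like this is normally passed either as spark-submit flags or as SparkConf entries. An illustrative sketch with placeholder values (not the poster's actual numbers):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("sizing-example")
      .config("spark.executor.instances", "24")   // equivalent of --num-executors 24
      .config("spark.executor.cores", "4")        // equivalent of --executor-cores 4
      .config("spark.executor.memory", "16g")     // equivalent of --executor-memory 16g
      .config("spark.driver.memory", "8g")        // equivalent of --driver-memory 8g
      .getOrCreate()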

Re: Inferring Data driven Spark parameters

2018-07-04 Thread Prem Sure
Can you share which API your jobs use: just core RDDs, or SQL, or DStreams, etc.? Refer to the recommendations at https://spark.apache.org/docs/2.3.0/configuration.html for detailed configurations. Thanks, Prem On Wed, Jul 4, 2018 at 12:34 PM, Aakash Basu wrote: > I do not want to change

Re: [Spark Streaming MEMORY_ONLY] Understanding Dataflow

2018-07-04 Thread Prem Sure
Hoping the below helps clear some of this up: executors don't have a way to share data among themselves, except for sharing accumulators via the driver's support. It's all based on data locality or remoteness; tasks/stages are defined to perform the work, which may result in a shuffle. On Wed, Jul 4, 2018 at
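To make the accumulator point concrete, a small sketch (sc is an existing SparkContext): executors can only add to an accumulator, and the merged value is readable on the driver once the action finishes.

    val errorCount = sc.longAccumulator("errorCount")
    sc.parallelize(1 to 100).foreach { i =>
      if (i % 10 == 0) errorCount.add(1)   // executor side: write-only
    }
    println(errorCount.value)              // driver side: read the merged total (10 here)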

[Spark Streaming MEMORY_ONLY] Understanding Dataflow

2018-07-04 Thread thomas lavocat
Hello, I have a question on Spark Dataflow. If I understand correctly, all received data is sent from the executor to the driver of the application prior to task creation. Then the task embedding the data transits from the driver to the executor in order to be processed. As executor cannot

Re: Inferring Data driven Spark parameters

2018-07-04 Thread Aakash Basu
I do not want to change executor/driver cores/memory on the fly in a single Spark job; all I want is to make them cluster-specific. So, I want to have a formula with which, depending on the size of the cluster and the driver and executor details, I can find out the values for them before submitting those details in
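One common sizing heuristic (an assumption on my part, not a formula given in this thread) reserves a core and 1 GB of memory per node for the OS and NodeManager, caps executor cores at 5, leaves one executor slot for the YARN ApplicationMaster, and deducts roughly 7% of executor memory for overhead. A sketch:

    // Derive --num-executors / --executor-cores / --executor-memory from cluster size.
    case class ClusterSpec(nodes: Int, coresPerNode: Int, memPerNodeGb: Int)

    def suggestSettings(c: ClusterSpec): (Int, Int, Int) = {
      val usableCores      = c.coresPerNode - 1                  // 1 core per node for OS/daemons
      val executorCores    = math.min(5, usableCores)            // cap at 5 for HDFS throughput
      val executorsPerNode = usableCores / executorCores
      val numExecutors     = c.nodes * executorsPerNode - 1      // leave one slot for the YARN AM
      val rawMemGb         = (c.memPerNodeGb - 1) / executorsPerNode  // 1 GB per node for the OS
      val executorMemGb    = (rawMemGb * 0.93).toInt             // ~7% reserved for memory overhead
      (numExecutors, executorCores, executorMemGb)
    }

    // e.g. suggestSettings(ClusterSpec(nodes = 10, coresPerNode = 16, memPerNodeGb = 64))
    //   => (29, 5, 19)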