Hi, I'm running Spark on YARN. My code is very simple. I want to kill one
executor when "data.repartition(10)" is executed. How can I do it in an
easy way?
val data = sc.sequenceFile[NullWritable, BytesWritable](inputPath)
  .map { case (key, value) =>
    Data.fromBytes(value)
  }
The processing step then tries .pipe(.py) on the RDD.
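One easy way, as a sketch (untested here, and assuming a live `sc` in
spark-shell on YARN): `repartition` is lazy, so schedule the kill to fire
during the action that materializes it, using the developer API
`SparkContext.killExecutors`. The executor ID "1" and the 5-second delay
are assumptions for illustration.

```scala
import java.util.concurrent.{Executors, TimeUnit}

// Assumes a live SparkContext `sc` (e.g. spark-shell --master yarn) and
// the RDD `data` built as above. Executor IDs on YARN are typically
// "1", "2", ...
val killer = Executors.newSingleThreadScheduledExecutor()
killer.schedule(new Runnable {
  // killExecutors is a @DeveloperApi on SparkContext (since Spark 1.5)
  def run(): Unit = sc.killExecutors(Seq("1")) // assumed executor ID
}, 5, TimeUnit.SECONDS)

// The kill fires ~5 seconds into this action, i.e. while the
// repartition stage is running.
data.repartition(10).count()
```

Alternatively, you can simply kill -9 the executor JVM on its node (find
the PID with jps, or locate the container via the YARN UI).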
Thanks,
Prem
On Wed, Jul 4, 2018 at 7:59 PM, Chetan Khatri
wrote:
> Can someone please suggest, thanks
>
> On Tue 3 Jul, 2018, 5:28 PM Chetan Khatri,
> wrote:
>
>> Hello Dear Spark User / Dev,
>>
>> I would like to pass a Python user-defined function to a Spark job
>> developed using Scala, and the return value of that function would be
>> returned to the DF / Dataset API.
Can someone please suggest, thanks
On Tue 3 Jul, 2018, 5:28 PM Chetan Khatri,
wrote:
> Hello Dear Spark User / Dev,
>
> I would like to pass a Python user-defined function to a Spark job
> developed using Scala, and the return value of that function would be
> returned to the DF / Dataset API.
>
> Can
Hi Aakash,
For clarification, are you running this in YARN client mode or standalone?
How much total YARN memory is available?
From my experience, for a bigger cluster I found the following incremental
settings useful (CDH 5.9, YARN client), so you can scale yours:
[1] - 576GB
--num-executors 24
Can you share which API your jobs use: just core RDDs, SQL, DStreams, etc.?
Refer to the recommendations at
https://spark.apache.org/docs/2.3.0/configuration.html for detailed
configuration.
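As an illustration, a scaled submit command might look like the below.
Only --num-executors 24 comes from the settings above; the other values
(cores, memory, overhead, and the jar name) are placeholders to adapt to
your cluster, not recommendations.

```shell
# Illustrative spark-submit for YARN client mode; tune per the 2.3.0 docs.
spark-submit \
  --master yarn \
  --deploy-mode client \
  --num-executors 24 \
  --executor-cores 5 \
  --executor-memory 19G \
  --conf spark.executor.memoryOverhead=2G \
  your-job.jar
```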
Thanks,
Prem
On Wed, Jul 4, 2018 at 12:34 PM, Aakash Basu
wrote:
> I do not want to change
Hoping the below helps clear some of this up.
Executors don't have a way to share data among themselves, except for
sharing accumulators with the driver's support.
Tasks/stages are defined based on data locality (local or remote), and
executing them may result in a shuffle.
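A minimal sketch of that accumulator path (assumes a live `sc`, e.g. in
spark-shell; untested here): tasks on the executors write to the
accumulator, and only the driver reads the merged value back.

```scala
// Tasks running on executors add to the accumulator; the driver merges
// the per-task updates and is the only place the value can be read.
val badRecords = sc.longAccumulator("badRecords")
sc.parallelize(1 to 100).foreach { n =>
  if (n % 10 == 0) badRecords.add(1) // runs on executors
}
println(badRecords.value) // read back on the driver
```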
On Wed, Jul 4, 2018 at
Hello,
I have a question on Spark dataflow. If I understand correctly, all
received data is sent from the executor to the driver of the application
prior to task creation.
Then the task embedding the data transits from the driver to the executor
in order to be processed.
As the executor cannot
I do not want to change executor/driver cores/memory on the fly in a single
Spark job; all I want is to make them cluster-specific. So, I want to have
a formula with which, depending on the size of the driver and executor
details, I can find out the values for them before submitting those details
in
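One way to phrase such a formula, as a sketch of the common heuristic from
YARN tuning guides: the names below are hypothetical, and the choices of
reserving 1 core and 1 GB per node for OS daemons, ~5 cores per executor,
one executor slot for the ApplicationMaster, and ~7% of memory for the
YARN overhead are all assumptions to tune for your cluster.

```scala
// Hypothetical sizing helper: derive spark-submit values from cluster specs.
case class ClusterSpec(nodes: Int, coresPerNode: Int, memPerNodeGb: Int)
case class SubmitSettings(numExecutors: Int,
                          executorCores: Int,
                          executorMemoryGb: Int)

def suggestSettings(c: ClusterSpec,
                    coresPerExecutor: Int = 5): SubmitSettings = {
  val usableCores = c.coresPerNode - 1  // reserve 1 core/node for OS daemons
  val usableMemGb = c.memPerNodeGb - 1  // reserve 1 GB/node for OS daemons
  val executorsPerNode = usableCores / coresPerExecutor
  // leave one executor slot for the YARN ApplicationMaster
  val numExecutors = c.nodes * executorsPerNode - 1
  // leave ~7% of executor memory for spark.executor.memoryOverhead
  val executorMemGb = ((usableMemGb / executorsPerNode) * 0.93).toInt
  SubmitSettings(numExecutors, coresPerExecutor, executorMemGb)
}

// e.g. 6 nodes x 16 cores x 64 GB each ->
//   --num-executors 17 --executor-cores 5 --executor-memory 19G
```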