Re: Spike on number of tasks - dynamic allocation

2023-02-27 Thread Mich Talebzadeh
Hi Murat,

I have dealt with EMR, but I have used a Spark cluster on Google Dataproc with
3.1.1 and an autoscaling policy.

My understanding is that the autoscaling policy decides how to scale, if
needed, without manual intervention. Is this the case with yours?


HTH





Re: Spike on number of tasks - dynamic allocation

2023-02-27 Thread murat migdisoglu
Hey Mich,
This cluster is running Spark 2.4.6 on EMR.


-- 
"Talkers aren’t good doers. Rest assured that we’re going there to use our
hands, not our tongues."
W. Shakespeare


Re: Spike on number of tasks - dynamic allocation

2023-02-27 Thread Mich Talebzadeh
Hi,

What is the Spark version, and what type of cluster is it: Spark on Dataproc
or something else?

HTH









Spike on number of tasks - dynamic allocation

2023-02-27 Thread murat migdisoglu
On an auto-scaling cluster that uses YARN as the resource manager, we observed
that when we decrease the number of worker nodes after upscaling the instance
types (so the total CPU/memory capacity of the cluster stays identical), the
number of tasks for the same Spark job spikes.

The same Spark job, with the same Spark settings (dynamic allocation is on),
spins up 4-5 times more tasks, and correspondingly we see 4-5 times more
executors being allocated.
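
For reference, this is the kind of check we can run on both cluster shapes to
see where the extra tasks come from (a minimal PySpark sketch; the app name and
input path are placeholders, not our actual job):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("task-count-check").getOrCreate()
sc = spark.sparkContext

# Default parallelism scales with the executor cores registered at this point
print("defaultParallelism:", sc.defaultParallelism)
# Target split size for file-based sources (128 MB by default)
print("maxPartitionBytes :", spark.conf.get("spark.sql.files.maxPartitionBytes", "<default>"))

# Placeholder input path -- the real job reads its own inputs
df = spark.read.parquet("s3://some-bucket/some-input/")
# The number of read partitions is roughly the number of tasks in the first stage
print("input partitions  :", df.rdd.getNumPartitions())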

As far as I understand, dynamic allocation decides to start new executors when
it sees pending tasks queuing up, but I don't understand why the same Spark
application with identical input files produces 4-5 times as many tasks in the
first place.
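
For context, the dynamic allocation knobs in play look roughly like this (a
minimal PySpark sketch with illustrative values, not our actual configuration):

from pyspark.sql import SparkSession

# Illustrative values only -- not the actual job configuration
spark = (SparkSession.builder
         .appName("dynamic-allocation-example")
         .config("spark.dynamicAllocation.enabled", "true")
         # the external shuffle service is required for dynamic allocation on YARN in 2.4.x
         .config("spark.shuffle.service.enabled", "true")
         .config("spark.dynamicAllocation.minExecutors", "2")
         .config("spark.dynamicAllocation.maxExecutors", "50")
         # how long tasks may sit in the backlog before extra executors are requested
         .config("spark.dynamicAllocation.schedulerBacklogTimeout", "1s")
         # how long an idle executor is kept before it is released
         .config("spark.dynamicAllocation.executorIdleTimeout", "60s")
         .getOrCreate())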

Any clues would be much appreciated, thank you.

Murat