Re: Dynamic Scaling without Kubernetes

2022-10-26 Thread Artemis User
Wouldn't you need to run Spark on Hadoop in order to use YARN?  My 
understanding is that YARN only manages Hadoop nodes, not Spark workers 
directly.  Besides, from what I've read, you would need some extra 
plug-ins to get nodes managed dynamically.


Our use case would be like this:

1. A Spark cluster is launched with a fixed number of initial nodes.
2. As the workload reaches maximum capacity (e.g., no more executors are
   available), a job submission is rejected or has to wait in the queue.
3. A new worker node is then instantiated (e.g., a pre-configured
   container hosting a worker node is created and started) to take the
   extra workload so new jobs can be submitted.
4. Optional: if some worker nodes have been idle for a while, they can
   be stopped or removed from the cluster.

I guess an external Spark monitor or manager would be needed to keep an 
eye on the cluster's workload and submission status, and to launch or 
remove nodes accordingly; a rough sketch follows below.  This shouldn't 
be difficult to build, and it avoids dealing with a complex framework 
like k8s, which isn't really designed for small-scale, on-prem use of 
Spark and requires dedicated admin resources.
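
Something along these lines, as a minimal, untested Python sketch.  Two 
assumptions to flag: that the standalone master's web UI serves cluster 
totals as JSON at its /json endpoint with fields like "cores" and 
"coresused" (worth verifying against your Spark version), and that a 
pre-built worker container image (the name "my-spark-worker-image" below 
is made up) runs sbin/start-worker.sh against the master on startup:

import json
import subprocess
import time
import urllib.request

MASTER_UI = "http://spark-master:8080/json"  # assumed JSON view of the master web UI
POLL_SECS = 30
IDLE_ROUNDS_BEFORE_REMOVE = 10               # ~5 minutes of a fully idle cluster

def cluster_state():
    # Fetch the master's view of the cluster as a dict.
    with urllib.request.urlopen(MASTER_UI) as resp:
        return json.load(resp)

def add_worker():
    # Launch a pre-configured container whose entrypoint runs
    # ${SPARK_HOME}/sbin/start-worker.sh spark://spark-master:7077
    subprocess.run(["docker", "run", "-d", "my-spark-worker-image"], check=True)

def remove_idle_worker():
    # Placeholder: stop one of the containers started above.
    pass

idle_rounds = 0
while True:
    state = cluster_state()
    free_cores = state["cores"] - state["coresused"]
    if free_cores == 0:
        # Step 3 of the use case above: cluster saturated, scale out.
        add_worker()
        idle_rounds = 0
    elif state["coresused"] == 0:
        # Step 4: nothing running; scale in after a grace period.
        idle_rounds += 1
        if idle_rounds >= IDLE_ROUNDS_BEFORE_REMOVE:
            remove_idle_worker()
            idle_rounds = 0
    else:
        idle_rounds = 0
    time.sleep(POLL_SECS)

The same loop could just as well drive VMs or systemd units instead of 
containers; the only Spark-specific pieces are reading the master's 
status and starting or stopping workers.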



On 10/26/22 3:20 PM, Holden Karau wrote:
So Spark can dynamically scale on YARN, but standalone mode becomes a 
bit complicated — where do you envision Spark gets the extra resources 
from?


On Wed, Oct 26, 2022 at 12:18 PM Artemis User  wrote:


Has anyone tried to make a Spark cluster dynamically scalable, i.e.,
adding a new worker node to the cluster automatically when no more
executors are available for a newly submitted job?  We need to keep the
whole cluster on-prem and really lightweight, so standalone mode is
preferred and no k8s if possible.  Any suggestions?  Thanks in advance!



Re: Dynamic Scaling without Kubernetes

2022-10-26 Thread Holden Karau
So Spark can dynamically scale on YARN, but standalone mode becomes a bit
complicated — where do you envision Spark gets the extra resources from?
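
For the YARN case, that scaling is mostly configuration.  A minimal 
PySpark sketch for illustration: the config keys are standard, but the 
executor bounds and idle timeout are arbitrary example values, and the 
shuffle-service setting assumes the external shuffle service has already 
been registered as a YARN auxiliary service on each NodeManager:

from pyspark.sql import SparkSession

# Dynamic allocation on YARN: executors are requested as tasks back up
# and released again after sitting idle.
spark = (
    SparkSession.builder
    .master("yarn")
    .appName("dynamic-allocation-sketch")
    .config("spark.dynamicAllocation.enabled", "true")
    .config("spark.shuffle.service.enabled", "true")
    .config("spark.dynamicAllocation.minExecutors", "1")
    .config("spark.dynamicAllocation.maxExecutors", "20")
    .config("spark.dynamicAllocation.executorIdleTimeout", "60s")
    .getOrCreate()
)

Standalone mode has no equivalent built-in source of new machines, which 
is why the extra resources have to come from somewhere else.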

On Wed, Oct 26, 2022 at 12:18 PM Artemis User  wrote:

> Has anyone tried to make a Spark cluster dynamically scalable, i.e.,
> adding a new worker node to the cluster automatically when no more
> executors are available for a newly submitted job?  We need to keep the
> whole cluster on-prem and really lightweight, so standalone mode is
> preferred and no k8s if possible.  Any suggestions?  Thanks in advance!
--
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9  
YouTube Live Streams: https://www.youtube.com/user/holdenkarau


Dynamic Scaling without Kubernetes

2022-10-26 Thread Artemis User
Has anyone tried to make a Spark cluster dynamically scalable, i.e., 
adding a new worker node to the cluster automatically when no more 
executors are available for a newly submitted job?  We need to keep the 
whole cluster on-prem and really lightweight, so standalone mode is 
preferred and no k8s if possible.  Any suggestions?  Thanks in advance!





Re: Running 30 Spark applications at the same time is slower than one on average

2022-10-26 Thread Sean Owen
That just means G = GB of memory and C = cores. But yes, the driver and
executors are very small, which is possibly related.

On Wed, Oct 26, 2022 at 12:34 PM Artemis User  wrote:

> Are these Cloudera-specific acronyms?  Not sure how Cloudera configures
> Spark differently, but the number of nodes is clearly too small,
> considering each app only uses a small number of cores and a small amount
> of RAM.  So you may consider increasing the number of nodes.  When all
> these apps jam onto a few nodes, the cluster manager/scheduler and/or the
> network gets overwhelmed...
>
> On 10/26/22 8:09 AM, Sean Owen wrote:
>
> Resource contention. Now all the apps are competing for CPU and I/O,
> which probably slows things down.
>
> On Wed, Oct 26, 2022, 5:37 AM eab...@163.com  wrote:
>
>> Hi All,
>>
>> I have a CDH 5.16.2 Hadoop cluster with 1+3 nodes (64C/128G each; 1 NN/RM +
>> 3 DN/NM), and YARN has 192C/240G available. I used the following test
>> scenario:
>>
>> 1. Spark app resources: 2G driver memory / 2C driver vcores / 1 executor /
>> 2G executor memory / 2C executor vcores.
>> 2. One Spark app will use 5G/4C on YARN.
>> 3. First, running only one Spark app takes 40s.
>> 4. Then, I run 30 of the same Spark app at once, and each app takes 80s on
>> average.
>>
>> So, I want to know why the runtime gap is so big, and how to optimize it?
>>
>> Thanks
>>
>>
>


Re: Running 30 Spark applications at the same time is slower than one on average

2022-10-26 Thread Artemis User
Are these Cloudera-specific acronyms?  Not sure how Cloudera configures 
Spark differently, but the number of nodes is clearly too small, 
considering each app only uses a small number of cores and a small 
amount of RAM.  So you may consider increasing the number of nodes.  
When all these apps jam onto a few nodes, the cluster manager/scheduler 
and/or the network gets overwhelmed...


On 10/26/22 8:09 AM, Sean Owen wrote:
Resource contention. Now all the apps are competing for CPU and I/O, 
which probably slows things down.


On Wed, Oct 26, 2022, 5:37 AM eab...@163.com  wrote:

Hi All,

I have a CDH 5.16.2 Hadoop cluster with 1+3 nodes (64C/128G each;
1 NN/RM + 3 DN/NM), and YARN has 192C/240G available. I used the
following test scenario:

1. Spark app resources: 2G driver memory / 2C driver vcores /
1 executor / 2G executor memory / 2C executor vcores.
2. One Spark app will use 5G/4C on YARN.
3. First, running only one Spark app takes 40s.
4. Then, I run 30 of the same Spark app at once, and each app
takes 80s on average.

So, I want to know why the runtime gap is so big, and how to
optimize it?

Thanks



Re: [ANNOUNCE] Apache Spark 3.3.1 released

2022-10-26 Thread Chao Sun
Congrats everyone! And thanks Yuming for driving the release!

On Wed, Oct 26, 2022 at 7:37 AM beliefer  wrote:
>
> Congratulations to everyone who has contributed to this release.
>
>
> At 2022-10-26 14:21:36, "Yuming Wang"  wrote:
>
> We are happy to announce the availability of Apache Spark 3.3.1!
>
> Spark 3.3.1 is a maintenance release containing stability fixes. This
> release is based on the branch-3.3 maintenance branch of Spark. We strongly
> recommend that all 3.3 users upgrade to this stable release.
>
> To download Spark 3.3.1, head over to the download page:
> https://spark.apache.org/downloads.html
>
> To view the release notes:
> https://spark.apache.org/releases/spark-release-3-3-1.html
>
> We would like to acknowledge all community members for contributing to this
> release. This release would not have been possible without you.
>
>




Re: [ANNOUNCE] Apache Spark 3.3.1 released

2022-10-26 Thread beliefer
Congratulations to everyone who has contributed to this release.




At 2022-10-26 14:21:36, "Yuming Wang"  wrote:

We are happy to announce the availability of Apache Spark 3.3.1!

Spark 3.3.1 is a maintenance release containing stability fixes. This
release is based on the branch-3.3 maintenance branch of Spark. We strongly
recommend that all 3.3 users upgrade to this stable release.

To download Spark 3.3.1, head over to the download page:
https://spark.apache.org/downloads.html

To view the release notes:
https://spark.apache.org/releases/spark-release-3-3-1.html

We would like to acknowledge all community members for contributing to this
release. This release would not have been possible without you.




Re: Running 30 Spark applications at the same time is slower than one on average

2022-10-26 Thread Sean Owen
Resource contention. Now all the apps are competing for CPU and I/O,
which probably slows things down.
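
A quick back-of-the-envelope check (plain Python, numbers taken from the
scenario quoted below) backs this up: all 30 apps fit inside YARN's
totals, so nothing should be queueing, and the slowdown has to come from
the apps competing on the three worker nodes:

# Figures from the reported test scenario.
yarn_vcores, yarn_mem_gb = 192, 240
app_vcores, app_mem_gb = 4, 5   # driver (2G/2C) + executor (2G/2C), ~5G with overhead
n_apps = 30

print(n_apps * app_vcores, "of", yarn_vcores, "vcores")  # 120 of 192 -> fits
print(n_apps * app_mem_gb, "of", yarn_mem_gb, "GB")      # 150 of 240 -> fits

# Assuming cluster deploy mode, the 60 containers (30 drivers + 30
# executors) land on just 3 NodeManagers, ~20 per node, all sharing the
# same disks and NICs; that sharing is the likely source of the
# 40s -> 80s slowdown.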

On Wed, Oct 26, 2022, 5:37 AM eab...@163.com  wrote:

> Hi All,
>
> I have a CDH 5.16.2 Hadoop cluster with 1+3 nodes (64C/128G each; 1 NN/RM +
> 3 DN/NM), and YARN has 192C/240G available. I used the following test
> scenario:
>
> 1. Spark app resources: 2G driver memory / 2C driver vcores / 1 executor /
> 2G executor memory / 2C executor vcores.
> 2. One Spark app will use 5G/4C on YARN.
> 3. First, running only one Spark app takes 40s.
> 4. Then, I run 30 of the same Spark app at once, and each app takes 80s on
> average.
>
> So, I want to know why the runtime gap is so big, and how to optimize it?
>
> Thanks
>
>


Running 30 Spark applications at the same time is slower than one on average

2022-10-26 Thread eab...@163.com
Hi All,

I have a CDH 5.16.2 Hadoop cluster with 1+3 nodes (64C/128G each; 1 NN/RM + 
3 DN/NM), and YARN has 192C/240G available. I used the following test scenario:

1. Spark app resources: 2G driver memory / 2C driver vcores / 1 executor / 
2G executor memory / 2C executor vcores.
2. One Spark app will use 5G/4C on YARN.
3. First, running only one Spark app takes 40s.
4. Then, I run 30 of the same Spark app at once, and each app takes 80s on 
average.

So, I want to know why the runtime gap is so big, and how to optimize it?

Thanks



Re: [ANNOUNCE] Apache Spark 3.3.1 released

2022-10-26 Thread Jacek Laskowski
Yoohoo! Thanks Yuming for driving this release. A tiny step for Spark, a
huge one for my clients (who are still on 3.2.1 or even older :))

Regards,
Jacek Laskowski

https://about.me/JacekLaskowski
"The Internals Of" Online Books 
Follow me on https://twitter.com/jaceklaskowski




On Wed, Oct 26, 2022 at 8:22 AM Yuming Wang  wrote:

> We are happy to announce the availability of Apache Spark 3.3.1!
>
> Spark 3.3.1 is a maintenance release containing stability fixes. This
> release is based on the branch-3.3 maintenance branch of Spark. We strongly
> recommend that all 3.3 users upgrade to this stable release.
>
> To download Spark 3.3.1, head over to the download page:
> https://spark.apache.org/downloads.html
>
> To view the release notes:
> https://spark.apache.org/releases/spark-release-3-3-1.html
>
> We would like to acknowledge all community members for contributing to this
> release. This release would not have been possible without you.
>
>
>


Re: [ANNOUNCE] Apache Spark 3.3.1 released

2022-10-26 Thread Yang,Jie(INF)
Thanks Yuming and all developers ~

Yang Jie

From: Maxim Gekk 
Date: Wednesday, October 26, 2022, 15:19
To: Hyukjin Kwon 
Cc: "L. C. Hsieh" , Dongjoon Hyun , 
Yuming Wang , dev , User 

Subject: Re: [ANNOUNCE] Apache Spark 3.3.1 released

Congratulations to everyone on the new release, and thanks to Yuming for 
his efforts.

Maxim Gekk

Software Engineer

Databricks, Inc.


On Wed, Oct 26, 2022 at 10:14 AM Hyukjin Kwon <gurwls...@gmail.com> wrote:
Thanks, Yuming.

On Wed, 26 Oct 2022 at 16:01, L. C. Hsieh <vii...@gmail.com> wrote:
Thank you for driving the release of Apache Spark 3.3.1, Yuming!

On Tue, Oct 25, 2022 at 11:38 PM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>
> It's great. Thank you so much, Yuming!
>
> Dongjoon
>
> On Tue, Oct 25, 2022 at 11:23 PM Yuming Wang <wgy...@gmail.com> wrote:
>>
>> We are happy to announce the availability of Apache Spark 3.3.1!
>>
>> Spark 3.3.1 is a maintenance release containing stability fixes. This
>> release is based on the branch-3.3 maintenance branch of Spark. We strongly
>> recommend that all 3.3 users upgrade to this stable release.
>>
>> To download Spark 3.3.1, head over to the download page:
>> https://spark.apache.org/downloads.html
>>
>> To view the release notes:
>> https://spark.apache.org/releases/spark-release-3-3-1.html
>>
>> We would like to acknowledge all community members for contributing to this
>> release. This release would not have been possible without you.
>>
>>



Re: [ANNOUNCE] Apache Spark 3.3.1 released

2022-10-26 Thread Maxim Gekk
Congratulations to everyone on the new release, and thanks to Yuming for
his efforts.

Maxim Gekk

Software Engineer

Databricks, Inc.


On Wed, Oct 26, 2022 at 10:14 AM Hyukjin Kwon  wrote:

> Thanks, Yuming.
>
> On Wed, 26 Oct 2022 at 16:01, L. C. Hsieh  wrote:
>
>> Thank you for driving the release of Apache Spark 3.3.1, Yuming!
>>
>> On Tue, Oct 25, 2022 at 11:38 PM Dongjoon Hyun 
>> wrote:
>> >
>> > It's great. Thank you so much, Yuming!
>> >
>> > Dongjoon
>> >
>> > On Tue, Oct 25, 2022 at 11:23 PM Yuming Wang  wrote:
>> >>
>> >> We are happy to announce the availability of Apache Spark 3.3.1!
>> >>
>> >> Spark 3.3.1 is a maintenance release containing stability fixes. This
>> >> release is based on the branch-3.3 maintenance branch of Spark. We
>> >> strongly recommend that all 3.3 users upgrade to this stable release.
>> >>
>> >> To download Spark 3.3.1, head over to the download page:
>> >> https://spark.apache.org/downloads.html
>> >>
>> >> To view the release notes:
>> >> https://spark.apache.org/releases/spark-release-3-3-1.html
>> >>
>> >> We would like to acknowledge all community members for contributing to
>> >> this release. This release would not have been possible without you.
>> >>
>> >>


Re: [ANNOUNCE] Apache Spark 3.3.1 released

2022-10-26 Thread Hyukjin Kwon
Thanks, Yuming.

On Wed, 26 Oct 2022 at 16:01, L. C. Hsieh  wrote:

> Thank you for driving the release of Apache Spark 3.3.1, Yuming!
>
> On Tue, Oct 25, 2022 at 11:38 PM Dongjoon Hyun 
> wrote:
> >
> > It's great. Thank you so much, Yuming!
> >
> > Dongjoon
> >
> > On Tue, Oct 25, 2022 at 11:23 PM Yuming Wang  wrote:
> >>
> >> We are happy to announce the availability of Apache Spark 3.3.1!
> >>
> >> Spark 3.3.1 is a maintenance release containing stability fixes. This
> >> release is based on the branch-3.3 maintenance branch of Spark. We
> >> strongly recommend that all 3.3 users upgrade to this stable release.
> >>
> >> To download Spark 3.3.1, head over to the download page:
> >> https://spark.apache.org/downloads.html
> >>
> >> To view the release notes:
> >> https://spark.apache.org/releases/spark-release-3-3-1.html
> >>
> >> We would like to acknowledge all community members for contributing to
> >> this release. This release would not have been possible without you.
> >>
> >>
>


Re: [ANNOUNCE] Apache Spark 3.3.1 released

2022-10-26 Thread L. C. Hsieh
Thank you for driving the release of Apache Spark 3.3.1, Yuming!

On Tue, Oct 25, 2022 at 11:38 PM Dongjoon Hyun  wrote:
>
> It's great. Thank you so much, Yuming!
>
> Dongjoon
>
> On Tue, Oct 25, 2022 at 11:23 PM Yuming Wang  wrote:
>>
>> We are happy to announce the availability of Apache Spark 3.3.1!
>>
>> Spark 3.3.1 is a maintenance release containing stability fixes. This
>> release is based on the branch-3.3 maintenance branch of Spark. We strongly
>> recommend that all 3.3 users upgrade to this stable release.
>>
>> To download Spark 3.3.1, head over to the download page:
>> https://spark.apache.org/downloads.html
>>
>> To view the release notes:
>> https://spark.apache.org/releases/spark-release-3-3-1.html
>>
>> We would like to acknowledge all community members for contributing to this
>> release. This release would not have been possible without you.
>>
>>




Re: [ANNOUNCE] Apache Spark 3.3.1 released

2022-10-26 Thread Dongjoon Hyun
It's great. Thank you so much, Yuming!

Dongjoon

On Tue, Oct 25, 2022 at 11:23 PM Yuming Wang  wrote:

> We are happy to announce the availability of Apache Spark 3.3.1!
>
> Spark 3.3.1 is a maintenance release containing stability fixes. This
> release is based on the branch-3.3 maintenance branch of Spark. We strongly
> recommend that all 3.3 users upgrade to this stable release.
>
> To download Spark 3.3.1, head over to the download page:
> https://spark.apache.org/downloads.html
>
> To view the release notes:
> https://spark.apache.org/releases/spark-release-3-3-1.html
>
> We would like to acknowledge all community members for contributing to this
> release. This release would not have been possible without you.
>
>
>


[ANNOUNCE] Apache Spark 3.3.1 released

2022-10-26 Thread Yuming Wang
We are happy to announce the availability of Apache Spark 3.3.1!

Spark 3.3.1 is a maintenance release containing stability fixes. This
release is based on the branch-3.3 maintenance branch of Spark. We strongly
recommend that all 3.3 users upgrade to this stable release.

To download Spark 3.3.1, head over to the download page:
https://spark.apache.org/downloads.html

To view the release notes:
https://spark.apache.org/releases/spark-release-3-3-1.html

We would like to acknowledge all community members for contributing to this
release. This release would not have been possible without you.