Re: One click to run Spark on Kubernetes

2022-02-22 Thread Mich Talebzadeh
Hi,

There are two distinct actions here, namely Deploy and Run.

Deployment can be done by a command-line script with autoscaling. In the
newer versions of Kubernetes you do not even need to specify the node
types; you can leave it to the Kubernetes cluster to scale up and down
and decide on the node type.

The second part is running Spark, which you will need to submit.
However, that depends on setting up access permissions, the use of
service accounts, and pulling the correct Docker images for the driver
and the executors. Those details add to the complexity.
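As a hedged sketch of that second step, a bare spark-submit against an
existing Kubernetes cluster looks roughly like this. The API server URL,
namespace, service account, and image name are illustrative placeholders,
not values from any particular tool:

```shell
# Submit a Spark application to Kubernetes in cluster mode.
# All names below (API server, namespace, service account, registry)
# are placeholders; only the property keys are standard Spark ones.
spark-submit \
  --master k8s://https://my-k8s-api-server:6443 \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.kubernetes.namespace=spark \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-sa \
  --conf spark.kubernetes.container.image=my-registry/spark:3.2.1 \
  --conf spark.executor.instances=2 \
  local:///opt/spark/examples/jars/spark-examples_2.12-3.2.1.jar
```

This only runs once the service account and image pull permissions
mentioned above are already in place, which is exactly the complexity
being discussed.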

Thanks



   view my Linkedin profile



 https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Wed, 23 Feb 2022 at 04:06, bo yang  wrote:

> Hi Spark Community,
>
> We built an open source tool to deploy and run Spark on Kubernetes with a
> one click command. For example, on AWS, it could automatically create an
> EKS cluster, node group, NGINX ingress, and Spark Operator. Then you will
> be able to use curl or a CLI tool to submit Spark application. After the
> deployment, you could also install Uber Remote Shuffle Service to enable
> Dynamic Allocation on Kubernetes.
>
> Anyone interested in using or working together on such a tool?
>
> Thanks,
> Bo
>
>


Re: One click to run Spark on Kubernetes

2022-02-22 Thread bo yang
Merging in another email from Prasad: it could co-exist with Livy. Livy
is similar to the combination of the REST service plus the Spark
Operator. Unfortunately, Livy is not very active right now.

To Amihay, the link is: https://github.com/datapunchorg/punch.
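To sketch what "use curl to submit" could look like: the endpoint path
and JSON payload below are invented for illustration only; the actual
API is documented in the punch repository linked above.

```shell
# Purely illustrative submission via the deployed REST service.
# <ingress-host>, the /api/submit path, and the payload fields are
# assumptions, not the real punch API.
curl -X POST "http://<ingress-host>/api/submit" \
  -H "Content-Type: application/json" \
  -d '{
        "mainClass": "org.apache.spark.examples.SparkPi",
        "mainApplicationFile": "local:///opt/spark/examples/jars/spark-examples_2.12-3.2.1.jar"
      }'
```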

On Tue, Feb 22, 2022 at 8:53 PM amihay gonen  wrote:

> Can you share link to the source?
>
> On Wed, 23 Feb 2022 at 06:52, bo yang  wrote:
>
>> We do not have SaaS yet. Now it is an open source project we build in our
>> part time , and we welcome more people working together on that.
>>
>> You could specify cluster size (EC2 instance type and number of
>> instances) and run it for 1 hour. Then you could run one click command to
>> destroy the cluster. It is possible to merge these steps as well, and
>> provide a "serverless" experience. That is in our TODO list :)
>>
>>
>> On Tue, Feb 22, 2022 at 8:36 PM Bitfox  wrote:
>>
>>> How can I specify the cluster memory and cores?
>>> For instance, I want to run a job with 16 cores and 300 GB memory for
>>> about 1 hour. Do you have the SaaS solution for this? I can pay as I did.
>>>
>>> Thanks
>>>
>>> On Wed, Feb 23, 2022 at 12:21 PM bo yang  wrote:
>>>
 It is not a standalone spark cluster. In some details, it deploys a
 Spark Operator (
 https://github.com/GoogleCloudPlatform/spark-on-k8s-operator) and an
 extra REST Service. When people submit Spark application to that REST
 Service, the REST Service will create a CRD inside the Kubernetes cluster.
 Then Spark Operator will pick up the CRD and launch the Spark application.
 The one click tool intends to hide these details, so people could just
 submit Spark and do not need to deal with too many deployment details.

 On Tue, Feb 22, 2022 at 8:09 PM Bitfox  wrote:

> Can it be a cluster installation of spark? or just the standalone node?
>
> Thanks
>
> On Wed, Feb 23, 2022 at 12:06 PM bo yang  wrote:
>
>> Hi Spark Community,
>>
>> We built an open source tool to deploy and run Spark on Kubernetes
>> with a one click command. For example, on AWS, it could automatically
>> create an EKS cluster, node group, NGINX ingress, and Spark Operator. 
>> Then
>> you will be able to use curl or a CLI tool to submit Spark application.
>> After the deployment, you could also install Uber Remote Shuffle Service 
>> to
>> enable Dynamic Allocation on Kubernetes.
>>
>> Anyone interested in using or working together on such a tool?
>>
>> Thanks,
>> Bo
>>
>>


Re: One click to run Spark on Kubernetes

2022-02-22 Thread bo yang
We do not have SaaS yet. For now it is an open source project we build
in our spare time, and we welcome more people to work on it together.

You could specify the cluster size (EC2 instance type and number of
instances) and run it for one hour, then run a one-click command to
destroy the cluster. It is possible to merge these steps as well and
provide a "serverless" experience. That is on our TODO list :)
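For comparison, the manual equivalent of the create/destroy cycle can be
sketched with eksctl; the cluster name, instance type, and node count
here are illustrative, not the tool's defaults:

```shell
# What the one-click flow automates, done by hand with eksctl.
# r5.4xlarge gives 16 vCPUs / 128 GB per node; names are placeholders.
eksctl create cluster --name spark-demo \
  --node-type r5.4xlarge --nodes 2

# ... run Spark jobs for about an hour ...

# Tear everything down again.
eksctl delete cluster --name spark-demo
```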


On Tue, Feb 22, 2022 at 8:36 PM Bitfox  wrote:

> How can I specify the cluster memory and cores?
> For instance, I want to run a job with 16 cores and 300 GB memory for
> about 1 hour. Do you have the SaaS solution for this? I can pay as I did.
>
> Thanks
>
> On Wed, Feb 23, 2022 at 12:21 PM bo yang  wrote:
>
>> It is not a standalone spark cluster. In some details, it deploys a Spark
>> Operator (https://github.com/GoogleCloudPlatform/spark-on-k8s-operator)
>> and an extra REST Service. When people submit Spark application to that
>> REST Service, the REST Service will create a CRD inside the
>> Kubernetes cluster. Then Spark Operator will pick up the CRD and launch the
>> Spark application. The one click tool intends to hide these details, so
>> people could just submit Spark and do not need to deal with too many
>> deployment details.
>>
>> On Tue, Feb 22, 2022 at 8:09 PM Bitfox  wrote:
>>
>>> Can it be a cluster installation of spark? or just the standalone node?
>>>
>>> Thanks
>>>
>>> On Wed, Feb 23, 2022 at 12:06 PM bo yang  wrote:
>>>
 Hi Spark Community,

 We built an open source tool to deploy and run Spark on Kubernetes with
 a one click command. For example, on AWS, it could automatically create an
 EKS cluster, node group, NGINX ingress, and Spark Operator. Then you will
 be able to use curl or a CLI tool to submit Spark application. After the
 deployment, you could also install Uber Remote Shuffle Service to enable
 Dynamic Allocation on Kubernetes.

 Anyone interested in using or working together on such a tool?

 Thanks,
 Bo




Re: One click to run Spark on Kubernetes

2022-02-22 Thread Prasad Paravatha
Hi Bo Yang,
Would it be something along the lines of Apache livy?

Thanks,
Prasad


On Tue, Feb 22, 2022 at 10:22 PM bo yang  wrote:

> It is not a standalone spark cluster. In some details, it deploys a Spark
> Operator (https://github.com/GoogleCloudPlatform/spark-on-k8s-operator)
> and an extra REST Service. When people submit Spark application to that
> REST Service, the REST Service will create a CRD inside the
> Kubernetes cluster. Then Spark Operator will pick up the CRD and launch the
> Spark application. The one click tool intends to hide these details, so
> people could just submit Spark and do not need to deal with too many
> deployment details.
>
> On Tue, Feb 22, 2022 at 8:09 PM Bitfox  wrote:
>
>> Can it be a cluster installation of spark? or just the standalone node?
>>
>> Thanks
>>
>> On Wed, Feb 23, 2022 at 12:06 PM bo yang  wrote:
>>
>>> Hi Spark Community,
>>>
>>> We built an open source tool to deploy and run Spark on Kubernetes with
>>> a one click command. For example, on AWS, it could automatically create an
>>> EKS cluster, node group, NGINX ingress, and Spark Operator. Then you will
>>> be able to use curl or a CLI tool to submit Spark application. After the
>>> deployment, you could also install Uber Remote Shuffle Service to enable
>>> Dynamic Allocation on Kubernetes.
>>>
>>> Anyone interested in using or working together on such a tool?
>>>
>>> Thanks,
>>> Bo
>>>
>>>

-- 
Regards,
Prasad Paravatha


Re: One click to run Spark on Kubernetes

2022-02-22 Thread bo yang
It is not a standalone Spark cluster. In a bit more detail: the tool
deploys the Spark Operator
(https://github.com/GoogleCloudPlatform/spark-on-k8s-operator) and an
extra REST service. When people submit a Spark application to that REST
service, the REST service creates a CRD inside the Kubernetes cluster.
The Spark Operator then picks up the CRD and launches the Spark
application. The one-click tool intends to hide these details, so people
can just submit Spark applications without dealing with too many
deployment specifics.
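To make the CRD step concrete, here is a hedged sketch of the kind of
SparkApplication resource the REST service might create for the operator.
The field values (namespace, image, resource sizes) are illustrative
assumptions; only the apiVersion/kind and field names follow the
operator's v1beta2 schema:

```shell
# Write an illustrative SparkApplication manifest for the Spark Operator.
# Handing it to the operator would then be: kubectl apply -f /tmp/sparkpi.yaml
cat > /tmp/sparkpi.yaml <<'EOF'
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: spark
spec:
  type: Scala
  mode: cluster
  image: my-registry/spark:3.2.1
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.2.1.jar
  sparkVersion: "3.2.1"
  driver:
    cores: 1
    memory: "512m"
    serviceAccount: spark-sa
  executor:
    cores: 1
    instances: 2
    memory: "512m"
EOF
# kubectl apply -f /tmp/sparkpi.yaml   # needs a cluster with the operator installed
```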

On Tue, Feb 22, 2022 at 8:09 PM Bitfox  wrote:

> Can it be a cluster installation of spark? or just the standalone node?
>
> Thanks
>
> On Wed, Feb 23, 2022 at 12:06 PM bo yang  wrote:
>
>> Hi Spark Community,
>>
>> We built an open source tool to deploy and run Spark on Kubernetes with a
>> one click command. For example, on AWS, it could automatically create an
>> EKS cluster, node group, NGINX ingress, and Spark Operator. Then you will
>> be able to use curl or a CLI tool to submit Spark application. After the
>> deployment, you could also install Uber Remote Shuffle Service to enable
>> Dynamic Allocation on Kubernetes.
>>
>> Anyone interested in using or working together on such a tool?
>>
>> Thanks,
>> Bo
>>
>>


One click to run Spark on Kubernetes

2022-02-22 Thread bo yang
Hi Spark Community,

We built an open source tool to deploy and run Spark on Kubernetes with
a one-click command. For example, on AWS it can automatically create an
EKS cluster, a node group, an NGINX ingress, and the Spark Operator. You
can then use curl or a CLI tool to submit a Spark application. After the
deployment, you can also install the Uber Remote Shuffle Service to
enable Dynamic Allocation on Kubernetes.
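For context on the dynamic allocation part, the generic Spark properties
involved look like the following. This is a hedged sketch: the actual
wiring of the Uber Remote Shuffle Service (its shuffle manager class and
service address) is tool-specific and not shown here, and the master URL
and jar path are placeholders.

```shell
# Standard Spark properties that switch on dynamic allocation once a
# shuffle story (a remote shuffle service, or Spark's own shuffle
# tracking) is in place. Values are illustrative.
spark-submit \
  --master k8s://https://my-k8s-api-server:6443 \
  --deploy-mode cluster \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=1 \
  --conf spark.dynamicAllocation.maxExecutors=20 \
  --conf spark.dynamicAllocation.shuffleTracking.enabled=true \
  local:///opt/spark/examples/jars/spark-examples_2.12-3.2.1.jar
```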

Anyone interested in using or working together on such a tool?

Thanks,
Bo


Re: [ANNOUNCE] Apache Spark 3.1.3 released + Docker images

2022-02-22 Thread Denis Bolshakov
I understand that, and we do so, but the news about official images
caught me by surprise.

OK, I will follow you on those activities.


Thanks for the quick response.

On Tue, 22 Feb 2022 at 22:03, Holden Karau  wrote:

> So your more than welcome to still build your own Spark docker containers
> with the docker image tool, these are provided to make it easier for folks
> without specific needs. In the future well hopefully have published Spark
> containers tagged for different JDKs but that work has not yet been done.
>
> On Tue, Feb 22, 2022 at 10:51 AM Denis Bolshakov <
> bolshakov.de...@gmail.com> wrote:
>
>> Hello Holden,
>>
>> Could you please provide more details and plan for docker images support?
>>
>> So far I see that there are only two tags, I get from them spark version,
>> but there is no information about java, hadoop, scala versions.
>>
>> Also there is no description on docker hub, probably it would be nice to
>> put a link to Docker files in github repository.
>>
>> What directories are expected to be mounted and ports forwarded? How can
>> I mount the krb5.conf file and directory where my kerberos ticket is
>> located?
>>
>> I've pulled the docker image with tag spark 3.2.1 and I see that there is
>> java 11 and hadoop 3.3, but our environment requires us to have other
>> versions.
>>
>> On Tue, 22 Feb 2022 at 16:29, Mich Talebzadeh 
>> wrote:
>>
>>> Well that is just a recommendation.
>>>
>>> The onus is on me the user to download and go through dev and test
>>> running suite of batch jobs to ensure that all work ok, especially on the
>>> edge, sign the release off and roll it in out into production. It won’t be
>>> prudent otherwise.
>>>
>>> HHH
>>>
>>> On Tue, 22 Feb 2022 at 12:12, Bjørn Jørgensen 
>>> wrote:
>>>
 "Spark 3.1.3 is a maintenance release containing stability fixes. This
 release is based on the branch-3.1 maintenance branch of Spark. We strongly
 recommend all 3.1.3 users to upgrade to this stable release."
 https://spark.apache.org/releases/spark-release-3-1-3.html

 Do we have another 3.13 or do we strongly recommend all 3.1.2 users to
 upgrade to this stable release ?

 tir. 22. feb. 2022 kl. 09:50 skrev angers zhu :

> Hi,  seems
>
>- [SPARK-35391] :
>Memory leak in ExecutorAllocationListener breaks dynamic allocation 
> under
>high load
>
> Links to wrong jira ticket?
>
> Mich Talebzadeh  于2022年2月22日周二 15:49写道:
>
>> Well, that is pretty easy to do.
>>
>> However, a quick fix for now could be to retag the image created. It
>> is a small volume which can be done manually for now. For example, I just
>> downloaded v3.1.3
>>
>>
>> docker image ls
>>
>> REPOSITORY TAG
>> IMAGE ID   CREATEDSIZE
>>
>> apache/spark   v3.1.3
>>  31ed15daa2bf   12 hours ago   531MB
>>
>> Retag it with
>>
>>
>> docker tag 31ed15daa2bf
>> apache/spark/tags/spark-3.1.3-scala_2.12-8-jre-slim-buster
>>
>> docker image ls
>>
>> REPOSITORY   TAG
>>   IMAGE ID   CREATED
>> SIZE
>>
>> apache/spark/tags/spark-3.1.3-scala_2.12-8-jre-slim-buster   latest
>>31ed15daa2bf   12 hours 
>> ago
>>  531MB
>>
>> Then push it with (example)
>>
>> docker push apache/spark/tags/spark-3.1.3-scala_2.12-8-jre-slim-buster
>>
>>
>> HTH
>>
>>
>>view my Linkedin profile
>> 
>>
>>
>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>
>>
>>
>>
>>
>>
>>
>> On Mon, 21 Feb 2022 at 23:51, Holden Karau 
>> wrote:
>>
>>> Yeah, I think we should still adopt that naming convention; however,
>>> no one has taken the time to write a script to do it yet, so until we
>>> get that script merged I think we'll just have one build. I can try to
>>> do that for the next release, but it would be a great second issue for
>>> someone getting more familiar with the release tooling.
>>>
>>> On Mon, Feb 21, 2022 at 2:18 PM Mich Talebzadeh <
>>> mich.talebza...@gmail.com> wrote:
>>>
 Ok thanks for the correction.

 The docker pull line shows as 

Re: [ANNOUNCE] Apache Spark 3.1.3 released + Docker images

2022-02-22 Thread Holden Karau
So you're more than welcome to still build your own Spark Docker
containers with the Docker image tool; these are provided to make things
easier for folks without specific needs. In the future we'll hopefully
have published Spark containers tagged for different JDKs, but that work
has not yet been done.

On Tue, Feb 22, 2022 at 10:51 AM Denis Bolshakov 
wrote:

> Hello Holden,
>
> Could you please provide more details and plan for docker images support?
>
> So far I see that there are only two tags, I get from them spark version,
> but there is no information about java, hadoop, scala versions.
>
> Also there is no description on docker hub, probably it would be nice to
> put a link to Docker files in github repository.
>
> What directories are expected to be mounted and ports forwarded? How can I
> mount the krb5.conf file and directory where my kerberos ticket is located?
>
> I've pulled the docker image with tag spark 3.2.1 and I see that there is
> java 11 and hadoop 3.3, but our environment requires us to have other
> versions.
>
> On Tue, 22 Feb 2022 at 16:29, Mich Talebzadeh 
> wrote:
>
>> Well that is just a recommendation.
>>
>> The onus is on me the user to download and go through dev and test
>> running suite of batch jobs to ensure that all work ok, especially on the
>> edge, sign the release off and roll it in out into production. It won’t be
>> prudent otherwise.
>>
>> HHH
>>
>> On Tue, 22 Feb 2022 at 12:12, Bjørn Jørgensen 
>> wrote:
>>
>>> "Spark 3.1.3 is a maintenance release containing stability fixes. This
>>> release is based on the branch-3.1 maintenance branch of Spark. We strongly
>>> recommend all 3.1.3 users to upgrade to this stable release."
>>> https://spark.apache.org/releases/spark-release-3-1-3.html
>>>
>>> Do we have another 3.13 or do we strongly recommend all 3.1.2 users to
>>> upgrade to this stable release ?
>>>
>>> tir. 22. feb. 2022 kl. 09:50 skrev angers zhu :
>>>
 Hi,  seems

- [SPARK-35391] :
Memory leak in ExecutorAllocationListener breaks dynamic allocation 
 under
high load

 Links to wrong jira ticket?

 Mich Talebzadeh  于2022年2月22日周二 15:49写道:

> Well, that is pretty easy to do.
>
> However, a quick fix for now could be to retag the image created. It
> is a small volume which can be done manually for now. For example, I just
> downloaded v3.1.3
>
>
> docker image ls
>
> REPOSITORY TAG
> IMAGE ID   CREATEDSIZE
>
> apache/spark   v3.1.3
>31ed15daa2bf   12 hours ago   531MB
>
> Retag it with
>
>
> docker tag 31ed15daa2bf
> apache/spark/tags/spark-3.1.3-scala_2.12-8-jre-slim-buster
>
> docker image ls
>
> REPOSITORY   TAG
>   IMAGE ID   CREATED
> SIZE
>
> apache/spark/tags/spark-3.1.3-scala_2.12-8-jre-slim-buster   latest
>  31ed15daa2bf   12 hours ago
>  531MB
>
> Then push it with (example)
>
> docker push apache/spark/tags/spark-3.1.3-scala_2.12-8-jre-slim-buster
>
>
> HTH
>
>
>view my Linkedin profile
> 
>
>
>  https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
>
>
>
>
> On Mon, 21 Feb 2022 at 23:51, Holden Karau 
> wrote:
>
>> Yeah I think we should still adopt that naming convention, however no
>> one has taken the time submit write a script to do it yet so until we get
>> that script merged I think we'll just have one build. I can try and do 
>> that
>> for the next release but it would be a great 2nd issue for someone 
>> getting
>> more familiar with the release tooling.
>>
>> On Mon, Feb 21, 2022 at 2:18 PM Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>> Ok thanks for the correction.
>>>
>>> The docker pull line shows as follows:
>>>
>>> docker pull apache/spark:v3.2.1
>>>
>>>
>>> So this only tells me the version of Spark 3.2.1
>>>
>>>
>>> I thought we discussed deciding on the docker naming conventions in
>>> detail, and broadly agreed on what needs to be in the naming convention.
>>> For example, in this thread:
>>>
>>>
>>> Time to start publishing Spark Docker 

Re: [ANNOUNCE] Apache Spark 3.1.3 released + Docker images

2022-02-22 Thread Denis Bolshakov
Hello Holden,

Could you please provide more details and plan for docker images support?

So far I see that there are only two tags; I can get the Spark version
from them, but there is no information about the Java, Hadoop, or Scala
versions.

Also, there is no description on Docker Hub; it would probably be nice
to put a link to the Dockerfiles in the GitHub repository.

What directories are expected to be mounted, and which ports forwarded?
How can I mount the krb5.conf file and the directory where my Kerberos
ticket is located?

I've pulled the Docker image with tag spark 3.2.1 and I see that it has
Java 11 and Hadoop 3.3, but our environment requires us to have other
versions.
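One way the Kerberos part of that question could be handled today is to
bind-mount the config and ticket cache into the container. This is a
hedged sketch, not documented behaviour of the published images; the
ticket-cache path is an illustrative placeholder:

```shell
# Mount krb5.conf and a Kerberos ticket cache into the Spark image,
# and point the Kerberos libraries at the cache via KRB5CCNAME.
docker run -it \
  -v /etc/krb5.conf:/etc/krb5.conf:ro \
  -v /tmp/krb5cc_1000:/tmp/krb5cc_1000:ro \
  -e KRB5CCNAME=/tmp/krb5cc_1000 \
  apache/spark:v3.2.1 /opt/spark/bin/spark-shell
```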

On Tue, 22 Feb 2022 at 16:29, Mich Talebzadeh 
wrote:

> Well that is just a recommendation.
>
> The onus is on me the user to download and go through dev and test running
> suite of batch jobs to ensure that all work ok, especially on the edge,
> sign the release off and roll it in out into production. It won’t be
> prudent otherwise.
>
> HHH
>
> On Tue, 22 Feb 2022 at 12:12, Bjørn Jørgensen 
> wrote:
>
>> "Spark 3.1.3 is a maintenance release containing stability fixes. This
>> release is based on the branch-3.1 maintenance branch of Spark. We strongly
>> recommend all 3.1.3 users to upgrade to this stable release."
>> https://spark.apache.org/releases/spark-release-3-1-3.html
>>
>> Do we have another 3.13 or do we strongly recommend all 3.1.2 users to
>> upgrade to this stable release ?
>>
>> tir. 22. feb. 2022 kl. 09:50 skrev angers zhu :
>>
>>> Hi,  seems
>>>
>>>- [SPARK-35391] :
>>>Memory leak in ExecutorAllocationListener breaks dynamic allocation under
>>>high load
>>>
>>> Links to wrong jira ticket?
>>>
>>> Mich Talebzadeh  于2022年2月22日周二 15:49写道:
>>>
 Well, that is pretty easy to do.

 However, a quick fix for now could be to retag the image created. It is
 a small volume which can be done manually for now. For example, I just
 downloaded v3.1.3


 docker image ls

 REPOSITORY TAG
   IMAGE ID   CREATEDSIZE

 apache/spark   v3.1.3
31ed15daa2bf   12 hours ago   531MB

 Retag it with


 docker tag 31ed15daa2bf
 apache/spark/tags/spark-3.1.3-scala_2.12-8-jre-slim-buster

 docker image ls

 REPOSITORY   TAG
 IMAGE ID   CREATED
 SIZE

 apache/spark/tags/spark-3.1.3-scala_2.12-8-jre-slim-buster   latest
  31ed15daa2bf   12 hours ago
  531MB

 Then push it with (example)

 docker push apache/spark/tags/spark-3.1.3-scala_2.12-8-jre-slim-buster


 HTH


view my Linkedin profile
 


  https://en.everybodywiki.com/Mich_Talebzadeh







 On Mon, 21 Feb 2022 at 23:51, Holden Karau 
 wrote:

> Yeah I think we should still adopt that naming convention, however no
> one has taken the time submit write a script to do it yet so until we get
> that script merged I think we'll just have one build. I can try and do 
> that
> for the next release but it would be a great 2nd issue for someone getting
> more familiar with the release tooling.
>
> On Mon, Feb 21, 2022 at 2:18 PM Mich Talebzadeh <
> mich.talebza...@gmail.com> wrote:
>
>> Ok thanks for the correction.
>>
>> The docker pull line shows as follows:
>>
>> docker pull apache/spark:v3.2.1
>>
>>
>> So this only tells me the version of Spark 3.2.1
>>
>>
>> I thought we discussed deciding on the docker naming conventions in
>> detail, and broadly agreed on what needs to be in the naming convention.
>> For example, in this thread:
>>
>>
>> Time to start publishing Spark Docker Images? -
>> mich.talebza...@gmail.com - Gmail (google.com)
>> 
>>  dated
>> 22nd July 2021
>>
>>
>> Referring to that, I think the broad agreement was that the docker
>> image name should be of the form:
>>
>>
>> The name of the file provides:
>>
>>- Built for spark or spark-py (PySpark) spark-r
>>- Spark version: 3.1.1, 3.1.2, 3.2.1 etc.
>>- Scala 

Re: [ANNOUNCE] Apache Spark 3.1.3 released + Docker images

2022-02-22 Thread Mich Talebzadeh
Well, that is just a recommendation.

The onus is on me, the user, to download it, go through dev and test
(running a suite of batch jobs to ensure that all work OK, especially on
the edge cases), sign the release off, and roll it out into production.
It would not be prudent otherwise.

HTH

On Tue, 22 Feb 2022 at 12:12, Bjørn Jørgensen 
wrote:

> "Spark 3.1.3 is a maintenance release containing stability fixes. This
> release is based on the branch-3.1 maintenance branch of Spark. We strongly
> recommend all 3.1.3 users to upgrade to this stable release."
> https://spark.apache.org/releases/spark-release-3-1-3.html
>
> Do we have another 3.13 or do we strongly recommend all 3.1.2 users to
> upgrade to this stable release ?
>
> tir. 22. feb. 2022 kl. 09:50 skrev angers zhu :
>
>> Hi,  seems
>>
>>- [SPARK-35391] :
>>Memory leak in ExecutorAllocationListener breaks dynamic allocation under
>>high load
>>
>> Links to wrong jira ticket?
>>
>> Mich Talebzadeh  于2022年2月22日周二 15:49写道:
>>
>>> Well, that is pretty easy to do.
>>>
>>> However, a quick fix for now could be to retag the image created. It is
>>> a small volume which can be done manually for now. For example, I just
>>> downloaded v3.1.3
>>>
>>>
>>> docker image ls
>>>
>>> REPOSITORY TAG
>>>   IMAGE ID   CREATEDSIZE
>>>
>>> apache/spark   v3.1.3
>>>  31ed15daa2bf   12 hours ago   531MB
>>>
>>> Retag it with
>>>
>>>
>>> docker tag 31ed15daa2bf
>>> apache/spark/tags/spark-3.1.3-scala_2.12-8-jre-slim-buster
>>>
>>> docker image ls
>>>
>>> REPOSITORY   TAG
>>> IMAGE ID   CREATED
>>> SIZE
>>>
>>> apache/spark/tags/spark-3.1.3-scala_2.12-8-jre-slim-buster   latest
>>>31ed15daa2bf   12 hours ago
>>>  531MB
>>>
>>> Then push it with (example)
>>>
>>> docker push apache/spark/tags/spark-3.1.3-scala_2.12-8-jre-slim-buster
>>>
>>>
>>> HTH
>>>
>>>
>>>view my Linkedin profile
>>> 
>>>
>>>
>>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Mon, 21 Feb 2022 at 23:51, Holden Karau  wrote:
>>>
 Yeah I think we should still adopt that naming convention, however no
 one has taken the time submit write a script to do it yet so until we get
 that script merged I think we'll just have one build. I can try and do that
 for the next release but it would be a great 2nd issue for someone getting
 more familiar with the release tooling.

 On Mon, Feb 21, 2022 at 2:18 PM Mich Talebzadeh <
 mich.talebza...@gmail.com> wrote:

> Ok thanks for the correction.
>
> The docker pull line shows as follows:
>
> docker pull apache/spark:v3.2.1
>
>
> So this only tells me the version of Spark 3.2.1
>
>
> I thought we discussed deciding on the docker naming conventions in
> detail, and broadly agreed on what needs to be in the naming convention.
> For example, in this thread:
>
>
> Time to start publishing Spark Docker Images? -
> mich.talebza...@gmail.com - Gmail (google.com)
> 
>  dated
> 22nd July 2021
>
>
> Referring to that, I think the broad agreement was that the docker
> image name should be of the form:
>
>
> The name of the file provides:
>
>- Built for spark or spark-py (PySpark) spark-r
>- Spark version: 3.1.1, 3.1.2, 3.2.1 etc.
>- Scala version: 2.12
>- The OS version based on JAVA: 8-jre-slim-buster,
>11-jre-slim-buster meaning JAVA 8 and JAVA 11 respectively
>
> I believe it is a good thing and we ought to adopt that convention.
> For example:
>
>
> spark-py-3.2.1-scala_2.12-11-jre-slim-buster
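[The quoted convention can be assembled mechanically; a small sketch,
with the component values as examples only:]

```shell
# Build an image tag following the proposed naming convention:
# <flavour>-<spark version>-scala_<scala version>-<java base image>
flavour=spark-py          # spark | spark-py | spark-r
spark_version=3.2.1
scala_version=2.12
java_base=11-jre-slim-buster

tag="${flavour}-${spark_version}-scala_${scala_version}-${java_base}"
echo "$tag"               # spark-py-3.2.1-scala_2.12-11-jre-slim-buster
```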
>
>
> HTH
>
>
>
>view my Linkedin profile
> 
>
>
>  https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for
> any loss, damage or destruction of data or any other property which may
> arise from relying on this email's technical content is explicitly
> disclaimed. The author will in no case be liable for any monetary damages
> arising from such loss, damage or 

Re: [ANNOUNCE] Apache Spark 3.1.3 released + Docker images

2022-02-22 Thread Bjørn Jørgensen
"Spark 3.1.3 is a maintenance release containing stability fixes. This
release is based on the branch-3.1 maintenance branch of Spark. We strongly
recommend all 3.1.3 users to upgrade to this stable release."
https://spark.apache.org/releases/spark-release-3-1-3.html

Do we have another 3.1.3 coming, or do we strongly recommend all 3.1.2
users to upgrade to this stable release?

tir. 22. feb. 2022 kl. 09:50 skrev angers zhu :

> Hi,  seems
>
>- [SPARK-35391] :
>Memory leak in ExecutorAllocationListener breaks dynamic allocation under
>high load
>
> Links to wrong jira ticket?
>
> Mich Talebzadeh  于2022年2月22日周二 15:49写道:
>
>> Well, that is pretty easy to do.
>>
>> However, a quick fix for now could be to retag the image created. It is a
>> small volume which can be done manually for now. For example, I just
>> downloaded v3.1.3
>>
>>
>> docker image ls
>>
>> REPOSITORY     TAG      IMAGE ID       CREATED        SIZE
>> apache/spark   v3.1.3   31ed15daa2bf   12 hours ago   531MB
>>
>> Retag it with
>>
>> docker tag 31ed15daa2bf apache/spark/tags/spark-3.1.3-scala_2.12-8-jre-slim-buster
>>
>> docker image ls
>>
>> REPOSITORY                                                   TAG      IMAGE ID       CREATED        SIZE
>> apache/spark/tags/spark-3.1.3-scala_2.12-8-jre-slim-buster   latest   31ed15daa2bf   12 hours ago   531MB
>>
>> Then push it with (example)
>>
>> docker push apache/spark/tags/spark-3.1.3-scala_2.12-8-jre-slim-buster
>>
>>
>> HTH
>>
>>
>>view my Linkedin profile
>> 
>>
>>
>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>
>>
>>
>>
>>
>>
>>
>> On Mon, 21 Feb 2022 at 23:51, Holden Karau  wrote:
>>
>>> Yeah I think we should still adopt that naming convention, however no
>>> one has taken the time submit write a script to do it yet so until we get
>>> that script merged I think we'll just have one build. I can try and do that
>>> for the next release but it would be a great 2nd issue for someone getting
>>> more familiar with the release tooling.
>>>
>>> On Mon, Feb 21, 2022 at 2:18 PM Mich Talebzadeh <
>>> mich.talebza...@gmail.com> wrote:
>>>
 Ok thanks for the correction.

 The docker pull line shows as follows:

 docker pull apache/spark:v3.2.1


 So this only tells me the version of Spark 3.2.1


 I thought we discussed deciding on the docker naming conventions in
 detail, and broadly agreed on what needs to be in the naming convention.
 For example, in this thread:


 Time to start publishing Spark Docker Images? -
 mich.talebza...@gmail.com - Gmail (google.com)
 
  dated
 22nd July 2021


 Referring to that, I think the broad agreement was that the docker
 image name should be of the form:


 The name of the file provides:

- Built for spark or spark-py (PySpark) spark-r
- Spark version: 3.1.1, 3.1.2, 3.2.1 etc.
- Scala version: 2.12
- The OS version based on JAVA: 8-jre-slim-buster,
11-jre-slim-buster meaning JAVA 8 and JAVA 11 respectively

 I believe it is a good thing and we ought to adopt that convention. For
 example:


 spark-py-3.2.1-scala_2.12-11-jre-slim-buster


 HTH



view my Linkedin profile
 


  https://en.everybodywiki.com/Mich_Talebzadeh



 *Disclaimer:* Use it at your own risk. Any and all responsibility for
 any loss, damage or destruction of data or any other property which may
 arise from relying on this email's technical content is explicitly
 disclaimed. The author will in no case be liable for any monetary damages
 arising from such loss, damage or destruction.




 On Mon, 21 Feb 2022 at 21:58, Holden Karau 
 wrote:

> My bad, the correct link is:
>
> https://hub.docker.com/r/apache/spark/tags
>
> On Mon, Feb 21, 2022 at 1:17 PM Mich Talebzadeh <
> mich.talebza...@gmail.com> wrote:
>
>> well that docker link is not found! may be permission issue
>>
>> [image: image.png]
>>
>>
>>
>>
>>view my Linkedin profile
>> 

Re: [ANNOUNCE] Apache Spark 3.1.3 released + Docker images

2022-02-22 Thread angers zhu
Hi, it seems

   - [SPARK-35391]: Memory leak in ExecutorAllocationListener breaks
     dynamic allocation under high load

links to the wrong JIRA ticket?

Mich Talebzadeh  于2022年2月22日周二 15:49写道:

> Well, that is pretty easy to do.
>
> However, a quick fix for now could be to retag the image created. It is a
> small volume which can be done manually for now. For example, I just
> downloaded v3.1.3
>
>
> docker image ls
>
> REPOSITORY     TAG      IMAGE ID       CREATED        SIZE
> apache/spark   v3.1.3   31ed15daa2bf   12 hours ago   531MB
>
> Retag it with
>
> docker tag 31ed15daa2bf apache/spark/tags/spark-3.1.3-scala_2.12-8-jre-slim-buster
>
> docker image ls
>
> REPOSITORY                                                   TAG      IMAGE ID       CREATED        SIZE
> apache/spark/tags/spark-3.1.3-scala_2.12-8-jre-slim-buster   latest   31ed15daa2bf   12 hours ago   531MB
>
> Then push it with (example)
>
> docker push apache/spark/tags/spark-3.1.3-scala_2.12-8-jre-slim-buster
>
>
> HTH
>
>
>view my Linkedin profile
> 
>
>
>  https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
>
>
>
>
> On Mon, 21 Feb 2022 at 23:51, Holden Karau  wrote:
>
>> Yeah I think we should still adopt that naming convention, however no one
>> has taken the time submit write a script to do it yet so until we get that
>> script merged I think we'll just have one build. I can try and do that for
>> the next release but it would be a great 2nd issue for someone getting more
>> familiar with the release tooling.
>>
>> On Mon, Feb 21, 2022 at 2:18 PM Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>> Ok thanks for the correction.
>>>
>>> The docker pull line shows as follows:
>>>
>>> docker pull apache/spark:v3.2.1
>>>
>>>
>>> So this only tells me the version of Spark 3.2.1
>>>
>>>
>>> I thought we discussed deciding on the docker naming conventions in
>>> detail, and broadly agreed on what needs to be in the naming convention.
>>> For example, in this thread:
>>>
>>>
>>> Time to start publishing Spark Docker Images? -
>>> mich.talebza...@gmail.com - Gmail (google.com)
>>> 
>>>  dated
>>> 22nd July 2021
>>>
>>>
>>> Referring to that, I think the broad agreement was that the docker image
>>> name should be of the form:
>>>
>>>
>>> The name of the file provides:
>>>
>>>- Built for spark or spark-py (PySpark) spark-r
>>>- Spark version: 3.1.1, 3.1.2, 3.2.1 etc.
>>>- Scala version: 2.12
>>>- The OS version based on JAVA: 8-jre-slim-buster,
>>>11-jre-slim-buster meaning JAVA 8 and JAVA 11 respectively
>>>
>>> I believe it is a good thing and we ought to adopt that convention. For
>>> example:
>>>
>>>
>>> spark-py-3.2.1-scala_2.12-11-jre-slim-buster
>>>
>>>
>>> HTH
>>>
>>>
>>>
>>>view my Linkedin profile
>>> 
>>>
>>>
>>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Mon, 21 Feb 2022 at 21:58, Holden Karau  wrote:
>>>
 My bad, the correct link is:

 https://hub.docker.com/r/apache/spark/tags

 On Mon, Feb 21, 2022 at 1:17 PM Mich Talebzadeh <
 mich.talebza...@gmail.com> wrote:

> well that docker link is not found! may be permission issue
>
> [image: image.png]
>
>
>
>
>view my Linkedin profile
> 
>
>
>  https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
>
>
>
>
> On Mon, 21 Feb 2022 at 21:09, Holden Karau 
> wrote:
>
>> We are happy