Re: Choosing architecture for on-premise Spark & HDFS on Kubernetes cluster

2021-11-25 Thread JHI Star
Thanks, I'll have a closer look at GKE and compare it with what some other
sites running a similar setup have used (OpenStack).

Well, no, I don't envisage any public cloud integration. There is no plan
to use Hive, just PySpark with HDFS!

On Wed, Nov 24, 2021 at 10:31 AM Mich Talebzadeh 
wrote:

> Just to clarify, it should say: the current Spark Kubernetes model ...
>
>
> You will also need to build or get the Spark Docker image that you are
> going to use in the k8s cluster, based on Spark version, Java version, Scala
> version, OS and so forth. Are you going to use Hive as your main storage?
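The image build step above can be sketched with the docker-image-tool.sh script that ships in the Spark distribution (a minimal sketch only; the registry name and tag are hypothetical placeholders, not values from this thread):

```shell
# Run from the root of an unpacked Spark distribution.
# myregistry.example.com/spark and the 3.0.0 tag are placeholders.

# Build the JVM-based Spark image plus a PySpark image (-p points at the
# PySpark Dockerfile bundled with the distribution).
./bin/docker-image-tool.sh -r myregistry.example.com/spark -t 3.0.0 \
  -p kubernetes/dockerfiles/spark/bindings/python/Dockerfile build

# Push the images to the registry so the k8s nodes can pull them.
./bin/docker-image-tool.sh -r myregistry.example.com/spark -t 3.0.0 push
```

The base image must match the Spark, Java and Scala versions your jobs are built against, which is why the version question above matters.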
>
>
> HTH
>
>
>view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
> On Tue, 23 Nov 2021 at 19:39, Mich Talebzadeh 
> wrote:
>
>> OK  to your point below
>>
>> "... We are going to deploy 20 physical Linux servers for use as an
>> on-premise Spark & HDFS on Kubernetes cluster ..."
>>
>> Kubernetes is really a cloud-native technology. However, the
>> cloud-native concept does not exclude the use of on-premises infrastructure
>> where it makes sense. So the question is: are you going to use a mesh
>> structure to integrate these microservices together, across on-premise
>> and cloud?
>> Now you have 20 tin boxes on-prem that you want to use for building your
>> Spark & HDFS stack. You will benefit from Kubernetes and your
>> microservices by simplifying deployment, decoupling dependencies and
>> abstracting your infrastructure away, with the ability to port these
>> deployments elsewhere. As you have your own hardware (your Linux
>> servers), running k8s on bare metal will give you native hardware
>> performance. However, with 20 Linux servers you may limit your
>> scalability (your number of k8s nodes). If you go this way, you will need
>> to invest in a bare-metal automation platform such as Platform9
>> <https://platform9.com/bare-metal/>. The likelihood is that you may
>> decide to move to the public cloud at some point or integrate with the
>> public cloud. My advice would be to look at something like GKE on-prem
>> <https://cloud.google.com/anthos/clusters/docs/on-prem/1.3/overview>
>>
>>
>> Back to Spark: the current Spark Kubernetes model works on the basis of
>> the "one-container-per-Pod" model
>> <https://kubernetes.io/docs/concepts/workloads/pods/>, meaning that the
>> driver runs in its own pod and each executor runs in a pod of its own. My
>> question would be: will you be integrating with a public cloud (AWS, GCP,
>> etc.) at some point? In that case you should look at mesh technologies
>> like Istio
>> <https://cloud.google.com/learn/what-is-istio>
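A submission against such a cluster can be sketched as below (a hedged sketch: the API server address, namespace, image name and executor count are placeholders, not values from this thread). The driver pod is created first, and it then requests one pod per executor:

```shell
# All host names, namespaces and image names below are hypothetical.
./bin/spark-submit \
  --master k8s://https://k8s-apiserver.example.com:6443 \
  --deploy-mode cluster \
  --name pyspark-hdfs-job \
  --conf spark.kubernetes.namespace=spark \
  --conf spark.kubernetes.container.image=myregistry.example.com/spark-py:3.0.0 \
  --conf spark.executor.instances=5 \
  local:///opt/spark/examples/src/main/python/pi.py 100
```

The `local://` scheme tells Spark the application file is already inside the container image rather than on the submitting machine.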
>>
>>
>> HTH
>>
>>
>>
>>
>>
>>
>>
>> On Tue, 23 Nov 2021 at 14:09, JHI Star  wrote:
>>
>>> We are going to deploy 20 physical Linux servers for use as an
>>> on-premise Spark & HDFS on Kubernetes cluster. My question is: within this
>>> architecture, is it best to have the pods run directly on bare metal, or
>>> under VMs or system containers like LXC, and/or under an on-premise
>>> instance of something like OpenStack - or something else altogether?
>>>
>>> I am looking to garner any experience around this question relating
>>> directly to the specific use case of Spark & HDFS on Kubernetes - I know
>>> there are also general points to consider regardless of the use case.
>>>
>>


Choosing architecture for on-premise Spark & HDFS on Kubernetes cluster

2021-11-23 Thread JHI Star
We are going to deploy 20 physical Linux servers for use as an on-premise
Spark & HDFS on Kubernetes cluster. My question is: within this
architecture, is it best to have the pods run directly on bare metal, or
under VMs or system containers like LXC, and/or under an on-premise
instance of something like OpenStack - or something else altogether?

I am looking to garner any experience around this question relating
directly to the specific use case of Spark & HDFS on Kubernetes - I know
there are also general points to consider regardless of the use case.


Re: [ANNOUNCE] Announcing Apache Spark 3.0.0-preview2

2019-12-24 Thread Star

Awesome work. Thanks and happy holidays~!


On 2019-12-25 04:52, Yuming Wang wrote:

Hi all,

To enable wide-scale community testing of the upcoming Spark 3.0
release, the Apache Spark community has posted a new preview release
of Spark 3.0. This preview is not a stable release in terms of either
API or functionality, but it is meant to give the community early
access to try the code that will become Spark 3.0. If you would like
to test the release, please download it, and send feedback using
either the mailing lists [1] or JIRA [2].

There are a lot of exciting new features added to Spark 3.0, including
Dynamic Partition Pruning, Adaptive Query Execution, Accelerator-aware
Scheduling, Data Source API with Catalog Supports, Vectorization in
SparkR, support of Hadoop 3/JDK 11/Scala 2.12, and many more. For a
full list of major features and changes in Spark 3.0.0-preview2,
please check the threads
(http://apache-spark-developers-list.1001551.n3.nabble.com/Spark-3-0-preview-release-feature-list-and-major-changes-td28050.html
and
http://apache-spark-developers-list.1001551.n3.nabble.com/Spark-3-0-preview-release-2-td28491.html).
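As one small illustration of trying the preview, two of the features listed above, Adaptive Query Execution and Dynamic Partition Pruning, are switched on through configuration. A sketch, assuming the property names documented for the 3.0 line and an arbitrary conf/ directory:

```shell
# Write a minimal spark-defaults.conf fragment enabling two Spark 3.0
# features mentioned above; conf/ is the conventional location inside a
# Spark distribution (arbitrary here).
mkdir -p conf
cat > conf/spark-defaults.conf <<'EOF'
spark.sql.adaptive.enabled                           true
spark.sql.optimizer.dynamicPartitionPruning.enabled  true
EOF
cat conf/spark-defaults.conf
```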

We'd like to thank our contributors and users for their contributions
and early feedback to this release. This release would not have been
possible without you.

To download Spark 3.0.0-preview2, head over to the download page:
https://archive.apache.org/dist/spark/spark-3.0.0-preview2

Happy Holidays.

Yuming

Links:
--
[1] https://spark.apache.org/community.html
[2] 
https://issues.apache.org/jira/projects/SPARK?selectedItem=com.atlassian.jira.jira-projects-plugin%3Asummary-page


-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



How to learn Spark ?

2015-04-02 Thread Star Guo
Hi, all

 

I am new here. Could you give me some suggestions for learning Spark? Thanks.

 

Best Regards,

Star Guo



Re: How to learn Spark ?

2015-04-02 Thread Star Guo
Thank you! I'll begin with it.

 

Best Regards,

Star Guo

 



 

I have a self-study workshop here:

 

https://github.com/deanwampler/spark-workshop

 

dean




Dean Wampler, Ph.D.

Author: Programming Scala, 2nd Edition
<http://shop.oreilly.com/product/0636920033073.do> (O'Reilly)

Typesafe <http://typesafe.com>
@deanwampler <http://twitter.com/deanwampler>

http://polyglotprogramming.com

 

On Thu, Apr 2, 2015 at 8:33 AM, Vadim Bichutskiy vadim.bichuts...@gmail.com 
wrote:

You can start with http://spark.apache.org/docs/1.3.0/index.html

 

Also get the Learning Spark book http://amzn.to/1NDFI5x. It's great.

 

Enjoy!

 

Vadim

  

 

On Thu, Apr 2, 2015 at 4:19 AM, Star Guo st...@ceph.me wrote:

Hi, all

 

I am new here. Could you give me some suggestions for learning Spark? Thanks.

 

Best Regards,

Star Guo

 

 



Re How to learn Spark ?

2015-04-02 Thread Star Guo
So cool!! Thanks.

Best Regards,
Star Guo

=

You can also refer to this blog: http://blog.prabeeshk.com/blog/archives/

On 2 April 2015 at 12:19, Star Guo st...@ceph.me wrote:
Hi, all
 
I am new here. Could you give me some suggestions for learning Spark? Thanks.
 
Best Regards,
Star Guo






Re: Re:How to learn Spark ?

2015-04-02 Thread Star Guo
Thanks a lot. I'll follow your suggestion.

Best Regards,
Star Guo

=

The best way of learning Spark is to use Spark. You may follow the
instructions on the Apache Spark website: http://spark.apache.org/docs/latest/

Download it, deploy it in standalone mode, run some examples, try cluster
deploy mode, then try to develop your own app and deploy it in your Spark
cluster.

It's also better to learn Scala well if you want to dive into Spark.
There are also some books about Spark.
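The learning path above can be sketched as the following session (a sketch only; the archive name is a placeholder for whichever release you download, and the worker-start script has been renamed in newer releases):

```shell
# Unpack a Spark release and bring up a one-machine standalone cluster.
# The version in the file name is a placeholder.
tar -xzf spark-3.0.0-bin-hadoop2.7.tgz
cd spark-3.0.0-bin-hadoop2.7

./sbin/start-master.sh                         # master web UI on :8080
./sbin/start-slave.sh spark://localhost:7077   # one worker attached to the master

# Run a bundled example against the cluster, then shut everything down.
./bin/run-example --master spark://localhost:7077 SparkPi 10
./sbin/stop-slave.sh && ./sbin/stop-master.sh
```

Once the example runs, the same `--master spark://...` URL is what your own app would be submitted against.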


 
Thanks & Best regards!
San.Luo

- Original Message -
From: Star Guo st...@ceph.me
To: user@spark.apache.org
Subject: How to learn Spark ?
Date: 2015-04-02 16:19

Hi, all
 
I am new here. Could you give me some suggestions for learning Spark? Thanks.
 
Best Regards,
Star Guo





Re: How to learn Spark ?

2015-04-02 Thread Star Guo
Yes, I just searched for it!

Best Regards,
Star Guo

==

You can start with http://spark.apache.org/docs/1.3.0/index.html

Also get the Learning Spark book http://amzn.to/1NDFI5x. It's great.

Enjoy!

Vadim


On Thu, Apr 2, 2015 at 4:19 AM, Star Guo st...@ceph.me wrote:
Hi, all
 
I am new here. Could you give me some suggestions for learning Spark? Thanks.
 
Best Regards,
Star Guo


