+1, I think it is in good shape to move forward.
On Wed, Jan 5, 2022 at 3:00 PM Bowen Li <b...@apache.org> wrote:

> +1 for SPIP
>
> According to our production experience, the default scheduler doesn't meet production requirements on K8s, and this effort to integrate batch-native schedulers makes running Spark natively on K8s much easier for users.
>
> Thanks,
> Bowen
>
> On Wed, Jan 5, 2022 at 11:52 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
>> +1 non-binding
>>
>> view my Linkedin profile
>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.
>>
>> On Wed, 5 Jan 2022 at 19:16, Holden Karau <hol...@pigscanfly.ca> wrote:
>>
>>> Do we want to move the SPIP forward to a vote? It seems like we're mostly agreeing in principle?
>>>
>>> On Wed, Jan 5, 2022 at 11:12 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>
>>>> Hi Bo,
>>>>
>>>> Thanks for the info. Let me elaborate:
>>>>
>>>> In theory you can set the number of executors to a multiple of the number of nodes.
For example, if you have a three-node k8s cluster (in my case Google GKE), you can set the number of executors to 6 and end up with six executors queuing to start, but you ultimately finish with two running executors plus the driver in a 3-node cluster, as shown below:
>>>>
>>>> hduser@ctpvm: /home/hduser> k get pods -n spark
>>>> NAME                                         READY   STATUS    RESTARTS   AGE
>>>> randomdatabigquery-d42d067e2b91c88a-exec-1   1/1     Running   0          33s
>>>> randomdatabigquery-d42d067e2b91c88a-exec-2   1/1     Running   0          33s
>>>> randomdatabigquery-d42d067e2b91c88a-exec-3   0/1     Pending   0          33s
>>>> randomdatabigquery-d42d067e2b91c88a-exec-4   0/1     Pending   0          33s
>>>> randomdatabigquery-d42d067e2b91c88a-exec-5   0/1     Pending   0          33s
>>>> randomdatabigquery-d42d067e2b91c88a-exec-6   0/1     Pending   0          33s
>>>> sparkbq-0beda77e2b919e01-driver              1/1     Running   0          45s
>>>>
>>>> hduser@ctpvm: /home/hduser> k get pods -n spark
>>>> NAME                                         READY   STATUS    RESTARTS   AGE
>>>> randomdatabigquery-d42d067e2b91c88a-exec-1   1/1     Running   0          38s
>>>> randomdatabigquery-d42d067e2b91c88a-exec-2   1/1     Running   0          38s
>>>> sparkbq-0beda77e2b919e01-driver              1/1     Running   0          50s
>>>>
>>>> hduser@ctpvm: /home/hduser> k get pods -n spark
>>>> NAME                                         READY   STATUS    RESTARTS   AGE
>>>> randomdatabigquery-d42d067e2b91c88a-exec-1   1/1     Running   0          40s
>>>> randomdatabigquery-d42d067e2b91c88a-exec-2   1/1     Running   0          40s
>>>> sparkbq-0beda77e2b919e01-driver              1/1     Running   0          52s
>>>>
>>>> So you end up with the four added executors dropping out. Hence the conclusion seems to be that, with the current model, you want to fit exactly one Spark executor pod per Kubernetes node.
>>>>
>>>> HTH
>>>>
>>>> On Wed, 5 Jan 2022 at 17:01, bo yang <bobyan...@gmail.com> wrote:
>>>>
>>>>> Hi Mich,
>>>>>
>>>>> Curious what you mean by "The constraint seems to be that you can fit one Spark executor pod per Kubernetes node and from my tests you don't seem to be able to allocate more than 50% of RAM on the node to the container". Would you help to explain a bit? Asking this because there could be multiple executor pods running on a single Kubernetes node.
>>>>>
>>>>> Thanks,
>>>>> Bo
>>>>>
>>>>> On Wed, Jan 5, 2022 at 1:13 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>>
>>>>>> Thanks William for the info.
>>>>>>
>>>>>> The current model of Spark on k8s has certain drawbacks with pod-based scheduling, as I tested it on Google Kubernetes Engine (GKE). The constraint seems to be that you can fit one Spark executor pod per Kubernetes node, and from my tests you don't seem to be able to allocate more than 50% of RAM on the node to the container.
>>>>>>
>>>>>> [image: gke_memoeyPlot.png]
>>>>>>
>>>>>> Any more results in the container never being created (stuck at Pending):
>>>>>>
>>>>>> kubectl describe pod sparkbq-b506ac7dc521b667-driver -n spark
>>>>>>
>>>>>> Events:
>>>>>> Type     Reason            Age   From               Message
>>>>>> ----     ------            ----  ----               -------
>>>>>> Warning  FailedScheduling  17m   default-scheduler  0/3 nodes are available: 3 Insufficient memory.
>>>>>> Warning  FailedScheduling  17m   default-scheduler  0/3 nodes are available: 3 Insufficient memory.
>>>>>> Normal   NotTriggerScaleUp  2m28s (x92 over 17m)  cluster-autoscaler  pod didn't trigger scale-up:
>>>>>>
>>>>>> Obviously this is far from ideal, and this model, although it works, is not efficient.
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Mich
>>>>>>
>>>>>> On Wed, 5 Jan 2022 at 03:55, William Wang <wang.platf...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Mich,
>>>>>>>
>>>>>>> Here are some of the performance indications for Volcano:
>>>>>>> 1. Scheduler throughput: 1.5k pods/s (default scheduler: 100 pods/s)
>>>>>>> 2. Spark application performance improved 30%+ with the minimal-resource-reservation feature in case of insufficient resources (tested with TPC-DS).
>>>>>>>
>>>>>>> We are still working on more optimizations. Besides performance, Volcano is continuously enhanced in the four directions below to provide the abilities that users care about.
>>>>>>> - Full lifecycle management for jobs
>>>>>>> - Scheduling policies for high-performance workloads (fair-share, topology, SLA, reservation, preemption, backfill, etc.)
>>>>>>> - Support for heterogeneous hardware
>>>>>>> - Performance optimization for high-performance workloads
>>>>>>>
>>>>>>> Thanks
>>>>>>> LeiBo
>>>>>>>
>>>>>>> On Tue, Jan 4, 2022 at 18:12, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Interesting, thanks.
>>>>>>>>
>>>>>>>> Do you have any indication of the ballpark figure (a rough numerical estimate) of how much adding Volcano as an alternative scheduler is going to improve Spark on k8s performance?
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>> On Tue, 4 Jan 2022 at 09:43, Yikun Jiang <yikunk...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi, folks! Wishing you all the best in 2022.
>>>>>>>>>
>>>>>>>>> I'd like to share the current status on "Support Customized K8S Scheduler in Spark".
>>>>>>>>>
>>>>>>>>> https://docs.google.com/document/d/1xgQGRpaHQX6-QH_J9YV2C2Dh6RpXefUpLM7KGkzL6Fg/edit#heading=h.1quyr1r2kr5n
>>>>>>>>>
>>>>>>>>> Framework/Common support
>>>>>>>>>
>>>>>>>>> - The Volcano and Yunikorn teams joined the discussion and completed the initial doc on the framework/common part.
>>>>>>>>> - SPARK-37145 <https://issues.apache.org/jira/browse/SPARK-37145> (under review): We proposed to extend the customized scheduler by just using a custom feature step; it will meet the requirements of customized schedulers after it gets merged. After this, the user can enable a feature step and scheduler like:
>>>>>>>>>
>>>>>>>>> spark-submit \
>>>>>>>>>   --conf spark.kubernetes.scheduler.name=volcano \
>>>>>>>>>   --conf spark.kubernetes.driver.pod.featureSteps=org.apache.spark.deploy.k8s.features.scheduler.VolcanoFeatureStep \
>>>>>>>>>   --conf spark.kubernetes.job.queue=xxx
>>>>>>>>>
>>>>>>>>> (As above, the VolcanoFeatureStep will help to set the Spark scheduler queue according to the user-specified conf.)
>>>>>>>>>
>>>>>>>>> - SPARK-37331 <https://issues.apache.org/jira/browse/SPARK-37331>: Added the ability to create Kubernetes resources before driver pod creation.
>>>>>>>>>
>>>>>>>>> - SPARK-36059 <https://issues.apache.org/jira/browse/SPARK-36059>: Add the ability to specify a scheduler in the driver/executor.
>>>>>>>>>
>>>>>>>>> After all of the above, the framework/common support will be ready for most customized schedulers.
>>>>>>>>>
>>>>>>>>> Volcano part:
>>>>>>>>>
>>>>>>>>> - SPARK-37258 <https://issues.apache.org/jira/browse/SPARK-37258>: Upgrade kubernetes-client to 5.11.1 to add Volcano scheduler API support.
>>>>>>>>>
>>>>>>>>> - SPARK-36061 <https://issues.apache.org/jira/browse/SPARK-36061>: Add a VolcanoFeatureStep to help users create a PodGroup with user-specified minimum resources required; there is also a WIP commit to show a preview of this <https://github.com/Yikun/spark/pull/45/commits/81bf6f98edb5c00ebd0662dc172bc73f980b6a34>.
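The PodGroup with minimum resources mentioned under SPARK-36061 can be sketched as a manifest like the following. The group/version and field names follow Volcano's `scheduling.volcano.sh/v1beta1` CRD; the concrete name, queue, and quantities are illustrative assumptions, not what the feature step itself will emit:

```yaml
# Hedged sketch of a Volcano PodGroup for gang scheduling: no pod in the
# group starts until the cluster can hold the whole minimum set at once.
apiVersion: scheduling.volcano.sh/v1beta1
kind: PodGroup
metadata:
  name: spark-job-podgroup        # hypothetical name
  namespace: spark
spec:
  queue: default                  # Volcano queue the resources are charged against
  minMember: 3                    # e.g. driver + 2 executors must be schedulable together
  minResources:                   # minimum aggregate resources before any pod starts
    cpu: "3"
    memory: "12Gi"
```

The driver and executor pods would then be associated with this group (Volcano uses a pod annotation for the group name) and submitted with `schedulerName: volcano`, which is what the feature step is meant to wire up automatically.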
>>>>>>>>> Yunikorn part:
>>>>>>>>>
>>>>>>>>> - @WeiweiYang is completing the doc of the Yunikorn part and implementing it.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Yikun
>>>>>>>>>
>>>>>>>>> On Thu, Dec 2, 2021 at 02:00, Weiwei Yang <w...@apache.org> wrote:
>>>>>>>>>
>>>>>>>>>> Thank you Yikun for the info, and thanks for inviting me to a meeting to discuss this.
>>>>>>>>>> I appreciate your effort to put these together, and I agree that the purpose is to make Spark easy/flexible enough to support other K8s schedulers (not just Volcano).
>>>>>>>>>> As discussed, could you please help to abstract out the things in common and allow Spark to plug in different implementations? I'd be happy to work with you on this issue.
>>>>>>>>>>
>>>>>>>>>> On Tue, Nov 30, 2021 at 6:49 PM Yikun Jiang <yikunk...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> @Weiwei @Chenya
>>>>>>>>>>>
>>>>>>>>>>> > Thanks for bringing this up. This is quite interesting, we definitely should participate more in the discussions.
>>>>>>>>>>>
>>>>>>>>>>> Thanks for your reply, and welcome to join the discussion; I think the input from Yunikorn is very critical.
>>>>>>>>>>>
>>>>>>>>>>> > The main thing here is, the Spark community should make Spark pluggable in order to support other schedulers, not just for Volcano. It looks like this proposal is pushing really hard for adopting PodGroup, which isn't part of K8s yet, that to me is problematic.
>>>>>>>>>>>
>>>>>>>>>>> Definitely yes, we are on the same page.
>>>>>>>>>>>
>>>>>>>>>>> I think we have the same goal: propose a general and reasonable mechanism to make Spark on k8s with a custom scheduler more usable.
>>>>>>>>>>> But for the PodGroup, allow me to give a brief introduction:
>>>>>>>>>>> - The PodGroup definition has been approved officially by Kubernetes in KEP-583. [1]
>>>>>>>>>>> - It can be regarded as a general concept/standard in Kubernetes rather than a concept specific to Volcano; there are also other implementations of it, such as [2][3].
>>>>>>>>>>> - Kubernetes recommends using CRDs for extensions that implement what users want. [4]
>>>>>>>>>>> - Volcano, as an extension, provides an interface to maintain the lifecycle of the PodGroup CRD and uses volcano-scheduler to complete the scheduling.
>>>>>>>>>>>
>>>>>>>>>>> [1] https://github.com/kubernetes/enhancements/tree/master/keps/sig-scheduling/583-coscheduling
>>>>>>>>>>> [2] https://github.com/kubernetes-sigs/scheduler-plugins/tree/master/pkg/coscheduling#podgroup
>>>>>>>>>>> [3] https://github.com/kubernetes-sigs/kube-batch
>>>>>>>>>>> [4] https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Yikun
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Dec 1, 2021 at 5:57 AM, Weiwei Yang <w...@apache.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Chenya,
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for bringing this up. This is quite interesting; we definitely should participate more in the discussions.
>>>>>>>>>>>> The main thing here is, the Spark community should make Spark pluggable in order to support other schedulers, not just Volcano. It looks like this proposal is pushing really hard for adopting PodGroup, which isn't part of K8s yet; that to me is problematic.
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Nov 30, 2021 at 9:21 AM Prasad Paravatha <prasad.parava...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> This is a great feature/idea.
>>>>>>>>>>>>> I'd love to get involved in some form (testing and/or documentation). This could be my first contribution to Spark!
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Nov 30, 2021 at 10:46 PM John Zhuge <jzh...@apache.org> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> +1 Kudos to Yikun and the community for starting the discussion!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Nov 30, 2021 at 8:47 AM Chenya Zhang <chenyazhangche...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks folks for bringing up the topic of natively integrating Volcano and other alternative schedulers into Spark!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> +Weiwei, Wilfred, Chaoran. We would love to contribute to the discussion as well.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> From our side, we have been using and improving one alternative resource scheduler, Apache YuniKorn (https://yunikorn.apache.org/), for Spark on Kubernetes in production at Apple, with solid results over the past year. It is capable of supporting gang scheduling (similar to PodGroups), multi-tenant resource queues (similar to YARN), FIFO, and other handy features like bin packing to enable efficient autoscaling, etc.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Natively integrating with Spark would provide more flexibility for users and reduce the extra cost and potential inconsistency of maintaining different layers of resource strategies. One interesting topic we hope to discuss more is dynamic allocation, which would benefit from native coordination between Spark and resource schedulers in K8s & cloud environments for optimal resource efficiency.
>>>>>>>>>>>>>>> On Tue, Nov 30, 2021 at 8:10 AM Holden Karau <hol...@pigscanfly.ca> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks for putting this together, I'm really excited for us to add better batch scheduling integrations.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, Nov 30, 2021 at 12:46 AM Yikun Jiang <yikunk...@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hey everyone,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I'd like to start a discussion on the "Support Volcano/Alternative Schedulers" proposal.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> This SPIP is proposed to make Spark k8s schedulers provide more YARN-like features (such as queues and minimum resources before scheduling jobs) that many folks want on Kubernetes.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The goal of this SPIP is to improve the current Spark k8s scheduler implementations, add the ability to do batch scheduling, and support Volcano as one of the implementations.
>>>>>>>>>>>>>>>>> Design doc: https://docs.google.com/document/d/1xgQGRpaHQX6-QH_J9YV2C2Dh6RpXefUpLM7KGkzL6Fg
>>>>>>>>>>>>>>>>> JIRA: https://issues.apache.org/jira/browse/SPARK-36057
>>>>>>>>>>>>>>>>> Part of PRs:
>>>>>>>>>>>>>>>>> Ability to create resources: https://github.com/apache/spark/pull/34599
>>>>>>>>>>>>>>>>> Add PodGroupFeatureStep: https://github.com/apache/spark/pull/34456
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>> Yikun
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> Twitter: https://twitter.com/holdenkarau
>>>>>>>>>>>>>>>> Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
>>>>>>>>>>>>>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> John Zhuge
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>> Prasad Paravatha
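A rough way to sanity-check Mich's observation earlier in the thread (one executor pod per node, under half the node's RAM usable): the default scheduler compares each executor pod's memory request, which is spark.executor.memory plus overhead (Spark's default overhead factor for JVM jobs is 10%, with a 384 MiB floor), against the node's *allocatable* memory, which on a managed node is well below physical RAM after kubelet/system reservations. A back-of-the-envelope sketch, with made-up node sizes rather than figures from Mich's GKE cluster:

```python
# Back-of-the-envelope check of how many executor pods fit on one node.
# The 10% overhead factor and 384 MiB floor mirror Spark's defaults for
# JVM jobs; the node numbers below are illustrative assumptions.

def executor_pod_request_mib(executor_memory_mib: int,
                             overhead_factor: float = 0.10,
                             min_overhead_mib: int = 384) -> int:
    """Memory the pod requests: spark.executor.memory plus overhead."""
    overhead = max(int(executor_memory_mib * overhead_factor), min_overhead_mib)
    return executor_memory_mib + overhead

def executors_per_node(allocatable_mib: int, executor_memory_mib: int) -> int:
    """How many executor pods the default scheduler can place on one node."""
    return allocatable_mib // executor_pod_request_mib(executor_memory_mib)

# Hypothetical node: 16 GiB physical RAM, ~13 GiB allocatable after the
# kubelet/system reservations a managed cluster carves out.
allocatable = 13 * 1024  # MiB
print(executors_per_node(allocatable, 8 * 1024))  # 8 GiB executors -> 1 per node
print(executors_per_node(allocatable, 4 * 1024))  # 4 GiB executors -> 2 per node
```

With an 8 GiB executor on such a node, the second pod's request already exceeds what is left, which matches the "one executor per node, roughly 50% of RAM" behaviour reported above; smaller executors pack more densely.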