Re: Announcing Delta Lake 0.2.0

2019-06-20 Thread Li Gao
Lyft recently open sourced a data discovery tool called Amundsen that can serve many of the data catalog needs. https://eng.lyft.com/amundsen-lyfts-data-discovery-metadata-engine-62d27254fbb9 https://github.com/lyft/amundsenmetadatalibrary You still need HMS to store the data schema though.

Re: Spark 2.4.1 on Kubernetes - DNS resolution of driver fails

2019-05-02 Thread Li Gao
Hi Olivier, This seems to be a GKE-specific issue? Have you tried other vendors? Also, on the kubelet nodes, did you notice any pressure on the DNS side? Li On Mon, Apr 29, 2019, 5:43 AM Olivier Girardot < o.girar...@lateral-thoughts.com> wrote: > Hi everyone, > I have ~300 spark job on

Re: Difference between 'cores' config params: spark submit on k8s

2019-04-20 Thread Li Gao
Hi Battini, The limit is a k8s construct that tells k8s how much cpu/cores your driver *can* consume. When you have the same value for 'spark.driver.cores' and 'spark.kubernetes.driver.limit.cores', your driver then runs at the 'Guaranteed' k8s quality of service class, which can make your driver
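
A minimal sketch of the quality-of-service point above, assuming the usual Kubernetes rule that a pod is 'Guaranteed' only when the CPU and memory requests equal their limits, and assuming 'spark.driver.cores' becomes the CPU request while 'spark.kubernetes.driver.limit.cores' becomes the CPU limit. The helper `qos_class` is hypothetical (not part of Spark or Kubernetes) and models a single-container pod only.

```python
from typing import Optional


def qos_class(cpu_request: Optional[str], cpu_limit: Optional[str],
              mem_request: Optional[str], mem_limit: Optional[str]) -> str:
    """Classify a single-container pod the way Kubernetes QoS rules do.

    Simplification (assumption): quantities are compared as plain strings,
    so "1" and "1000m" would be treated as different values here.
    """
    if all(v is None for v in (cpu_request, cpu_limit, mem_request, mem_limit)):
        return "BestEffort"

    # Kubernetes defaults a missing request to the limit when a limit is set.
    if cpu_request is None:
        cpu_request = cpu_limit
    if mem_request is None:
        mem_request = mem_limit

    limits_set = cpu_limit is not None and mem_limit is not None
    if limits_set and cpu_request == cpu_limit and mem_request == mem_limit:
        return "Guaranteed"
    return "Burstable"


# spark.driver.cores == spark.kubernetes.driver.limit.cores
same = qos_class("2", "2", "4Gi", "4Gi")
# limit higher than the request: the driver may burst, but is evicted sooner
higher_limit = qos_class("2", "4", "4Gi", "4Gi")
```

With equal values the sketch reports 'Guaranteed'; raising only the limit drops the pod to 'Burstable'.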

Re: Spark Kubernetes Architecture: Deployments vs Pods that create Pods

2019-01-30 Thread Li Gao
Hi Wilson, As Yinan said well, batch jobs with dynamic scaling requirements and communication between driver and executor do not fit into the service-oriented Deployment paradigm of k8s. Thus we have the need to abstract these Spark-specific differences to k8s CRD and CRD controller to

Re: Spark UI History server on Kubernetes

2019-01-23 Thread Li Gao
In addition to what Rao mentioned, if you are using cloud blob storage such as AWS S3, you can specify your history location to be an S3 location such as: `s3://mybucket/path/to/history` On Wed, Jan 23, 2019 at 12:55 AM Rao, Abhishek (Nokia - IN/Bangalore) < abhishek@nokia.com> wrote: > Hi
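
A minimal sketch of the settings this implies, assuming the jobs write event logs with 'spark.eventLog.enabled' and 'spark.eventLog.dir' and the history server reads the same location through 'spark.history.fs.logDirectory'. The helper names are hypothetical, the bucket path is the example from the message above, and the URI scheme that works in practice depends on the filesystem connector on the classpath (with hadoop-aws it is typically `s3a://`).

```python
from typing import Dict, List


def job_event_log_confs(log_dir: str) -> Dict[str, str]:
    """Settings for each Spark job so it writes event logs to log_dir."""
    return {
        "spark.eventLog.enabled": "true",
        "spark.eventLog.dir": log_dir,
    }


def history_server_confs(log_dir: str) -> Dict[str, str]:
    """Setting for the history server so it reads the same location."""
    return {"spark.history.fs.logDirectory": log_dir}


def to_conf_args(confs: Dict[str, str]) -> List[str]:
    """Render settings as '--conf key=value' arguments for spark-submit."""
    args: List[str] = []
    for key, value in confs.items():
        args += ["--conf", f"{key}={value}"]
    return args


history_dir = "s3://mybucket/path/to/history"
submit_args = to_conf_args(job_event_log_confs(history_dir))
server_confs = history_server_confs(history_dir)
```

The point of the sketch is only that both sides must agree on one directory; credentials and connector jars are out of scope here.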

Re: Spark on Yarn, is it possible to manually blacklist nodes before running spark job?

2019-01-19 Thread Li Gao
On YARN it is impossible, AFAIK. On Kubernetes you can use taints to keep Spark off certain nodes. On Fri, Jan 18, 2019 at 9:35 PM Felix Cheung wrote: > Not as far as I recall... > > > -- > *From:* Serega Sheypak > *Sent:* Friday, January 18, 2019 3:21 PM >
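
A minimal sketch of the taint approach, assuming the Spark driver and executor pods carry no toleration for the taint, so the scheduler will not place them on the tainted node. The helper `taint_command`, the node name, and the key/value pair are hypothetical; the sketch only builds the `kubectl taint nodes` command line and does not run it.

```python
from typing import List

# Effects accepted by Kubernetes taints.
VALID_EFFECTS = {"NoSchedule", "PreferNoSchedule", "NoExecute"}


def taint_command(node: str, key: str, value: str,
                  effect: str = "NoSchedule") -> List[str]:
    """Build the kubectl command that taints one node."""
    if effect not in VALID_EFFECTS:
        raise ValueError(f"unknown taint effect: {effect}")
    return ["kubectl", "taint", "nodes", node, f"{key}={value}:{effect}"]


# Hypothetical node and taint; pods without a matching toleration avoid it.
cmd = taint_command("node-1", "dedicated", "non-spark")
```

Passing `cmd` to `subprocess.run` would apply the taint, given cluster access.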

[Spark on K8s] Scaling experiences sharing

2018-11-09 Thread Li Gao
Hi Spark Community, I am reaching out to see if there are current large-scale production or pre-production deployments of Spark on k8s for batch and micro-batch jobs. Large scale means running 100s of thousands of Spark jobs daily and 1000s of concurrent Spark jobs on a single k8s cluster and 10s of

Re: [ANNOUNCE] Announcing Apache Spark 2.4.0

2018-11-08 Thread Li Gao
This is wonderful! I noticed the official Spark download site does not have 2.4 download links yet. On Thu, Nov 8, 2018, 4:11 PM Swapnil Shinde Great news.. thank you very much! > > On Thu, Nov 8, 2018, 5:19 PM Stavros Kontopoulos < > stavros.kontopou...@lightbend.com wrote: > >> Awesome! >> >>

Re: [Spark Shell on AWS K8s Cluster]: Is there more documentation regarding how to run spark-shell on k8s cluster?

2018-10-31 Thread Li Gao
ts. Instead, please > notify the sender and delete the e-mail and any attachments. Thank you. > > Please consider the environment before printing. > > > > > > > > *From: *Li Gao > *Date: *Thursday, November 1, 2018 0:07 > *To: *"Zhang, Yuqi" > *

Re: [Spark Shell on AWS K8s Cluster]: Is there more documentation regarding how to run spark-shell on k8s cluster?

2018-10-31 Thread Li Gao
Yuqi, Your error seems unrelated to the headless service config you need to enable. You need to create a headless service that matches your driver pod name exactly in order for Spark 2.4 RC to work under client mode. We have had this running for a while now using a Jupyter kernel as
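
A minimal sketch of such a headless service, assuming the service is named after the driver pod and selects it by label so executors can resolve the driver by DNS. The helper `headless_service`, the pod name, the label, and the port number are hypothetical; a headless service is one whose `clusterIP` is the literal string "None".

```python
import json
from typing import Dict


def headless_service(driver_pod_name: str, driver_port: int,
                     selector: Dict[str, str]) -> dict:
    """Build a Service manifest with no cluster IP, named after the driver pod."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": driver_pod_name},
        "spec": {
            "clusterIP": "None",  # this is what makes the service headless
            "selector": dict(selector),
            "ports": [{
                "name": "driver-rpc",
                "port": driver_port,
                "targetPort": driver_port,
            }],
        },
    }


# Hypothetical values: the driver pod must carry the selector label, and
# spark.driver.port must be pinned to the same port on the driver side.
manifest = headless_service("jupyter-driver", 29413, {"app": "jupyter-driver"})
manifest_json = json.dumps(manifest, indent=2)
```

`manifest_json` can be saved to a file and applied with `kubectl apply -f`, since kubectl accepts JSON manifests.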

Re: External shuffle service on K8S

2018-10-26 Thread Li Gao
There is an existing 2.2-based external shuffle service on the fork: https://apache-spark-on-k8s.github.io/userdocs/running-on-kubernetes.html You can modify it to suit your needs. -Li On Fri, Oct 26, 2018 at 3:22 AM vincent gromakowski < vincent.gromakow...@gmail.com> wrote: > No it's on the roadmap >2.4 >

[K8S] Option to keep the executor pods after job finishes

2018-10-09 Thread Li Gao
Hi, Is there an option to keep the executor pods on k8s after the job finishes? We want to extract the logs and stats before removing the executor pods. Thanks, Li

Re: [K8S] Spark initContainer custom bootstrap support for Spark master

2018-08-16 Thread Li Gao
he > scripts prior to entering the entrypoint. > > Yinan > > On Wed, Aug 15, 2018 at 9:12 AM Li Gao wrote: > >> Hi, >> >> We've noticed on the latest Master (not Spark 2.3.1 branch), the support >> for Kubernetes initContainer is no longer there. What would be t

[K8S] Spark initContainer custom bootstrap support for Spark master

2018-08-15 Thread Li Gao
Hi, We've noticed on the latest Master (not Spark 2.3.1 branch), the support for Kubernetes initContainer is no longer there. What would be the path forward if we need to do custom bootstrap actions (i.e. run additional scripts) prior to driver/executor container entering running mode? Thanks,

Spark 2.4 release date

2018-06-18 Thread Li Gao
Hello, Do we have an estimate of when Spark 2.4 will be GA? We are evaluating whether to backport some of the 2.4 fixes into our 2.3 deployment. Thank you.