Lyft recently open-sourced a data discovery tool called Amundsen that can
serve many data catalog needs.
https://eng.lyft.com/amundsen-lyfts-data-discovery-metadata-engine-62d27254fbb9
https://github.com/lyft/amundsenmetadatalibrary
You still need HMS (the Hive Metastore) to store the data schemas, though.
Hi Olivier,
This seems like a GKE-specific issue; have you tried it on other vendors? Also,
on the kubelet nodes, did you notice any pressure on the DNS side?
Li
On Mon, Apr 29, 2019, 5:43 AM Olivier Girardot <
o.girar...@lateral-thoughts.com> wrote:
> Hi everyone,
> I have ~300 Spark jobs on
Hi Battini,
The limit is a k8s construct that tells k8s how much CPU (cores) your driver
*can* consume.
When you set the same value for 'spark.driver.cores' and
'spark.kubernetes.driver.limit.cores', your driver runs at the
'Guaranteed' k8s quality-of-service class, which can make your driver
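For illustration, a hedged sketch of the matching flags (the values here are made up):

```shell
# spark-submit flags (illustrative values): setting the request
# ('spark.driver.cores') equal to the limit
# ('spark.kubernetes.driver.limit.cores') puts the driver pod in the
# 'Guaranteed' QoS class.
--conf spark.driver.cores=2 \
--conf spark.kubernetes.driver.limit.cores=2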
Hi Wilson,
As Yinan said, batch jobs with dynamic scaling requirements and
communication between the driver and executors do not fit into the
service-oriented Deployment paradigm of k8s. Hence the need to abstract
these Spark-specific differences into a k8s CRD and CRD controller to
In addition to what Rao mentioned, if you are using cloud blob storage such
as AWS S3, you can point your history location at an S3 path such
as `s3://mybucket/path/to/history`.
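A minimal spark-defaults.conf sketch for this (bucket and path taken from the example above; the `s3a://` scheme assumes the hadoop-aws connector is on the classpath):

```shell
# Sketch: point event logging and the history server at the same S3
# location (bucket/path are assumptions).
cat >> spark-defaults.conf <<'EOF'
spark.eventLog.enabled           true
spark.eventLog.dir               s3a://mybucket/path/to/history
spark.history.fs.logDirectory    s3a://mybucket/path/to/history
EOF
```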
On Wed, Jan 23, 2019 at 12:55 AM Rao, Abhishek (Nokia - IN/Bangalore) <
abhishek@nokia.com> wrote:
> Hi
On YARN it is impossible, AFAIK. On Kubernetes you can use taints to keep
certain nodes outside of Spark.
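For example (node, key, and value names here are assumptions), tainting nodes so that Spark pods without a matching toleration cannot land on them:

```shell
# Keep Spark off these nodes: pods lacking a toleration for this taint
# will not be scheduled there (node/key/value names are made up).
kubectl taint nodes reserved-node-1 dedicated=non-spark:NoSchedule
```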
On Fri, Jan 18, 2019 at 9:35 PM Felix Cheung
wrote:
> Not as far as I recall...
>
>
> --
> *From:* Serega Sheypak
> *Sent:* Friday, January 18, 2019 3:21 PM
>
Hi Spark Community,
I am reaching out to see if there are current large-scale production or
pre-production deployments of Spark on k8s for batch and micro-batch jobs.
Large scale means running hundreds of thousands of Spark jobs daily, thousands
of concurrent Spark jobs on a single k8s cluster, and 10s of
This is wonderful!
I noticed the official Spark download site does not have 2.4 download links
yet.
On Thu, Nov 8, 2018, 4:11 PM Swapnil Shinde wrote:
> Great news.. thank you very much!
>
> On Thu, Nov 8, 2018, 5:19 PM Stavros Kontopoulos <
> stavros.kontopou...@lightbend.com wrote:
>
>> Awesome!
>>
>>
> *From:* Li Gao
> *Date:* Thursday, November 1, 2018 0:07
> *To:* "Zhang, Yuqi"
Yuqi,
Your error seems unrelated to the headless service config you need to enable.
For headless service, you need to create a headless service that matches
your driver pod name exactly in order for the Spark 2.4 RC to work in client
mode. We have had this running for a while now using a Jupyter kernel as
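A sketch of such a headless service (all names, labels, and ports below are assumptions; `clusterIP: None` is what makes it headless, and `metadata.name` must equal the driver pod name):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-driver-pod          # must match the driver pod name exactly (assumed name)
spec:
  clusterIP: None              # headless: DNS resolves straight to the pod IP
  selector:
    spark-app-selector: my-driver-pod   # assumed label on the driver pod
  ports:
    - name: driver-rpc
      port: 7078               # assumed driver RPC port
    - name: blockmanager
      port: 7079               # assumed block manager port
```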
There is an existing 2.2-based external shuffle service on the fork:
https://apache-spark-on-k8s.github.io/userdocs/running-on-kubernetes.html
You can modify it to suit your needs.
-Li
On Fri, Oct 26, 2018 at 3:22 AM vincent gromakowski <
vincent.gromakow...@gmail.com> wrote:
> No, it's on the roadmap for >2.4
>
Hi,
Is there an option to keep the executor pods on k8s after the job finishes?
We want to extract the logs and stats before removing the executor pods.
Thanks,
Li
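One option to check (hedged: availability depends on your Spark build, as later releases added a flag for exactly this) is leaving executor pods around after termination and pulling logs with kubectl:

```shell
# Newer Spark releases have a conf for this; verify it exists in your
# version before relying on it:
#   --conf spark.kubernetes.executor.deleteOnTermination=false
# Then, before cleanup (pod and namespace names are placeholders):
kubectl logs <executor-pod-name> -n <namespace> > executor.log
```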
> ...the scripts prior to entering the entrypoint.
>
> Yinan
>
> On Wed, Aug 15, 2018 at 9:12 AM Li Gao wrote:
Hi,
We've noticed that on the latest master (not the Spark 2.3.1 branch), the
support for Kubernetes initContainer is no longer there. What would be the
path forward if we need to run custom bootstrap actions (i.e. additional
scripts) before the driver/executor container enters running mode?
Thanks,
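Without initContainer support, one workaround is baking the bootstrap into a custom image whose entrypoint runs the extra scripts before handing off to the stock entrypoint (a sketch; the base image tag, script name, and `/opt/entrypoint.sh` path are assumptions about your Spark image):

```dockerfile
# Run a bootstrap script, then exec the image's normal entrypoint.
# Base tag and paths are assumptions; adjust to your own Spark image.
FROM spark:v2.4.0
COPY bootstrap.sh /opt/bootstrap.sh
RUN chmod +x /opt/bootstrap.sh
ENTRYPOINT ["/bin/bash", "-c", "/opt/bootstrap.sh && exec /opt/entrypoint.sh \"$@\"", "--"]
```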
Hello,
Do we know the estimated GA date for Spark 2.4?
We are evaluating whether to backport some of the 2.4 fixes into our 2.3
deployment.
Thank you.