Hey Vaquar,

The link doesn't explain the crucial detail we're interested in: does an
executor reuse the data that a previous executor left on the node, and if
not, how can we configure it to do so?
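
To make the question concrete, here is a minimal Python sketch of what "the
data left on the node" means. It is not part of Spark, and the function names
are hypothetical. It assumes the sort-shuffle file naming
shuffle_<shuffleId>_<mapId>_<reduceId>.data / .index, which typically sits in
blockmgr-* subdirectories under spark.local.dir. Finding the files is the easy
part. What we want to know is whether Spark itself re-registers them with the
driver so the corresponding map tasks are not recomputed.

```python
import os
import re
from collections import defaultdict

# Assumption: sort-based shuffle writes one pair of files per map task, named
# shuffle_<shuffleId>_<mapId>_<reduceId>.data and .index.
SHUFFLE_FILE = re.compile(r"^shuffle_(\d+)_(\d+)_(\d+)\.(data|index)$")


def group_shuffle_files(names):
    """Group file names by (shuffle_id, map_id) -> set of extensions found."""
    found = defaultdict(set)
    for name in names:
        m = SHUFFLE_FILE.match(name)
        if m:  # skips non-shuffle blocks, temp files, .checksum.* files, etc.
            shuffle_id, map_id, _reduce_id, ext = m.groups()
            found[(int(shuffle_id), int(map_id))].add(ext)
    return dict(found)


def scan_local_dir(local_dir):
    """Recursively collect file names under local_dir (e.g. the directory that
    spark.local.dir points to on the migrated EBS volume) and group the
    shuffle outputs they represent. Names from different subdirectories are
    merged; a real tool would also key on the directory."""
    names = []
    for _root, _dirs, files in os.walk(local_dir):
        names.extend(files)
    return group_shuffle_files(names)


def complete_outputs(found):
    """(shuffle_id, map_id) pairs that have both a .data and an .index file.
    Simplification: edge cases such as empty map outputs are not handled."""
    return sorted(key for key, exts in found.items() if exts == {"data", "index"})
```

For example, running complete_outputs(scan_local_dir("/mnt/ebs/spark-local"))
on a worker (the path is hypothetical) would list the (shuffleId, mapId) pairs
that are still on disk after the executor is gone.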

We are not running on Kubernetes, so EKS/Kubernetes-specific advice isn't
very relevant.

We are running Spark in standalone mode.

Best regards,
maksym

On 2023/05/17 12:28:35 vaquar khan wrote:
> You will find all the details you need at the following link:
> 
> https://aws.amazon.com/blogs/containers/best-practices-for-running-spark-on-amazon-eks/
> 
> Let me know if you require further information.
> 
> 
> Regards,
> Vaquar khan
> 
> On Mon, May 15, 2023, 10:14 PM Mich Talebzadeh <mi...@gmail.com>
> wrote:
> 
> > Couple of points
> >
> > Why use spot or preemptible instances when, as you stated, your
> > application shuffles heavily?
> > Have you looked at why you are having these shuffles? What is causing
> > these large transformations to end up in shuffles?
> >
> > Also on your point:
> > "..then ideally we should expect that when an executor is killed/OOM'd
> > and a new executor is spawned on the same host, the new executor registers
> > the shuffle files to itself. Is that so?"
> >
> > What guarantee is there that the new executor with inherited shuffle
> > files will succeed?
> >
> > Also, OOMs are often associated with some form of data skew.
> >
> > HTH
> > Mich Talebzadeh,
> > Lead Solutions Architect/Engineering Lead
> > Palantir Technologies Limited
> > London
> > United Kingdom
> >
> >
> >    view my Linkedin profile
> > <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
> >
> >
> >  https://en.everybodywiki.com/Mich_Talebzadeh
> >
> >
> >
> > *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> > loss, damage or destruction of data or any other property which may arise
> > from relying on this email's technical content is explicitly disclaimed.
> > The author will in no case be liable for any monetary damages arising from
> > such loss, damage or destruction.
> >
> >
> >
> >
> > On Mon, 15 May 2023 at 13:11, Faiz Halde <fa...@nubank.com.br.invalid>
> > wrote:
> >
> >> Hello,
> >>
> >> We've been in touch with a few Spark specialists who suggested a
> >> potential solution for improving the reliability of our shuffle-heavy
> >> jobs.
> >>
> >> Here is what our setup looks like:
> >>
> >>    - Spark version: 3.3.1
> >>    - Java version: 1.8
> >>    - We do not use external shuffle service
> >>    - We use spot instances
> >>
> >> We run Spark jobs on clusters that use Amazon EBS volumes, and
> >> spark.local.dir is mounted on an EBS volume. One of the features offered
> >> by the service we use is EBS migration: if a host is about to be
> >> evicted, a new host is created and the EBS volume is attached to it.
> >>
> >> The claim is that when Spark assigns a new executor to the newly created
> >> instance, the executor can recover all the shuffle files already
> >> persisted on the migrated EBS volume.
> >>
> >> Is this how it works? Do executors recover / re-register the shuffle
> >> files that they found?
> >>
> >> So far I have not come across any recovery mechanism. The only one I can
> >> find is KubernetesLocalDiskShuffleDataIO, which has a pre-init step where
> >> it tries to register the available shuffle files to itself.
> >>
> >> A natural follow-up question:
> >>
> >> If what they claim is true, then ideally we should expect that when an
> >> executor is killed/OOM'd and a new executor is spawned on the same host,
> >> the new executor registers the shuffle files to itself. Is that so?
> >>
> >> Thanks
> >>
> >> ------------------------------
> >> Confidentiality note: This e-mail may contain confidential information
> >> from Nu Holdings Ltd and/or its affiliates. If you have received it by
> >> mistake, please let us know by e-mail reply and delete it from your system;
> >> you may not copy this message or disclose its contents to anyone; for
> >> details about what personal information we collect and why, please refer to
> >> our privacy policy
> >> <https://api.mziq.com/mzfilemanager/v2/d/59a081d2-0d63-4bb5-b786-4c07ae26bc74/6f4939b9-5f74-a528-1835-596b481dca54>
> >> .
> >>
> >
> 
