To: jeff saremi; user@spark.apache.org<mailto:user@spark.apache.org>
Subject: Re: How to configure spark on Yarn cluster

Not sure that we agree on one thing: the Yarn limits are for the sum across all nodes, while through Spark you only specify the memory for a single node. By the way, the memory displayed in the …
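For illustration only (the numbers below are assumptions, not settings taken from this thread), here is a small Scala sketch of the accounting involved: each executor container asks Yarn for executor-memory plus spark.yarn.executor.memoryOverhead, and the cluster as a whole has to hold num-executors such containers.

// Rough memory accounting for Spark on Yarn; all values are assumptions
// used for illustration, not figures from this thread.
object YarnMemoryEstimate {
  def main(args: Array[String]): Unit = {
    val executorMemoryGb = 32      // --executor-memory 32g
    val memoryOverheadMb = 8192    // spark.yarn.executor.memoryOverhead (MB)
    val numExecutors     = 50      // --num-executors, assumed

    // What a single executor container requests from Yarn.
    val perContainerMb = executorMemoryGb * 1024 + memoryOverheadMb
    // What the whole application asks of the cluster.
    val clusterTotalMb = perContainerMb.toLong * numExecutors

    println(s"Per container: $perContainerMb MB (must fit yarn.scheduler.maximum-allocation-mb)")
    println(s"Cluster total: $clusterTotalMb MB (must fit the aggregate NodeManager memory)")
  }
}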
From: yohann jardin <yohannjar...@hotmail.com>
Sent: Thursday, July 27, 2017 11:15:39 PM
To: jeff saremi; user@spark.apache.org
Subject: Re: How to configure spark on Yarn cluster
Check the executor page of the Spark UI to see whether your storage level is limiting.
Also, instead of starting with 100 TB of data, sample it, make it work, and grow it little by little until you reach 100 TB. This will validate the workflow and let you see how much data is shuffled, etc.
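A minimal Scala sketch of that approach, assuming a hypothetical Parquet input path (replace it with the real 100 TB data set): start with a small fraction, run the same job logic, inspect the Spark UI, then raise the fraction step by step.

import org.apache.spark.sql.SparkSession

object SampleFirst {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("sample-first").getOrCreate()

    // Hypothetical input path, standing in for the 100 TB data set.
    val full = spark.read.parquet("hdfs:///data/events")

    // Start with roughly 0.1% of the rows, sampled without replacement.
    val sample = full.sample(withReplacement = false, fraction = 0.001, seed = 42)

    // Run the same transformations on the sample, then check the executor and
    // storage pages of the Spark UI before increasing the fraction.
    sample.write.mode("overwrite").parquet("hdfs:///tmp/events_sample_run")

    spark.stop()
  }
}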
I have the simplest job, which I'm running against 100 TB of data. The job keeps failing with ExecutorLostFailure's on containers killed by Yarn for exceeding memory limits.
I have varied the executor-memory from 32 GB to 96 GB and the spark.yarn.executor.memoryOverhead from 8192 to 36000, and similar …
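For reference, a hedged sketch of how those two settings can be supplied from code rather than on the spark-submit command line; the values are examples only, and on Spark 2.3+ the overhead key is spark.executor.memoryOverhead rather than spark.yarn.executor.memoryOverhead.

import org.apache.spark.sql.SparkSession

object MemoryConfig {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("yarn-memory-config")
      .config("spark.executor.memory", "32g")                // same as --executor-memory 32g
      .config("spark.yarn.executor.memoryOverhead", "8192")  // MB of off-heap headroom per executor
      .getOrCreate()

    // ... job logic goes here ...

    spark.stop()
  }
}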