Hi
I am using PySpark to write Spark queries. My research project requires me
to accurately measure the latency of every operator/stage in a query. I can
make some guesses, but I am unable to exactly map the stages (shown in the
DAG on the Spark UI) to the corresponding lines in my PySpark code.
Can anyone help?
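
One way to get that mapping, as a minimal sketch (not from this thread): PySpark's
SparkContext.setJobDescription tags every job/stage triggered by the next action,
so each labeled block shows up under its own description in the Spark UI, and a
wall-clock timer around the action gives a per-block latency. The toy data and
queries below are assumptions for illustration only.

import time
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("stage-latency-probe").getOrCreate()
sc = spark.sparkContext

df = spark.range(0, 10_000_000)          # toy data standing in for real input

# Label the jobs this action triggers; the Spark UI shows this description
# on every job/stage it spawns, making the stage -> code-line mapping explicit.
sc.setJobDescription("step 1: filter + count")
t0 = time.perf_counter()
n = df.filter("id % 7 = 0").count()      # action: materializes the stages
print(f"step 1 took {time.perf_counter() - t0:.3f}s, rows={n}")

sc.setJobDescription("step 2: groupBy + collect")
t0 = time.perf_counter()
rows = df.groupBy((df.id % 10).alias("bucket")).count().collect()
print(f"step 2 took {time.perf_counter() - t0:.3f}s")

sc.setJobDescription(None)               # clear the label for later work

The per-stage times themselves can then be read off the UI's Stages page, or
programmatically from the monitoring REST API
(/api/v1/applications/<app-id>/stages), which reports each stage's submission
and completion times.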
You must forgive me for this seemingly pseudo-technical question. Last
week I came across a client manager who mentioned developing 4th-generation
data warehousing with Spark, and I was wondering whether the individual
was pointedly referring to the new data lakehouse concept and how it was
Are you sure about the worker memory configuration? What are you setting
--memory to, and what does the worker UI think its memory allocation is?
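
For what it's worth, the standalone master's JSON endpoint is one way to check
what each worker actually registered with, independent of what was passed on
the command line. A minimal sketch; the URL and the exact field names
("memory", "memoryused") are assumptions that may vary by Spark version:

import json
import urllib.request

# Standalone master web UI; adjust host/port for your cluster (assumption).
MASTER_UI = "http://localhost:8080/json/"

with urllib.request.urlopen(MASTER_UI) as resp:
    state = json.load(resp)

for w in state.get("workers", []):
    # "memory" is what the worker registered with (driven by --memory /
    # SPARK_WORKER_MEMORY); "memoryused" is what running executors hold.
    print(w["host"], w["state"], "total MB:", w["memory"],
          "used MB:", w["memoryused"])

Comparing the "memory" figure here against what you intended to set usually
shows quickly whether the flag was picked up at all.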
On Sun, Apr 18, 2021 at 4:08 AM Mohamadreza Rostami <
mohamadrezarosta...@gmail.com> wrote:
I see a bug in executor memory allocation in the standalone cluster, but I
can't find which part of the Spark code causes this problem. That's why I
decided to raise this issue here.
Assume you have 3 workers, each with 10 CPU cores and 10 gigabytes of memory.
Assume also that you have 2 Spark jobs that run
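
The message cuts off here, but for concreteness, a minimal sketch of how such a
scenario might be set up: two applications pointed at the same standalone
master, each requesting a slice of the 3 x 10-core / 10 GB cluster. The master
URL, memory, and core values below are assumptions, not from the thread.

from pyspark.sql import SparkSession

# One of the two applications; the second would differ only in its
# requested resources. All values below are illustrative assumptions.
spark = (
    SparkSession.builder
    .appName("job-a")
    .master("spark://master-host:7077")        # standalone master (assumed)
    .config("spark.executor.cores", "5")       # cores per executor
    .config("spark.executor.memory", "4g")     # heap per executor
    .config("spark.cores.max", "15")           # cap so a second job can fit
    .getOrCreate()
)

# With these settings each executor should occupy 5 cores / 4g on a worker;
# comparing that against what the worker UI reports as allocated is one way
# to see whether memory is being granted as expected.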