Mapping stages in DAG to line of code in pyspark

2021-04-18 Thread Dhruv Kumar
Hi, I am using PySpark to write Spark queries. My research project requires me to accurately measure the latency of each operator/stage in the query. I can make some guesses, but I am unable to map the stages (shown in the DAG in the Spark UI) to the exact lines in my PySpark code. Can

4th generation Data Warehousing and Spark

2021-04-18 Thread Mich Talebzadeh
You must forgive me for this seemingly pseudo-technical question. Last week I came across a client manager who mentioned developing 4th-generation data warehousing with Spark, and I was wondering whether the individual was pointedly referring to the new data lakehouse concept and how it was

Re: [Spark Core][Advanced]: Wrong memory allocation on standalone mode cluster

2021-04-18 Thread Sean Owen
Are you sure about the worker memory configuration? What are you setting --memory to, and what does the worker UI think its memory allocation is? On Sun, Apr 18, 2021 at 4:08 AM Mohamadreza Rostami <mohamadrezarosta...@gmail.com> wrote: > I see a bug in executor memory allocation in the standalone

[Spark Core][Advanced]: Wrong memory allocation on standalone mode cluster

2021-04-18 Thread Mohamadreza Rostami
I see a bug in executor memory allocation in the standalone cluster, but I can't find which part of the Spark code causes this problem. That's why I decided to raise this issue here. Assume you have 3 workers with 10 CPU cores and 10 gigabytes of memory. Assume also you have 2 Spark jobs that run