Re: Re: OOM issue in Spark Driver

2024-06-11 Thread Mich Talebzadeh
In a nutshell, the culprit for the OOM issue in your Spark driver appears
to be a memory leak or inefficient memory usage within your application.
This could be caused by factors such as:

   1. Accumulation of data or objects in memory over time without proper
   cleanup.
   2. Inefficient data processing or transformations leading to excessive
   memory usage.
   3. Long-running tasks or stages that accumulate memory usage.
   4. Suboptimal Spark configuration settings, such as insufficient memory
   allocation for the driver or executors.

To narrow these down, check the Stages and Executors tabs in the Spark UI.
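
For the first cause in the list above (objects accumulating without
cleanup), one way to look for growth on the Python side of the driver is the
standard-library tracemalloc module. This is only a minimal sketch, not
something taken from the original pipeline: the leaky_cache list and the loop
are hypothetical stand-ins for per-batch code. Note that tracemalloc sees
Python allocations only, not the JVM heap, so it can help rule the Python
side in or out but will not explain JVM-side growth.

```python
import tracemalloc


def top_growth(before, after, limit=5):
    """Return up to `limit` source lines whose allocated size grew the most."""
    # compare_to() sorts by absolute size difference, largest first.
    stats = after.compare_to(before, "lineno")
    return [s for s in stats if s.size_diff > 0][:limit]


tracemalloc.start()
before = tracemalloc.take_snapshot()

# Hypothetical stand-in for something that is kept across micro-batches.
leaky_cache = []
for batch_id in range(100):
    # Never cleared, so it grows on every simulated batch.
    leaky_cache.append([batch_id] * 1_000)

after = tracemalloc.take_snapshot()
growth = top_growth(before, after)
for stat in growth:
    print(stat)  # file:line, size delta and allocation count delta

tracemalloc.stop()
```

In a real job the two snapshots could be taken some batches apart (for
example inside a foreachBatch function) and the top entries logged.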

HTH

Mich Talebzadeh,

Technologist | Architect | Data Engineer  | Generative AI | FinCrime

PhD Imperial College London
London, United Kingdom


   view my Linkedin profile


 https://en.everybodywiki.com/Mich_Talebzadeh



Disclaimer: The information provided is correct to the best of my knowledge
but of course cannot be guaranteed. It is essential to note that, as with
any advice, quote "one test result is worth one-thousand expert opinions
(Wernher von Braun)".


On Tue, 11 Jun 2024 at 06:50, Lingzhe Sun wrote:

> Hi Karthick,
>
> That suggests you're not performing stateful operations, which is why
> there are no state-related metrics. You should consider other aspects that
> may cause the OOM.
> Checking the logs is always a good start. It would also help if a
> colleague of yours is familiar with JVM and OOM-related issues.
>
> BS
> Lingzhe Sun
>
>
> *From:* Karthick Nk 
> *Date:* 2024-06-11 13:28
> *To:* Lingzhe Sun 
> *CC:* Andrzej Zera ; User 
> *Subject:* Re: Re: OOM issue in Spark Driver
> Hi Lingzhe,
>
> I am able to get the stats below (i.e. input rate, process rate, input
> rows, etc.), but I cannot find the exact stat that Andrzej is asking about
> (i.e. Aggregated Number Of Total State Rows). Could you guide me on how to
> get those details for state under Structured Streaming?
> [image: image.png]
>
> Details:
> I am using Databricks runtime version: 13.3 LTS (includes Apache Spark
> 3.4.1, Scala 2.12)
> Driver and worker type:
> [image: image.png]
>
>
> Thanks
>
>
> On Tue, Jun 11, 2024 at 7:34 AM Lingzhe Sun 
> wrote:
>
>> Hi Karthick,
>>
>> I believe what Andrzej means is that you should check the
>> Aggregated Number Of Total State Rows
>> metric, which you can find in the Structured Streaming tab of the UI. It
>> indicates the total number of state rows and appears only if you perform
>> stateful operations. If it increases indefinitely, you should probably
>> check your code logic.
>>
>> BS
>> Lingzhe Sun
>>
>>
>> *From:* Karthick Nk 
>> *Date:* 2024-06-09 14:45
>> *To:* Andrzej Zera 
>> *CC:* user 
>> *Subject:* Re: OOM issue in Spark Driver
>> Hi Andrzej,
>>
>> We are using both a driver and workers.
>> Details are as follows:
>> Driver size: 128 GB memory, 64 cores.
>> Executor size: 64 GB memory, 32 cores (executors 1 to 10, autoscale)
>>
>> Workers memory usage:
>> Screenshot of one worker's memory usage:
>> [image: image.png]
>>
>>
>> State metrics details below:
>> [image: image.png]
>> [image: image.png]
>>
>> I am not getting memory-related info from the Structured Streaming tab.
>> Could you help me here?
>>
>> Please let me know if you need more details.
>>
>> If possible, we could connect at a time that suits you and look into the
>> issue together, which would be more helpful to me.
>>
>> Thanks
>>
>> On Sat, Jun 8, 2024 at 2:41 PM Andrzej Zera 
>> wrote:
>>
>>> Hey, do you perform stateful operations? Maybe your state is growing
>>> indefinitely - a screenshot with state metrics would help (you can find it
>>> in Spark UI -> Structured Streaming -> your query). Do you have a
>>> driver-only cluster or do you have workers too? What's the memory usage
>>> profile at workers?
>>>

Re: OOM issue in Spark Driver

2024-06-08 Thread Andrzej Zera
Hey, do you perform stateful operations? Maybe your state is growing
indefinitely - a screenshot with state metrics would help (you can find it
in Spark UI -> Structured Streaming -> your query). Do you have a
driver-only cluster or do you have workers too? What's the memory usage
profile at workers?
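
As a rough sketch of the same check done in code rather than in the UI: each
progress report of a Structured Streaming query carries a stateOperators
list whose entries include numRowsTotal. The helpers below work on plain
dicts shaped like those reports (in PySpark 3.4, query.recentProgress returns
a list of such dicts). The helper names and the sample values are made up for
illustration and are not taken from this thread.

```python
def total_state_rows(progress):
    """Sum numRowsTotal over all stateful operators in one progress report."""
    return sum(
        op.get("numRowsTotal", 0) for op in progress.get("stateOperators", [])
    )


def state_keeps_growing(reports):
    """True if the total state row count rises in every consecutive report."""
    totals = [total_state_rows(r) for r in reports]
    return len(totals) >= 2 and all(b > a for a, b in zip(totals, totals[1:]))


# Made-up sample reports, trimmed to the fields used here.
growing = [
    {"batchId": i, "stateOperators": [{"numRowsTotal": 1000 * (i + 1)}]}
    for i in range(5)
]
steady = [
    {"batchId": i, "stateOperators": [{"numRowsTotal": 1000}]}
    for i in range(5)
]
stateless = {"batchId": 0, "stateOperators": []}

print(state_keeps_growing(growing))   # True
print(state_keeps_growing(steady))    # False
print(total_state_rows(stateless))    # 0
```

With a live query this could be called as
state_keeps_growing(query.recentProgress). A stateless query reports an
empty stateOperators list, so the total stays at 0 and the state metrics
have nothing to show.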

Regards,
Andrzej


On Sat, 8 Jun 2024 at 10:39, Karthick Nk wrote:

> Hi All,
>
> I am using PySpark Structured Streaming with Azure Databricks for a data
> load process.
>
> In the pipeline I am using a job cluster and running only one
> pipeline. I am getting an OUT OF MEMORY error when it runs for a
> long time. When I inspect the cluster metrics, I find that
> memory usage keeps increasing over time even when there is no
> huge volume of data.
>
> [image: image.png]
>
>
> [image: image.png]
>
> After 4 hours of running the pipeline continuously, I get an out of
> memory error: used memory in the driver increases from 47 GB
> to 111 GB, which is more than double. I am unable to understand why this
> much memory is occupied in the driver. Am I missing anything here? Could
> you guide me to figure out the root cause?
>
> Note:
> 1. I confirmed that the persist and unpersist calls in my code are handled
> properly for every batch execution.
> 2. The data volume does not increase over time (the stream receives almost
> the same amount of data in every batch).
>
>
> Thanks,
>
>
>
>