We started hitting this as well, seeing 90+ GB of resident memory on an executor with a 25 GB heap. After a lot of manual testing of fixes, I finally figured out the root problem: https://issues.apache.org/jira/browse/SPARK-41339
Starting to work on a PR now to fix it.

On Mon, Sep 12, 2022 at 10:46 AM Artemis User <arte...@dtechspace.com> wrote:

> The off-heap memory isn't subject to GC. So the obvious reason is that
> you have too many states to maintain in your streaming app, and the GC
> couldn't keep up, and end up with resources but to die. Are you using
> continuous processing or microbatch in structured streaming? You may want
> to lower your incoming data rate and/or increase your microbatch size so as to
> lower the number of states to be persisted/maintained...
>
> On 9/11/22 10:59 AM, akshit marwah wrote:
>
> Hi Team,
>
> We are trying to shift from HDFS State Manager to Rocks DB State Manager,
> but while doing a POC we realised it is using much more off-heap space than
> expected. Because of this, the executors get killed with: *out of
> physical memory exception.*
>
> Could you please help us understand why there is a massive increase in
> off-heap space, and what we can do about it?
>
> We are using Spark 3.2.1 with 1 executor and 1 executor core, to
> understand the memory requirements -
> 1. Rocks DB Run - took 3.5 GB heap and 11.5 GB Res Memory
> 2. Hdfs State Manager - took 5 GB heap and 10 GB Res Memory.
>
> Thanks,
> Akshit
>
>
> Thanks and regards
> - Akshit Marwah
>
>
>

--
Adam Binford