[ https://issues.apache.org/jira/browse/SPARK-40927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17750660#comment-17750660 ]
Iain Morrison commented on SPARK-40927:
---------------------------------------

In our case I found the following settings greatly improved our streaming applications, which have now been running for over 2 weeks without being OOM-killed (previously they lasted only a day or two):

1. Use the RocksDB state store provider, which improved executor memory usage:
   "spark.sql.streaming.stateStore.providerClass" -> "org.apache.spark.sql.execution.streaming.state.RocksDBStateStoreProvider"
   I am not sure whether or not there is a leak in the default HDFS-backed state store implementation.

2. Store the UI data on disk instead of in driver memory:
   "spark.ui.store.path" -> "some path"

This is an old issue, but I hope this helps someone.

> Memory issue with Structured streaming
> --------------------------------------
>
>                 Key: SPARK-40927
>                 URL: https://issues.apache.org/jira/browse/SPARK-40927
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 3.3.0, 3.2.2
>            Reporter: Mihir Kelkar
>            Priority: Major
>
> In PySpark Structured Streaming with Kafka as both source and sink, the driver
> as well as the executors seem to get OOM-killed after a long period of time (a
> few days). We are not able to pinpoint any specific cause.
> Even 8-12 hour runs show a slow memory creep in the Prometheus metric values:
> # JVM off-heap memory of both the driver and the executors keeps increasing
> over time (12-24 hr observation window), even though I have NOT enabled
> off-heap usage.
> # JVM heap memory of the executors also keeps bumping up in slow steps.
> # JVM RSS of the executors and the driver keeps increasing, but Python RSS
> does not increase.
> To debug, the logic was reduced to a basic row count performed inside
> sdf.foreachBatch(). The original business logic performs some dropDuplicates,
> aggregations, and windowing within the foreachBatch, with watermarking on a
> custom timestamp column.
> Heap dump analysis shows a large number of duplicate strings (which look like
> generated code), plus large numbers of byte[], char[] and UTF8String objects.
> Does this point to any potential memory leak in Tungsten-optimizer-related
> code?
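The two mitigations above can be expressed as a spark-defaults.conf fragment (or equivalent --conf flags on spark-submit). This is a minimal sketch: the path /var/spark/ui-store is an illustrative placeholder, and note that spark.ui.store.path is only available in more recent Spark releases.

```
# Use RocksDB instead of the default HDFS-backed state store provider
spark.sql.streaming.stateStore.providerClass  org.apache.spark.sql.execution.streaming.state.RocksDBStateStoreProvider

# Persist the live UI's store on local disk instead of in driver memory
# (placeholder path; must be writable on the driver host)
spark.ui.store.path  /var/spark/ui-store
```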
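The reduced debugging step described in the report (counting rows inside foreachBatch) can be sketched as a plain callback. The function name log_batch_count and the commented wiring below are illustrative assumptions, not taken from the report; foreachBatch in PySpark passes the micro-batch DataFrame and the batch id to the callback.

```python
def log_batch_count(batch_df, epoch_id):
    """Minimal foreachBatch callback: count rows per micro-batch.

    Everything heavier from the original job (dropDuplicates,
    aggregations, windowing) is stripped out here, mirroring the
    report's approach of isolating the memory creep.
    """
    n = batch_df.count()  # forces full evaluation of the micro-batch
    print(f"epoch {epoch_id}: {n} rows")
    return n

# Illustrative wiring, assuming `sdf` is a streaming DataFrame read from
# Kafka and the checkpoint path is a placeholder:
# query = (sdf.writeStream
#             .foreachBatch(log_batch_count)
#             .option("checkpointLocation", "/tmp/ckpt")
#             .start())
```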