Hi! I am talking about "stateful operations like aggregations". Does this happen on heap or off heap by default? I came across an article that said both on-heap and off-heap storage are possible, but I am not sure what happens by default, or when Spark or Spark Structured Streaming decides to store state off heap.
I don't even know what mapGroupsWithState does, since it's not part of Spark 2.1, which is what we currently use. Any pointers would be great.

Thanks!

On Tue, Apr 4, 2017 at 5:34 PM, Tathagata Das <tathagata.das1...@gmail.com> wrote:

> Are you referring to the memory usage of stateful operations like
> aggregations, or the new mapGroupsWithState?
> The current implementation of the internal state store (that maintains the
> stateful aggregates) is such that it keeps all the data in memory of the
> executor. It does use an HDFS-compatible file system for checkpointing, but
> as of now, it keeps all the data in memory of the executor. This is
> something we will improve in the future.
>
> That said, you can enable watermarking in your query, which would
> automatically clear old, unnecessary state, thus limiting the total memory
> used for stateful operations.
> Furthermore, you can also monitor the size of the state and get alerts if
> the state is growing too large.
>
> Read more in the programming guide.
> Watermarking - http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#handling-late-data-and-watermarking
> Monitoring - http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#monitoring-streaming-queries
>
> In case you were referring to something else, please give us more context
> details - what query, what are the symptoms you are observing.
>
> On Tue, Apr 4, 2017 at 5:17 PM, kant kodali <kanth...@gmail.com> wrote:
>
>> Why do we ever run out of memory in Spark Structured Streaming, especially
>> when memory can always spill to disk? Until the disk is full we shouldn't
>> be out of memory, should we? Sure, thrashing will happen more frequently
>> and degrade performance, but why do we ever run out of memory, even when
>> maintaining state for 6 months or a year?
>>
>> Thanks!
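
For reference, here is a rough sketch of the idea behind the watermarking suggestion quoted above: state for old windows is dropped once the watermark passes them, so memory stays bounded. This is a toy model in plain Python, not Spark's implementation or API. The class name WindowedCounter, its parameters, and the per-batch eviction rule are all assumptions made for illustration.

```python
from collections import defaultdict


class WindowedCounter:
    """Toy model of a windowed count whose state is pruned by a watermark.

    Hypothetical illustration only; this is not how Spark's state store
    is implemented.
    """

    def __init__(self, window_seconds, delay_seconds):
        self.window_seconds = window_seconds
        self.delay_seconds = delay_seconds
        self.max_event_time = 0
        self.state = defaultdict(int)  # window start -> event count

    def watermark(self):
        # Events older than this are considered too late to matter.
        return self.max_event_time - self.delay_seconds

    def process_batch(self, event_times):
        # The watermark from earlier batches decides which events are late.
        wm = self.watermark()
        for t in event_times:
            start = t - (t % self.window_seconds)
            if start + self.window_seconds <= wm:
                continue  # window already finalized; drop the late event
            self.state[start] += 1
            self.max_event_time = max(self.max_event_time, t)

        # After the batch, evict every window that ended at or before
        # the new watermark. This is what keeps the state bounded.
        wm = self.watermark()
        expired = [s for s in self.state if s + self.window_seconds <= wm]
        for start in expired:
            del self.state[start]


counter = WindowedCounter(window_seconds=60, delay_seconds=120)
counter.process_batch([10, 50, 70])   # two windows are now held in state
counter.process_batch([400])          # watermark advances; old windows go
print(dict(counter.state))
```

Without the eviction step the dictionary would grow for as long as the query runs, which is the situation described in the quoted reply when no watermark is set.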