I have a DStream receiving data from a socket. I'm using local mode.
I set "spark.streaming.unpersist" to "false" and leave "
spark.cleaner.ttl" to be infinite.
I can see files for input and shuffle blocks under "spark.local.dir"
folder and the size of folder keeps increasing, although JVM's memory
usage seems to be stable.

[question] In this case, because input RDDs are persisted but they don't
fit into memory, so write to disk, right? And where can I see the
details about these RDDs? I don't see them in web UI.

Then I set "spark.streaming.unpersist" to "true", the size of
"spark.local.dir" folder and JVM's used heap size are reduced regularly.

[question] In this case, because I didn't change "spark.cleaner.ttl",
which component is doing the cleanup? And what's the difference if I set
"spark.cleaner.ttl" to some duration in this case?

Thank you!

Reply via email to