For #1, do we agree on the behavior? I think that closing a SparkSession should not close the SparkContext unless it is the only session. Evidently, that's not what happens, and I consider the current behavior a bug.
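To make the life-cycle I have in mind concrete, a quick sketch (the comments describe the behavior I'd expect, not what Spark does today):

    import org.apache.spark.sql.SparkSession

    val s1 = SparkSession.builder().master("local[*]").getOrCreate()
    val s2 = s1.newSession()  // a second session sharing s1's SparkContext

    s2.stop()  // today: stops the shared SparkContext, breaking s1 too
               // expected: s1 keeps working; the context stops only when
               // the last remaining session is stopped
    s1.stop()  // only now should the context actually shut down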
For more context, we're working on the new catalog APIs and how to guarantee consistent operations. Self-joining a table, for example, should use the same version of the table for both scans, and that state should be specific to a session, not global. These plans assume that SparkSession represents a session of interactions, along with a reasonable life-cycle. If that life-cycle includes closing all sessions when you close any session, then we can't really use sessions for this.

rb

On Wed, Apr 3, 2019 at 9:35 AM Vinoo Ganesh <vgan...@palantir.com> wrote:

> Yeah, so I think there are 2 separate issues here:
>
> 1. The coupling of the SparkSession + SparkContext in their current
>    form seems unnatural.
> 2. The current memory leak, which I do believe is a case where session
>    state is added onto the SparkContext but is only needed by the
>    session (though I would appreciate a sanity check here). Meaning, it
>    may make sense to investigate an API change.
>
> Thoughts?
>
> On 4/2/19, 15:13, "Sean Owen" <sro...@gmail.com> wrote:
>
> > > @Sean – To the point that Ryan made, it feels wrong that stopping
> > > a session force-stops the global context. Building in the logic to
> > > only stop the context when the last session is stopped also feels
> > > like a solution, but the best way I can think of to do this
> > > involves storing the global list of every available SparkSession,
> > > which may be difficult.
> >
> > I tend to agree it would be more natural for the SparkSession to have
> > its own lifecycle 'stop' method that only stops/releases its own
> > resources. But is that the source of the problem here? If the state
> > you're trying to free is needed by the SparkContext, it won't help. If
> > it happens to be in the SparkContext but is state only needed by one
> > SparkSession, and there isn't any way to clean it up now, that's a
> > compelling reason to change the API. Is that the situation? The only
> > downside is making the user separately stop the SparkContext then.
>
> *From: *Vinoo Ganesh <vgan...@palantir.com>
> *Date: *Tuesday, April 2, 2019 at 13:24
> *To: *Arun Mahadevan <ar...@apache.org>, Ryan Blue <rb...@netflix.com>
> *Cc: *Sean Owen <sro...@gmail.com>, "dev@spark.apache.org" <dev@spark.apache.org>
> *Subject: *Re: Closing a SparkSession stops the SparkContext
>
> // Merging threads
>
> Thanks everyone for your thoughts. I’m very much in sync with Ryan here.
>
> @Sean – To the point that Ryan made, it feels wrong that stopping a
> session force-stops the global context. Building in the logic to only
> stop the context when the last session is stopped also feels like a
> solution, but the best way I can think of to do this involves storing
> the global list of every available SparkSession, which may be difficult.
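> Roughly, the kind of thing I mean (an untested sketch with hypothetical
> names, not an existing Spark API):
>
>     import java.util.concurrent.atomic.AtomicInteger
>     import org.apache.spark.SparkContext
>
>     // Count the sessions sharing a context; stop the context only when
>     // the last session is stopped.
>     object SessionRegistry {
>       private val live = new AtomicInteger(0)
>       def register(): Unit = live.incrementAndGet()
>       def deregister(sc: SparkContext): Unit =
>         if (live.decrementAndGet() == 0) sc.stop()
>     }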
> @Arun – If the intention is not to be able to clear and create new
> sessions, then what, specifically, is the intended use case of Sessions?
> https://databricks.com/blog/2016/08/15/how-to-use-sparksession-in-apache-spark-2-0.html
> describes SparkSessions as time-bounded interactions, which implies that
> old ones should be clearable and new ones creatable in lockstep without
> adverse effects.
>
> *From: *Arun Mahadevan <ar...@apache.org>
> *Date: *Tuesday, April 2, 2019 at 12:31
> *To: *Ryan Blue <rb...@netflix.com>
> *Cc: *Vinoo Ganesh <vgan...@palantir.com>, Sean Owen <sro...@gmail.com>, "dev@spark.apache.org" <dev@spark.apache.org>
> *Subject: *Re: Closing a SparkSession stops the SparkContext
>
> I am not sure how it would cause a leak, though. When a Spark session
> or the underlying context is stopped, it should clean up everything.
> getOrCreate is supposed to return the active thread-local or the global
> session. Maybe if you keep creating new sessions after explicitly
> clearing the default and the local sessions, and keep leaking the
> sessions, it could happen, but I don't think Sessions are intended to
> be used that way.
>
> On Tue, 2 Apr 2019 at 08:45, Ryan Blue <rb...@netflix.com.invalid> wrote:
>
> I think Vinoo is right about the intended behavior. If we support
> multiple sessions in one context, then stopping any one session
> shouldn't stop the shared context. The last session to be stopped
> should stop the context, but not any before that. We don't typically
> run multiple sessions in the same context, so we haven't hit this, but
> it sounds reasonable.
>
> On 4/2/19, 11:44, "Sean Owen" <sro...@gmail.com> wrote:
>
> > Yeah there's one global default session, but it's possible to create
> > others and set them as the thread's active session, to allow for
> > different configurations in the SparkSession within one app. I think
> > you're asking why closing one of them would effectively shut all of
> > them down by stopping the SparkContext. My best guess is simply,
> > well, that's how it works. You'd only call this, like
> > SparkContext.stop(), when you know the whole app is done and want to
> > clean up. SparkSession is a kind of wrapper on SparkContext, and it
> > wouldn't be great to make users stop all the sessions and then go
> > find and stop the context.
> >
> > If there is some per-SparkSession state that needs a cleanup, then
> > that's a good point, as I don't see a lifecycle method that means
> > "just close this session". You're talking about SparkContext state,
> > though, right? There is definitely just one SparkContext, and it
> > can/should only be stopped when the app is really done.
> >
> > Is the point that each session is adding some state to the context
> > and doesn't have any mechanism for removing it?
>
> On Tue, Apr 2, 2019 at 8:23 AM Vinoo Ganesh <vgan...@palantir.com> wrote:
>
> > Hey Sean - Cool, maybe I'm misunderstanding the intent of clearing a
> > session vs. stopping it.
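> > Concretely, my reading of the two with today's public API (a small
> > sketch; assuming I have the behavior right):
> >
> >     import org.apache.spark.sql.SparkSession
> >
> >     val s1 = SparkSession.builder().master("local[*]").getOrCreate()
> >     val s2 = s1.newSession()
> >
> >     SparkSession.setActiveSession(s2)
> >     SparkSession.clearActiveSession()  // only unsets the thread-local
> >                                        // reference; s2 is still usable
> >     s2.stop()                          // stops the shared SparkContext,
> >                                        // which takes s1 down with it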
> > The cause of the leak looks to be because of this line here:
> > https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/util/QueryExecutionListener.scala#L131
> > The ExecutionListenerBus that's added persists forever on the
> > context's listener bus (the SparkContext ListenerBus has an
> > ExecutionListenerBus). I'm trying to figure out where this cleanup
> > should happen.
> >
> > With the current implementation, calling SparkSession.stop will clean
> > up the ExecutionListenerBus (since the context itself is stopped),
> > but it's unclear to me why terminating one session should terminate
> > the JVM-global context. Possibly my mental model is off here, but I
> > would expect stopping a session to remove all traces of that session
> > while keeping the context alive, and stopping a context would, well,
> > stop the context.
> >
> > If stopping the session is expected to stop the context, what's the
> > intended usage of clearing the active / default session?
> >
> > Vinoo
> >
> > On 4/2/19, 10:57, "Sean Owen" <sro...@gmail.com> wrote:
> >
> > What are you expecting there ... that sounds correct? Something else
> > needs to be closed?
> >
> > On Tue, Apr 2, 2019 at 9:45 AM Vinoo Ganesh <vgan...@palantir.com> wrote:
> >
> > > Hi All -
> > >
> > > I’ve been digging into the code and looking into what appears to be
> > > a memory leak (https://jira.apache.org/jira/browse/SPARK-27337) and
> > > have noticed something kind of peculiar about the way closing a
> > > SparkSession is handled. Despite being marked as Closeable,
> > > closing/stopping a SparkSession simply stops the SparkContext. This
> > > change happened as a result of one of the PRs addressing
> > > https://jira.apache.org/jira/browse/SPARK-15073 in
> > > https://github.com/apache/spark/pull/12873/files#diff-d91c284798f1c98bf03a31855e26d71cR596
> > >
> > > I’m trying to understand why this is the intended behavior – anyone
> > > have any knowledge of why this is the case?
> > >
> > > Thanks,
> > > Vinoo
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org

> --
> Ryan Blue
> Software Engineer
> Netflix

--
Ryan Blue
Software Engineer
Netflix
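To reproduce the leak discussed in this thread, a loop along the lines Arun described should be enough (a hypothetical sketch, assuming the behavior reported above: each new session registers an ExecutionListenerBus on the shared context's listener bus and never removes it):

    import org.apache.spark.sql.SparkSession

    val root = SparkSession.builder().master("local[*]").getOrCreate()

    // Clearing the cached sessions forces getOrCreate to build a fresh
    // SparkSession on top of the same SparkContext each time. If the
    // report above is right, every iteration leaves one more
    // ExecutionListenerBus behind on the context's listener bus.
    for (_ <- 1 to 1000) {
      SparkSession.clearActiveSession()
      SparkSession.clearDefaultSession()
      val s = SparkSession.builder().getOrCreate()
      s.range(1).count()  // touch the session so its listener is registered
    }
    // A heap dump here would show the accumulated listeners retained by
    // root's SparkContext (see SPARK-27337).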