For #1, do we agree on the behavior? I think that closing a SparkSession
should not close the SparkContext unless it is the only session. Evidently,
that's not what happens, and I consider the current behavior a bug.

For more context, we're working on the new catalog APIs and how to
guarantee consistent operations. Self-joining a table, for example, should
use the same version of the table for both scans, and that state should be
specific to a session, not global. These plans assume that SparkSession
represents a session of interactions, along with a reasonable life-cycle.
If that life-cycle includes closing all sessions when you close any
session, then we can't really use sessions for this.
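
To make the expected life-cycle concrete, here's a rough sketch in plain
Scala against the existing public API (the comments contrast current vs.
expected behavior; nothing here is a new method):

    import org.apache.spark.sql.SparkSession

    val s1 = SparkSession.builder()
      .master("local[2]")
      .appName("lifecycle-sketch")
      .getOrCreate()              // creates the underlying SparkContext
    val s2 = s1.newSession()      // second session sharing that SparkContext

    s2.stop()   // today: also stops the shared SparkContext, breaking s1
                // expected: release only s2's state; s1 keeps working
    s1.stop()   // expected: last session closed, so now stop the SparkContext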

rb

On Wed, Apr 3, 2019 at 9:35 AM Vinoo Ganesh <vgan...@palantir.com> wrote:

> Yeah, so I think there are 2 separate issues here:
>
>
>
>    1. The coupling of the SparkSession + SparkContext in their current
>    form seems unnatural
>    2. The current memory leak, which I do believe is a case where state is
>    added onto the SparkContext but is only needed by the session (I would
>    appreciate a sanity check here). Meaning, it may make sense to
>    investigate an API change.
>
>
>
> Thoughts?
>
>
>
> On 4/2/19, 15:13, "Sean Owen" <sro...@gmail.com> wrote:
>
>     > @Sean – To the point that Ryan made, it feels wrong that stopping a
> session force stops the global context. Building in the logic to only stop
> the context when the last session is stopped also feels like a solution,
> but the best way I can think about doing this involves storing the global
> list of every available SparkSession, which may be difficult.
>
>
>
>     I tend to agree it would be more natural for the SparkSession to have
>
>     its own lifecycle 'stop' method that only stops/releases its own
>
>     resources. But is that the source of the problem here? If the state
>
>     you're trying to free is needed by the SparkContext, it won't help. If
>
>     it happens to be in the SparkContext but is state only needed by one
>
>     SparkSession, and there isn't any way to clean it up now, that's a
>
>     compelling reason to change the API. Is that the situation? The only
>
>     downside is making the user separately stop the SparkContext then.
>
>
>
> *From: *Vinoo Ganesh <vgan...@palantir.com>
> *Date: *Tuesday, April 2, 2019 at 13:24
> *To: *Arun Mahadevan <ar...@apache.org>, Ryan Blue <rb...@netflix.com>
> *Cc: *Sean Owen <sro...@gmail.com>, "dev@spark.apache.org" <
> dev@spark.apache.org>
> *Subject: *Re: Closing a SparkSession stops the SparkContext
>
>
>
> // Merging threads
>
>
>
> Thanks everyone for your thoughts. I’m very much in sync with Ryan here.
>
>
>
> @Sean – To the point that Ryan made, it feels wrong that stopping a
> session force stops the global context. Building in the logic to only stop
> the context when the last session is stopped also feels like a solution,
> but the best way I can think about doing this involves storing the global
> list of every available SparkSession, which may be difficult.
>
>
>
> @Arun – If the intention is not to be able to clear and create new
> sessions, then what specifically is the intended use case of Sessions?
> https://databricks.com/blog/2016/08/15/how-to-use-sparksession-in-apache-spark-2-0.html
> describes SparkSessions as time-bounded interactions, which implies that old
> ones should be clear-able and new ones create-able in lockstep without
> adverse effects?
>
>
>
> *From: *Arun Mahadevan <ar...@apache.org>
> *Date: *Tuesday, April 2, 2019 at 12:31
> *To: *Ryan Blue <rb...@netflix.com>
> *Cc: *Vinoo Ganesh <vgan...@palantir.com>, Sean Owen <sro...@gmail.com>, "
> dev@spark.apache.org" <dev@spark.apache.org>
> *Subject: *Re: Closing a SparkSession stops the SparkContext
>
>
>
> I am not sure how it would cause a leak, though. When a Spark session or
> the underlying context is stopped, it should clean up everything. The
> getOrCreate is supposed to return the active thread-local or the global
> session. Maybe if you keep creating new sessions after explicitly clearing
> the default and the local sessions, and keep leaking those sessions, it could
> happen, but I don't think Sessions are intended to be used that way.
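>
> (That is, assuming the usual getOrCreate semantics; `someSession` below is
> just a placeholder name:)
>
>     SparkSession.setActiveSession(someSession)      // thread-local active session
>     val s = SparkSession.builder().getOrCreate()
>     // returns the active thread-local session if one is set, otherwise the
>     // global default session, otherwise it builds a new one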
>
>
>
> On Tue, 2 Apr 2019 at 08:45, Ryan Blue <rb...@netflix.com.invalid> wrote:
>
> I think Vinoo is right about the intended behavior. If we support multiple
> sessions in one context, then stopping any one session shouldn't stop the
> shared context. The last session to be stopped should stop the context, but
> not any before that. We don't typically run multiple sessions in the same
> context so we haven't hit this, but it sounds reasonable.
>
>
>
> On 4/2/19, 11:44, "Sean Owen" <sro...@gmail.com> wrote:
>
>
>
>     Yeah there's one global default session, but it's possible to create
>
>     others and set them as the thread's active session, to allow for
>
>     different configurations in the SparkSession within one app. I think
>
>     you're asking why closing one of them would effectively shut all of
>
>     them down by stopping the SparkContext. My best guess is simply, well,
>
>     that's how it works. You'd only call this, like SparkContext.stop(),
>
>     when you know the whole app is done and want to clean up. SparkSession
>
>     is a kind of wrapper on SparkContext and it wouldn't be great to make
>
>     users stop all the sessions and go find and stop the context.
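>
>     (Roughly what I mean, as a sketch only: one SparkContext, two sessions
>     with different SQL confs.)
>
>         import org.apache.spark.sql.SparkSession
>
>         val base  = SparkSession.builder().master("local[2]").getOrCreate()
>         val other = base.newSession()                        // shares base's SparkContext
>         other.conf.set("spark.sql.shuffle.partitions", "4")  // session-scoped SQL conf
>         SparkSession.setActiveSession(other)                 // per-thread active session
>         // ...run queries with `other`...
>         // but other.stop() today stops the one shared SparkContext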
>
>
>
>     If there is some per-SparkSession state that needs a cleanup, then
>
>     that's a good point, as I don't see a lifecycle method that means
>
>     "just close this session".
>
>     You're talking about SparkContext state though, right? And there is
>
>     definitely just one SparkContext. It can/should only be stopped
>
>     when the app is really done.
>
>
>
>     Is the point that each session is adding some state to the context and
>
>     doesn't have any mechanism for removing it?
>
>
>
>
>
> On Tue, Apr 2, 2019 at 8:23 AM Vinoo Ganesh <vgan...@palantir.com> wrote:
>
> Hey Sean - Cool, maybe I'm misunderstanding the intent of clearing a
> session vs. stopping it.
>
> The cause of the leak looks to be this line:
> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/util/QueryExecutionListener.scala#L131
> The ExecutionListenerBus that's added persists forever on the context's
> listener bus (the SparkContext ListenerBus has an ExecutionListenerBus).
> I'm trying to figure out where this cleanup should happen.
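>
> As a rough, hypothetical repro sketch (not a committed test, and the exact
> trigger is my assumption), this is how I'd expect the accumulation to show up:
>
>     import org.apache.spark.sql.SparkSession
>
>     val root = SparkSession.builder().master("local[2]").getOrCreate()
>     for (_ <- 1 to 1000) {
>       val s = root.newSession()   // each session builds its own ExecutionListenerManager
>       s.range(10).count()         // force the session state to initialize
>       // nothing here detaches s's ExecutionListenerBus from the shared
>       // SparkContext's listener bus, so it stays reachable after `s` is dropped
>     }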
>
> With the current implementation, calling SparkSession.stop will clean up
> the ExecutionListenerBus (since the context itself is stopped), but it's
> unclear to me why terminating one session should terminate the JVM-global
> context. Possible my mental model is off here, but I would expect stopping
> a session to remove all traces of that session, while keeping the context
> alive, and stopping a context would, well, stop the context.
>
> If stopping the session is expected to stop the context, what's the
> intended usage of clearing the active / default session?
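>
> (For reference, the calls I mean are:)
>
>     SparkSession.clearActiveSession()   // drops the thread-local active session
>     SparkSession.clearDefaultSession()  // drops the global default session
>     // neither of these touches the SparkContext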
>
> Vinoo
>
> On 4/2/19, 10:57, "Sean Owen" <sro...@gmail.com> wrote:
>
>     What are you expecting there ... that sounds correct? Something else
>     needs to be closed?
>
>     On Tue, Apr 2, 2019 at 9:45 AM Vinoo Ganesh <vgan...@palantir.com>
> wrote:
>     >
>     > Hi All -
>     >
>     >    I’ve been digging into the code and looking into what appears to
> be a memory leak (https://jira.apache.org/jira/browse/SPARK-27337)
> and have noticed something kind of peculiar about the way closing a
> SparkSession is handled. Despite being marked as Closeable,
> closing/stopping a SparkSession simply stops the SparkContext. This change
> happened as a result of one of the PRs addressing
> https://jira.apache.org/jira/browse/SPARK-15073 in
> https://github.com/apache/spark/pull/12873/files#diff-d91c284798f1c98bf03a31855e26d71cR596
> .
>     >
>     >
>     >
>     > I’m trying to understand why this is the intended behavior – anyone
> have any knowledge of why this is the case?
>     >
>     >
>     >
>     > Thanks,
>     >
>     > Vinoo
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>
>
>
> --
>
> Ryan Blue
>
> Software Engineer
>
> Netflix
>
>

-- 
Ryan Blue
Software Engineer
Netflix
