Seems reasonable. We should probably add `getActiveSession` to the PySpark API (filed a starter JIRA: https://issues.apache.org/jira/browse/SPARK-25255)
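[Editor's note: a minimal sketch of the lock-guarded singleton access discussed in the thread below. `FakeSparkContext` and `get_if_exists` are hypothetical stand-ins used so the example runs without a JVM; in real PySpark the relevant attributes are the class-level `SparkContext._lock` and `SparkContext._active_spark_context`.]

```python
import threading

# Stand-in for pyspark.SparkContext, to illustrate the locking pattern
# without starting a JVM. The real class exposes two class-level members
# analogous to these: a lock and an "active context" singleton.
class FakeSparkContext:
    _lock = threading.RLock()
    _active_spark_context = None

def get_if_exists(cls=FakeSparkContext):
    """Sketch of the proposed getIfExists(): return the active context,
    or None, without ever creating one.

    Reading the singleton under cls._lock avoids racing a context that
    is concurrently being set up or torn down.
    """
    with cls._lock:
        return cls._active_spark_context

# No context yet: returns None instead of creating one.
assert get_if_exists() is None

# Simulate a running context and read it back under the lock.
FakeSparkContext._active_spark_context = FakeSparkContext()
assert get_if_exists() is not None
```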
On Mon, Aug 27, 2018 at 12:09 PM Andrew Melo <andrew.m...@gmail.com> wrote:

> Hello Sean, others -
>
> Just to confirm, is it OK for client applications to access
> SparkContext._active_spark_context, if it wraps the accesses in `with
> SparkContext._lock:`?
>
> If that's acceptable to Spark, I'll implement the modifications in the
> Jupyter extensions.
>
> thanks!
> Andrew
>
> On Tue, Aug 7, 2018 at 5:52 PM, Andrew Melo <andrew.m...@gmail.com> wrote:
> > Hi Sean,
> >
> > On Tue, Aug 7, 2018 at 5:44 PM, Sean Owen <sro...@gmail.com> wrote:
> >> Ah, python. How about SparkContext._active_spark_context then?
> >
> > Ah yes, that looks like the right member, but I'm a bit wary about
> > depending on functionality of objects with leading underscores. I
> > assumed that was "private" and subject to change. Is that something I
> > should be unconcerned about?
> >
> > The other thought is that the accesses within SparkContext are protected
> > by "SparkContext._lock" -- should I also use that lock?
> >
> > Thanks for your help!
> > Andrew
> >
> >> On Tue, Aug 7, 2018 at 5:34 PM Andrew Melo <andrew.m...@gmail.com> wrote:
> >>>
> >>> Hi Sean,
> >>>
> >>> On Tue, Aug 7, 2018 at 5:16 PM, Sean Owen <sro...@gmail.com> wrote:
> >>> > Is SparkSession.getActiveSession what you're looking for?
> >>>
> >>> Perhaps -- though there's no corresponding Python function, and I'm
> >>> not exactly sure how to call the Scala getActiveSession without first
> >>> instantiating the Python version and causing a JVM to start.
> >>>
> >>> Is there an easy way to call getActiveSession that doesn't start a JVM?
> >>>
> >>> Cheers
> >>> Andrew
> >>>
> >>> > On Tue, Aug 7, 2018 at 5:11 PM Andrew Melo <andrew.m...@gmail.com> wrote:
> >>> >>
> >>> >> Hello,
> >>> >>
> >>> >> One pain point with various Jupyter extensions [1][2] that provide
> >>> >> visual feedback about running Spark processes is the lack of a public
> >>> >> API to introspect the web URL.
The notebook server needs to know the
> >>> >> URL to find information about the current SparkContext.
> >>> >>
> >>> >> Simply looking for "localhost:4040" works most of the time, but fails
> >>> >> if multiple Spark notebooks are being run on the same host -- Spark
> >>> >> increments the port for each new context, leading to confusion when
> >>> >> the notebooks are trying to probe the web interface for information.
> >>> >>
> >>> >> I'd like to implement an analog to SparkContext.getOrCreate(), perhaps
> >>> >> called "getIfExists()", that returns the current singleton if it
> >>> >> exists, or None otherwise. The Jupyter code would then be able to use
> >>> >> this entry point to query Spark for an active SparkContext, which it
> >>> >> could use to probe the web URL.
> >>> >>
> >>> >> It's a minor change, but this would be my first contribution to Spark,
> >>> >> and I want to make sure my plan is kosher before I implement it.
> >>> >>
> >>> >> Thanks!
> >>> >> Andrew
> >>> >>
> >>> >> [1] https://krishnan-r.github.io/sparkmonitor/
> >>> >> [2] https://github.com/mozilla/jupyter-spark
> >>> >>
> >>> >> ---------------------------------------------------------------------
> >>> >> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org

--
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
YouTube Live Streams: https://www.youtube.com/user/holdenkarau
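[Editor's note: a sketch of what the notebook-side probing described above could look like once an accessor exists. `FakeSparkContext` and `probe_ui_url` are hypothetical stand-ins so the example runs without a JVM; the real PySpark SparkContext exposes its UI address via the `uiWebUrl` property (2.1+).]

```python
import threading

# Stand-in for pyspark.SparkContext; the real one carries the UI address
# (exposed as the uiWebUrl property in PySpark 2.1+).
class FakeSparkContext:
    _lock = threading.RLock()
    _active_spark_context = None

    def __init__(self, ui_web_url):
        self.uiWebUrl = ui_web_url

def probe_ui_url(cls=FakeSparkContext):
    """What the Jupyter extension would do: ask the running context for
    its web UI URL instead of guessing localhost:4040."""
    with cls._lock:
        ctx = cls._active_spark_context
    return ctx.uiWebUrl if ctx is not None else None

assert probe_ui_url() is None  # no context yet: nothing to probe

# Second notebook on the same host: Spark would have bumped the UI port
# to 4041, so guessing 4040 would hit the wrong context's web interface.
# Asking the context directly is robust to that.
FakeSparkContext._active_spark_context = FakeSparkContext("http://localhost:4041")
assert probe_ui_url() == "http://localhost:4041"
```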