Hi, I'm a long-time listener, first-time committer to Spark, so this is a good way to get my feet wet. I'm particularly interested in SPARK-23836, an itch I may want to dive in and scratch myself in the next month or so, since it's pretty painful for our use case.
Thanks!
Andrew

On Mon, Aug 27, 2018 at 2:20 PM, Holden Karau <hol...@pigscanfly.ca> wrote:
> Sure, I don't think you should wait on that being merged in. If you want
> to take the JIRA, go ahead (although if you're already familiar with the
> Spark code base, it might make sense to leave it as a starter issue for
> someone who is just getting started).
>
> On Mon, Aug 27, 2018 at 12:18 PM Andrew Melo <andrew.m...@gmail.com> wrote:
>>
>> Hi Holden,
>>
>> I'm agnostic to the approach (though it seems cleaner to have an
>> explicit API for it). If you would like, I can take that JIRA and
>> implement it (it should be a three-line function).
>>
>> Cheers
>> Andrew
>>
>> On Mon, Aug 27, 2018 at 2:14 PM, Holden Karau <hol...@pigscanfly.ca> wrote:
>> > Seems reasonable. We should probably add `getActiveSession` to the
>> > PySpark API (filed a starter JIRA:
>> > https://issues.apache.org/jira/browse/SPARK-25255)
>> >
>> > On Mon, Aug 27, 2018 at 12:09 PM Andrew Melo <andrew.m...@gmail.com> wrote:
>> >>
>> >> Hello Sean, others -
>> >>
>> >> Just to confirm, is it OK for client applications to access
>> >> SparkContext._active_spark_context, provided they wrap the accesses
>> >> in `with SparkContext._lock:`?
>> >>
>> >> If that's acceptable to Spark, I'll implement the modifications in
>> >> the Jupyter extensions.
>> >>
>> >> Thanks!
>> >> Andrew
>> >>
>> >> On Tue, Aug 7, 2018 at 5:52 PM, Andrew Melo <andrew.m...@gmail.com> wrote:
>> >> > Hi Sean,
>> >> >
>> >> > On Tue, Aug 7, 2018 at 5:44 PM, Sean Owen <sro...@gmail.com> wrote:
>> >> >> Ah, Python. How about SparkContext._active_spark_context then?
>> >> >
>> >> > Ah yes, that looks like the right member, but I'm a bit wary about
>> >> > depending on members with leading underscores. I assumed those were
>> >> > "private" and subject to change. Is that something I should be
>> >> > unconcerned about?
>> >> >
>> >> > The other thought is that the accesses within SparkContext are
>> >> > protected by "SparkContext._lock" -- should I also use that lock?
>> >> >
>> >> > Thanks for your help!
>> >> > Andrew
>> >> >
>> >> >> On Tue, Aug 7, 2018 at 5:34 PM Andrew Melo <andrew.m...@gmail.com> wrote:
>> >> >>>
>> >> >>> Hi Sean,
>> >> >>>
>> >> >>> On Tue, Aug 7, 2018 at 5:16 PM, Sean Owen <sro...@gmail.com> wrote:
>> >> >>> > Is SparkSession.getActiveSession what you're looking for?
>> >> >>>
>> >> >>> Perhaps -- though there's no corresponding Python function, and
>> >> >>> I'm not exactly sure how to call the Scala getActiveSession
>> >> >>> without first instantiating the Python version and causing a JVM
>> >> >>> to start.
>> >> >>>
>> >> >>> Is there an easy way to call getActiveSession that doesn't start
>> >> >>> a JVM?
>> >> >>>
>> >> >>> Cheers
>> >> >>> Andrew
>> >> >>>
>> >> >>> > On Tue, Aug 7, 2018 at 5:11 PM Andrew Melo <andrew.m...@gmail.com> wrote:
>> >> >>> >>
>> >> >>> >> Hello,
>> >> >>> >>
>> >> >>> >> One pain point with various Jupyter extensions [1][2] that
>> >> >>> >> provide visual feedback about running Spark processes is the
>> >> >>> >> lack of a public API to introspect the web URL. The notebook
>> >> >>> >> server needs to know the URL to find information about the
>> >> >>> >> current SparkContext.
>> >> >>> >>
>> >> >>> >> Simply looking for "localhost:4040" works most of the time,
>> >> >>> >> but fails if multiple Spark notebooks are run on the same
>> >> >>> >> host -- Spark increments the port for each new context,
>> >> >>> >> leading to confusion when the notebooks try to probe the web
>> >> >>> >> interface for information.
>> >> >>> >>
>> >> >>> >> I'd like to implement an analog to SparkContext.getOrCreate(),
>> >> >>> >> perhaps called "getIfExists()", that returns the current
>> >> >>> >> singleton if it exists, or None otherwise. The Jupyter code
>> >> >>> >> would then be able to use this entry point to query Spark for
>> >> >>> >> an active context, which it could use to probe the web URL.
>> >> >>> >>
>> >> >>> >> It's a minor change, but this would be my first contribution
>> >> >>> >> to Spark, and I want to make sure my plan is kosher before I
>> >> >>> >> implement it.
>> >> >>> >>
>> >> >>> >> Thanks!
>> >> >>> >> Andrew
>> >> >>> >>
>> >> >>> >> [1] https://krishnan-r.github.io/sparkmonitor/
>> >> >>> >> [2] https://github.com/mozilla/jupyter-spark
>> >> >>> >>
>> >> >>> >> ---------------------------------------------------------------------
>> >> >>> >> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>> >
>> > --
>> > Twitter: https://twitter.com/holdenkarau
>> > Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
>> > YouTube Live Streams: https://www.youtube.com/user/holdenkarau
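For concreteness, the semantics proposed in the thread could be sketched roughly as below. This is a self-contained stand-in, not PySpark itself: the `_lock` and `_active_context` class attributes here only mirror the private `SparkContext._lock` and `SparkContext._active_spark_context` members discussed above, and `getIfExists` is the hypothetical method from the proposal, not an existing Spark API.

```python
import threading

class StandInContext:
    """Stand-in mirroring SparkContext's class-level singleton bookkeeping.

    _lock and _active_context are analogues of the private
    SparkContext._lock / SparkContext._active_spark_context members
    discussed in this thread; they are NOT the real PySpark attributes.
    """
    _lock = threading.Lock()
    _active_context = None

    def __init__(self):
        # getOrCreate()-style creation: at most one active context per process.
        with StandInContext._lock:
            if StandInContext._active_context is not None:
                raise RuntimeError("only one context may be active at a time")
            StandInContext._active_context = self

    @classmethod
    def getIfExists(cls):
        """Proposed analogue of getOrCreate(): return the active singleton
        if one exists, or None -- without creating one as a side effect."""
        with cls._lock:  # guard the read with the same lock, as asked above
            return cls._active_context

    def stop(self):
        with StandInContext._lock:
            StandInContext._active_context = None
```

With an entry point like this, a notebook extension could ask whether a context exists without touching underscore-prefixed members and, crucially, without causing a JVM to start as a side effect of instantiation.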
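The fragile heuristic that motivates the proposal can also be sketched. This is an illustration of what the thread says the extensions do today, not any extension's actual code: probe localhost ports starting at 4040, since Spark assigns 4040, 4041, ... to successive contexts on one host. The function name and parameters are hypothetical.

```python
import urllib.error
import urllib.request

def find_spark_ui(host="localhost", start_port=4040, max_tries=10):
    """Probe successive ports for a responding web server.

    Spark's web UI defaults to port 4040 and increments the port for each
    additional context on the same host, so this returns the FIRST UI that
    answers -- which may belong to a different notebook's context. That
    ambiguity is exactly the pain point described in the thread.
    """
    for port in range(start_port, start_port + max_tries):
        url = f"http://{host}:{port}"
        try:
            urllib.request.urlopen(url, timeout=1)
            return url
        except (urllib.error.URLError, OSError):
            continue  # nothing listening here; try the next port
    return None
```

Because the first responding port wins, two notebooks on one host can each conclude the other's UI is theirs, which is why an explicit "give me the active context if it exists" API is cleaner than port scanning.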