Hi, I'm a long-time listener, first-time committer to Spark, so this is a good way to get my feet wet. I'm particularly interested in SPARK-23836, an itch I may want to dive in and scratch myself in the next month or so, since it's pretty painful for our use case.
Thanks!
Andrew

On Mon, Aug 27, 2018 at 2:20 PM, Holden Karau <hol...@pigscanfly.ca> wrote:
> Sure, I don't think you should wait on that being merged in. If you want
> to take the JIRA, go ahead (although if you're already familiar with the
> Spark code base, it might make sense to leave it as a starter issue for
> someone who is just getting started).
>
> On Mon, Aug 27, 2018 at 12:18 PM Andrew Melo <andrew.m...@gmail.com> wrote:
>>
>> Hi Holden,
>>
>> I'm agnostic to the approach (though it seems cleaner to have an
>> explicit API for it). If you would like, I can take that JIRA and
>> implement it (it should be a three-line function).
>>
>> Cheers
>> Andrew
>>
>> On Mon, Aug 27, 2018 at 2:14 PM, Holden Karau <hol...@pigscanfly.ca> wrote:
>> > Seems reasonable. We should probably add `getActiveSession` to the
>> > PySpark API (filed a starter JIRA:
>> > https://issues.apache.org/jira/browse/SPARK-25255)
>> >
>> > On Mon, Aug 27, 2018 at 12:09 PM Andrew Melo <andrew.m...@gmail.com> wrote:
>> >>
>> >> Hello Sean, others -
>> >>
>> >> Just to confirm, is it OK for client applications to access
>> >> SparkContext._active_spark_context, provided they wrap the accesses
>> >> in `with SparkContext._lock:`?
>> >>
>> >> If that's acceptable to Spark, I'll implement the modifications in
>> >> the Jupyter extensions.
>> >>
>> >> Thanks!
>> >> Andrew
>> >>
>> >> On Tue, Aug 7, 2018 at 5:52 PM, Andrew Melo <andrew.m...@gmail.com> wrote:
>> >> > Hi Sean,
>> >> >
>> >> > On Tue, Aug 7, 2018 at 5:44 PM, Sean Owen <sro...@gmail.com> wrote:
>> >> >> Ah, Python. How about SparkContext._active_spark_context then?
>> >> >
>> >> > Ah yes, that looks like the right member, but I'm a bit wary about
>> >> > depending on members with leading underscores. I assumed those were
>> >> > "private" and subject to change. Is that something I should be
>> >> > unconcerned about?
>> >> >
>> >> > The other thought is that the accesses within SparkContext are
>> >> > protected by "SparkContext._lock" -- should I also use that lock?
>> >> >
>> >> > Thanks for your help!
>> >> > Andrew
>> >> >
>> >> >> On Tue, Aug 7, 2018 at 5:34 PM Andrew Melo <andrew.m...@gmail.com> wrote:
>> >> >>>
>> >> >>> Hi Sean,
>> >> >>>
>> >> >>> On Tue, Aug 7, 2018 at 5:16 PM, Sean Owen <sro...@gmail.com> wrote:
>> >> >>> > Is SparkSession.getActiveSession what you're looking for?
>> >> >>>
>> >> >>> Perhaps -- though there's no corresponding Python function, and
>> >> >>> I'm not exactly sure how to call the Scala getActiveSession
>> >> >>> without first instantiating the Python version and causing a JVM
>> >> >>> to start.
>> >> >>>
>> >> >>> Is there an easy way to call getActiveSession that doesn't start
>> >> >>> a JVM?
>> >> >>>
>> >> >>> Cheers
>> >> >>> Andrew
>> >> >>>
>> >> >>> > On Tue, Aug 7, 2018 at 5:11 PM Andrew Melo <andrew.m...@gmail.com> wrote:
>> >> >>> >>
>> >> >>> >> Hello,
>> >> >>> >>
>> >> >>> >> One pain point with various Jupyter extensions [1][2] that
>> >> >>> >> provide visual feedback about running Spark processes is the
>> >> >>> >> lack of a public API to introspect the web URL. The notebook
>> >> >>> >> server needs to know the URL to find information about the
>> >> >>> >> current SparkContext.
>> >> >>> >>
>> >> >>> >> Simply looking for "localhost:4040" works most of the time,
>> >> >>> >> but fails if multiple Spark notebooks are run on the same
>> >> >>> >> host -- Spark increments the port for each new context,
>> >> >>> >> leading to confusion when the notebooks try to probe the web
>> >> >>> >> interface for information.
>> >> >>> >>
>> >> >>> >> I'd like to implement an analog to SparkContext.getOrCreate(),
>> >> >>> >> perhaps called "getIfExists()", that returns the current
>> >> >>> >> singleton if it exists, or None otherwise. The Jupyter code
>> >> >>> >> would then be able to use this entry point to query Spark for
>> >> >>> >> an active context, which it could use to probe the web URL.
>> >> >>> >>
>> >> >>> >> It's a minor change, but this would be my first contribution
>> >> >>> >> to Spark, and I want to make sure my plan is kosher before I
>> >> >>> >> implement it.
>> >> >>> >>
>> >> >>> >> Thanks!
>> >> >>> >> Andrew
>> >> >>> >>
>> >> >>> >> [1] https://krishnan-r.github.io/sparkmonitor/
>> >> >>> >> [2] https://github.com/mozilla/jupyter-spark
>> >> >>> >>
>> >> >>> >> ---------------------------------------------------------------------
>> >> >>> >> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>> >
>> > --
>> > Twitter: https://twitter.com/holdenkarau
>> > Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
>> > YouTube Live Streams: https://www.youtube.com/user/holdenkarau
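For concreteness, the semantics proposed in the thread could be sketched roughly as below. This is a self-contained stand-in, not PySpark itself: the `_lock` and `_active_context` class attributes here only mirror the private `SparkContext._lock` and `SparkContext._active_spark_context` members discussed above, and `getIfExists` is the hypothetical method from the proposal, not an existing Spark API.

```python
import threading

class StandInContext:
    """Stand-in mirroring SparkContext's class-level singleton bookkeeping.

    _lock and _active_context are analogues of the private
    SparkContext._lock / SparkContext._active_spark_context members
    discussed in this thread; they are NOT the real PySpark attributes.
    """
    _lock = threading.Lock()
    _active_context = None

    def __init__(self):
        # getOrCreate()-style creation: at most one active context per process.
        with StandInContext._lock:
            if StandInContext._active_context is not None:
                raise RuntimeError("only one context may be active at a time")
            StandInContext._active_context = self

    @classmethod
    def getIfExists(cls):
        """Proposed analogue of getOrCreate(): return the active singleton
        if one exists, or None -- without creating one as a side effect."""
        with cls._lock:  # guard the read with the same lock, as asked above
            return cls._active_context

    def stop(self):
        with StandInContext._lock:
            StandInContext._active_context = None
```

With an entry point like this, a notebook extension could ask whether a context exists without touching underscore-prefixed members and, crucially, without causing a JVM to start as a side effect of instantiation.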
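The fragile heuristic that motivates the proposal can also be sketched. This is an illustration of what the thread says the extensions do today, not any extension's actual code: probe localhost ports starting at 4040, since Spark assigns 4040, 4041, ... to successive contexts on one host. The function name and parameters are hypothetical.

```python
import urllib.error
import urllib.request

def find_spark_ui(host="localhost", start_port=4040, max_tries=10):
    """Probe successive ports for a responding web server.

    Spark's web UI defaults to port 4040 and increments the port for each
    additional context on the same host, so this returns the FIRST UI that
    answers -- which may belong to a different notebook's context. That
    ambiguity is exactly the pain point described in the thread.
    """
    for port in range(start_port, start_port + max_tries):
        url = f"http://{host}:{port}"
        try:
            urllib.request.urlopen(url, timeout=1)
            return url
        except (urllib.error.URLError, OSError):
            continue  # nothing listening here; try the next port
    return None
```

Because the first responding port wins, two notebooks on one host can each conclude the other's UI is theirs, which is why an explicit "give me the active context if it exists" API is cleaner than port scanning.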