[jupyter] Re: Scala Kernel Discussion

hadim Tue, 14 Mar 2017 13:18:16 -0700

Hello,

I am glad to see some people working on this.


I am the creator of a very early-stage kernel for ImageJ 
<http://imagej.net/Welcome> (a widely used Java application for scientific 
image analysis) called scijava-jupyter-kernel 
<https://github.com/hadim/scijava-jupyter-kernel>. In ImageJ we have a script 
editor <https://github.com/scijava/script-editor> that allows the user to 
interact easily with ImageJ. A lot of languages are supported 
<https://github.com/scijava?utf8=%E2%9C%93&q=scripting-*&type=&language=>, but 
we are mainly focusing on Groovy and Python.

Being a Python developer and a Jupyter user for a long time now, I am really 
excited to be able to interact with ImageJ via Jupyter. Also, the lead 
developer of ImageJ (Curtis Rueden <https://imagej.net/User:Rueden>) 
has started to make some notebooks 
<https://imagej.github.io/tutorials/%5B1.02%5D%20Introduction%20to%20ImageJ%20Ops.bkr.html>
using Beaker and Groovy.

In short, scijava-jupyter-kernel is a Java library that communicates with 
a Jupyter server. All code execution relies on the language-specific 
scripting packages 
<https://github.com/scijava?utf8=%E2%9C%93&q=scripting-*&type=&language=>, 
which are discovered at runtime.

Like probably everyone here, we are very interested in features such as nice 
output formatting (images, tables, plots, etc.) and code completion. And I 
think a generic Jupyter library for JVM-based kernels (as already said 
before) is definitely a step in the right direction.
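
To make "nice output formatting" a bit more concrete: in the Jupyter 
messaging protocol, rich output is sent as a display_data message whose 
"data" field maps MIME types to representations, and the frontend picks the 
richest one it can render. Below is a minimal Python sketch of that payload. 
It is only an illustration, not code from scijava-jupyter-kernel, and the 
helper names make_display_data and table_to_html are hypothetical.

```python
import base64
import html
import json


def table_to_html(headers, rows):
    """Render a small table as an HTML string, escaping every cell."""
    head = "".join(f"<th>{html.escape(str(h))}</th>" for h in headers)
    body = "".join(
        "<tr>" + "".join(f"<td>{html.escape(str(c))}</td>" for c in row) + "</tr>"
        for row in rows
    )
    return f"<table><tr>{head}</tr>{body}</table>"


def make_display_data(plain, html_text=None, png_bytes=None):
    """Build the `content` dict of a Jupyter `display_data` message.

    Keys of `data` are MIME types. text/plain is always included so that
    frontends without rich rendering still have something to show.
    """
    data = {"text/plain": plain}
    if html_text is not None:
        data["text/html"] = html_text
    if png_bytes is not None:
        # Binary payloads travel as base64-encoded strings inside the JSON.
        data["image/png"] = base64.b64encode(png_bytes).decode("ascii")
    return {"data": data, "metadata": {}}


# Hypothetical example: one table offered both as plain text and as HTML.
content = make_display_data(
    plain="x | y\n1 | 2",
    html_text=table_to_html(["x", "y"], [[1, 2]]),
)
print(json.dumps(content, indent=2))
```

A shared JVM library could expose something similar, so that each language 
binding only has to decide which MIME types to fill in for a given object.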

Best,


On Friday, 3 March 2017 20:14:52 UTC-5, rgbkrk wrote:
>
> On February 27, 2017 a group of us met to talk about Scala kernels and 
> pave a path forward for Scala users. A YouTube video of the discussion is 
> available here:
>
> https://www.youtube.com/watch?v=0NRONVuct0E
>
> What follows is a summary from the call, mostly in linear order from the 
> video itself.
> Attendees
>
>    - Alexandre Archambault - Jupyter Scala, Ammonium
>    - Ryan Blue (Netflix) - Toree
>    - Gino Bustelo (IBM) - Toree
>    - Joy Chakraborty (Bloomberg) - Spark Magic with Livy
>    - Kyle Kelley (Netflix) - Jupyter
>    - Haley Most (Cloudera) - Toree
>    - Marius van Niekerk (Maxpoint) - Toree, Spylon
>    - Peter Parente (Maxpoint) - Jupyter
>    - Corey Stubbs (IBM) - Toree
>    - Jamie Whitacre (Berkeley) - Jupyter
>    - Tristan Zajonc (Cloudera) - Toree, Livy
>
> Each of the people on the call has a preferred kernel and a preferred way 
> of building and integrating it. We have a significant user experience 
> problem in terms of users installing and using Scala kernels, beyond just 
> Spark usage. The overarching goal is to create a cohesive experience for 
> Scala users when they use Jupyter.
>
> When a Scala user (or even a Python developer already familiar with 
> Jupyter) comes to the Jupyter ecosystem, they face many options for 
> kernels. Being faced with this much choice when trying to get things done 
> creates new friction points for users. As examples see 
> https://twitter.com/chrisalbon/status/833156959150841856 and 
> https://twitter.com/sarah_guido/status/833165030296322049.
> What are our foundations for REPL libraries in Scala?
>
> Toree was built on top of the Spark REPL and developers tried to use as 
> much code as possible from Spark. For Alex’s jupyter-scala, he recognized 
> that the Spark REPL was changing a lot from version to version. At the same 
> time, Ammonite <https://github.com/lihaoyi/Ammonite> was created to 
> assist in Scala scripting. In order to make big data frameworks such as 
> Spark, Flink, and Scio work well in this environment, a fork called 
> Ammonium <https://github.com/alexarchambault/ammonium> was created. There 
> is some amount of trepidation in using a separate fork as part of the 
> kernel community. We should make sure to unify with the originating 
> Ammonite and contribute back as part of a larger Scala community that can 
> maintain these together.
> Action Items:
>
>    - Renew focus on Scala within Toree, improve outward messaging about how
>      Toree provides a Scala kernel
>    - Unify Ammonite and Ammonium ([email protected])
>       - To be used in jupyter-scala, potentially for spylon
>
> There is more than one implementation of the Jupyter protocol in the Java 
> Stack.
>
> Toree has one, jupyter-scala has one, and the Clojure kernels have their own. 
> People would like to see a stable Jupyter library for the JVM. Some think 
> it’s better to have one per language. Regardless of choice, we should have 
> a well supported Jupyter library.
> Action Items:
>
>    - Create an idiomatic Java Library for the Jupyter messaging protocol -
>      propose this as an incubation project within Jupyter
>
> Decouple Spark from Scala in kernels
>
> Decouple language specific parts from the computing framework to allow for 
> using other computing frameworks. This is paramount for R and Python. When 
> we inevitably want to connect to a GPU cluster, we want to be able to use 
> the same foundations of a kernel. The reason that these end up being 
> coupled is that Spark does “slightly weird things” for how it wants its 
> classes compiled. It’s thought that there is some amount of specialization 
> and that we can work around it. At the very least, we can bake it into the 
> core and leave room for other frameworks to have solid built in support if 
> necessary.
>
> An approach being worked on in Toree right now is lazy loading of Spark. 
> One difference between jupyter-scala and Toree is that jupyter-scala can 
> dynamically load Spark versions, whereas Toree is bound to a version of 
> Spark on deployment. For end users that have operators/admins, kernels can 
> be configured per version of Spark they will use (common for Python, R). 
> Spark drives lots of interest in Scala kernels, and many kernels conflate 
> the two. This results in poor messaging and experiences for users getting 
> started.
> Action Items:
>
>    - Lazy load Spark within Toree
>
> Focus efforts within kernel communities
>
> Larger in scope than just the Scala kernel, we need Jupyter to acknowledge 
> fully supported kernels. In contrast, the whole community in Zeppelin 
> collaborates in one repository around their interpreters.
>
> “Fragmentation of kernels makes it harder for large enterprises to adopt 
> them.”
>
> - Tristan Zajonc (Cloudera)
>
> Beyond the technical implementation of what is a supported kernel, we also 
> need the messaging to end users to be simple and clear. There are several 
> objectives we need to meet to improve our messaging, organization, and 
> technical underpinnings.
> Action Items
>
>    - On the Jupyter site provide blurbs and links to kernels for R, Python,
>      and Scala
>    - Create an organized effort around the Scala Kernel, possibly by
>      unifying in an organization while isolating projects in separate
>      repositories
>    - Align a specification of what it takes to be acknowledged as a
>      supported kernel
>
> Visualization
>
> We would like to push on the idea of mimetypes whose payload is a hunk of 
> JSON that the frontend can draw as a beautiful visualization. Having these 
> adopted in core Jupyter by default would go a long way towards providing 
> simple, just-works visualization. The current landscape of visualization 
> with the Scala kernels includes:
>
>    - Vegas <https://github.com/vegas-viz/Vegas>
>    - Plotly Scala <https://github.com/alexarchambault/plotly-scala>
>    - Brunel <https://github.com/Brunel-Visualization/Brunel>
>    - Data Resource / Table Schema (see
>      https://github.com/pandas-dev/pandas/pull/14904)
>
> There is a bit of worry about standardization around the HTML outputs. 
> Some libraries try to use frontend libraries that may not exist on the 
> frontend or mismatch in version - jquery, requirejs, ipywidgets, jupyter, 
> ipython. In some frontends, at times dictated by the operating environment, 
> the HTML outputs must be in null origin iframes.
> Action Items
>
>    - Continue involvement in Jupyter frontends to provide rich
>      visualization out of the box with less configuration and less friction
>
> Standardizing display and reprs for Scala
>
> Since it’s likely that there will still be multiple kernels available 
> for the JVM, not just within Scala, we want to standardize the way in which 
> you inspect objects in the JVM. IPython provides a way for libraries to 
> integrate with IPython automatically for users. We want library developers 
> to be able to follow a common scheme and be well represented regardless of 
> the kernel.
> Action Items:
>    
>    - Create a specification for object representation for JVM languages 
>    as part of the Jupyter project
>
>
> -- 
> Kyle Kelley (@rgbkrk <https://twitter.com/rgbkrk>; lambdaops.com)
>

