RE PR 702: I wanted to respond to Eric's discussion from December 30. I finally managed to set aside a good chunk of dedicated, uninterrupted time, which means I had a chance to really dig into this with a Data Science R developer hat on. I also thought about it from a DevOps point of view (deploying to an EC2 cluster, standalone, locally, or in a VM). I tested it with a Spark installation outside of the Zeppelin build, as if it were running on a cluster or standalone install.
I also had a chance to dig under the hood a bit and explore what the Java/Scala code in PR 702 is doing. I like the simplicity of this PR (both the source code and the approach). It works as expected: all graphics work, and interactive charts work.

I also see your point about rendering the text result vs. a TABLE plot when the R interpreter result is a data frame. To confirm, the approach is to use %sql to display it in a native Zeppelin visualization (a rough sketch of the flow I tested is included below, just before the quoted message). That makes sense, since it is in line with how other Zeppelin workflows behave. I suppose you could add an R interpreter function, such as z.R.showDFAsTable(fooDF), if we wanted to force a data frame into a %table without having to jump to %sql (perhaps a nice addition in this or a future PR); a hypothetical sketch of what that could look like is also below. It's GREAT that %r print('%html') works with the Zeppelin display system (as well as the other display system methods)!

Regarding the rscala jar: you have a profile that will allow us to sync up the rscala version, so that makes sense as well. This too worked as expected. I specifically installed rscala (as described in your docs) in the VM with:

  curl https://cran.r-project.org/src/contrib/Archive/rscala/rscala_1.0.6.tar.gz -o /tmp/rscala_1.0.6.tar.gz
  R CMD INSTALL /tmp/rscala_1.0.6.tar.gz

Installing rscala outside of the Zeppelin dependencies does seem to keep this PR simpler, and it reduces the licensing overhead required to get this PR through (based on comments I see from others). I will need to add the two rscala install lines above to PR#751 (I will add this today):
https://github.com/apache/incubator-zeppelin/pull/751

Regarding the interpreters: just having %r as our first interpreter keyword makes sense. Loading knitr within the interpreter to enable rendering (versus having a dedicated %knitr interpreter) keeps things simple.

In summary: looks good. Everything in your sample R notebook (as well as a few other tests I tried) worked for me using the VM script in PR#751. The documentation also made for a smooth installation and allowed me to create a repeatable script that, when paired with the VM, worked as expected.
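For reference, the data frame to %table flow I tested was roughly along these lines (the dataset and table name here are just placeholders, and I am assuming the interpreter exposes sqlContext the way SparkR does):

  %r
  # build a SparkR DataFrame from a local R data frame and register it
  # as a temp table so the next paragraph can query it
  df <- createDataFrame(sqlContext, faithful)
  registerTempTable(df, "faithful_tbl")

  %sql
  -- rendered with Zeppelin's native table/chart visualizations
  select * from faithful_tbl limit 10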
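To illustrate the z.R.showDFAsTable(fooDF) idea, a purely hypothetical helper (not part of this PR, just a sketch of the concept) could emit Zeppelin's %table display format directly from R:

  # hypothetical sketch, not in PR 702: push an R data.frame through
  # Zeppelin's %table display system (header line, then tab-separated rows)
  showDFAsTable <- function(df) {
    cat("%table ", paste(colnames(df), collapse = "\t"), "\n", sep = "")
    write.table(df, stdout(), sep = "\t", quote = FALSE,
                row.names = FALSE, col.names = FALSE)
  }

  showDFAsTable(head(iris, 10))

Whether that output actually reaches the TABLE renderer depends on how the interpreter forwards stdout, so treat it as a sketch of the idea rather than a drop-in patch.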
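And the display system check I ran was along these lines (the HTML content is just a placeholder):

  %r
  # the leading %html token tells Zeppelin to render the paragraph output as HTML
  print("%html <h3>Hello from the R interpreter</h3>")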
----
Jeff Steinmetz
Principal Architect
Akili Interactive
www.akiliinteractive.com

> From: Eric Charles <e...@apache.org>
> Subject: [DISCUSS] PR #208 - R Interpreter for Zeppelin
> Date: Wed, 30 Dec 2015 14:04:33 GMT
>
> Hi,
>
> I had a look at https://github.com/apache/incubator-zeppelin/pull/208
> (and the related GitHub repo https://github.com/elbamos/Zeppelin-With-R [1]).
>
> Here are a few topics for discussion based on my experience developing
> https://github.com/datalayer/zeppelin-R [2].
>
> 1. rscala jar not in Maven Repository
>
> [1] copies the source (Scala and R) code from the rscala repo and
> changes/extends/repackages it a bit. [2] declares the jar as a
> system-scoped library. I recently had incompatibility issues between the
> 1.0.8 jar (the one you get since 2015-12-10 when you install rscala in
> your R environment) and the 1.0.6 jar I am using as part of the zeppelin-R
> build. To avoid such issues, why not let the user choose the version via a
> property at build time to match the version running on the host? This
> would also let us benefit from upcoming rscala releases, which fix bugs
> and bring new features. It also means we don't have to copy the rscala
> code into the Zeppelin tree.
>
> 2. Interpreters
>
> [1] proposes 2 interpreters, %sparkr.r and %sparkr.knitr, which are
> implemented in their own module apart from the Spark one. To be aligned
> with the existing pyspark implementation, why not integrate the R code
> into the Spark one? Is there any reason to keep 2 versions which do
> basically the same thing? The unique magic keyword would then be %spark.r.
>
> 3. Rendering TABLE plot when interpreter result is a dataframe
>
> This may be confusing. What if I display a plot and simply want to print
> the first 10 rows at the end of my code? To keep the same behavior as the
> other interpreters, we could make this feature optional (disabled by
> default, enabled via property).
>
> Thx, Eric