Re: R Interpreter - PR 702

Amos B. Elberg Tue, 08 Mar 2016 12:17:21 -0800

Jeff, one of the problems with the rscala approach in 702 is that it doesn't take 
into account the R library. If rscala gets updated, the user will likely 
download and install the update automatically when they call update.packages(). The 
result will be that the version of the rscala R package no longer matches the 
version of the rscala jar, and the interpreter will fail. Or, if the jar is 
also updated, it will simply break 702.
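A minimal pre-flight check for the mismatch described above could compare the rscala R package version with the version in the jar's file name before the interpreter starts. This is a sketch under assumptions: the jar naming pattern (rscala_2.11-1.0.6.jar) and the function names are hypothetical. Only packageVersion() is a standard R function.

```shell
#!/bin/sh
# Hypothetical pre-flight check: compare the installed rscala R package version
# with the version embedded in the jar file name.
# Assumption: the jar is named like rscala_2.11-1.0.6.jar (version after the last '-').

# Print the version part of a jar path such as /opt/lib/rscala_2.11-1.0.6.jar
jar_version() {
  name=$(basename "$1" .jar)    # -> rscala_2.11-1.0.6
  printf '%s\n' "${name##*-}"   # strip up to the last '-' -> 1.0.6
}

# Return 0 when the versions match; otherwise report on stderr and return 1.
check_rscala_versions() {
  r_version=$1
  jv=$(jar_version "$2")
  if [ "$r_version" = "$jv" ]; then
    return 0
  fi
  echo "rscala mismatch: R package $r_version, jar $jv" >&2
  return 1
}

# Usage (requires R on the host):
# r_ver=$(Rscript -e 'cat(as.character(packageVersion("rscala")))')
# check_rscala_versions "$r_ver" /path/to/rscala_2.11-1.0.6.jar || exit 1
```

Failing fast with an explicit message is easier to diagnose than an interpreter that breaks mid-session after update.packages() has silently moved the R side forward.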


This has already happened to 702: it changed its library management because an 
update to rscala broke the prior version. In fact, every rscala update 
is going to break 702.

Regarding return values from the interpreter, the norm in R is that the value 
of the last expression is shown in a native format. So, if the result is 
a data frame, the data frame should be visualized. If the result is an HTML 
object, it should be interpreted as HTML by the browser. Do you disagree?
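The R convention described above amounts to choosing a Zeppelin display prefix from the class of the last value. A minimal sketch, assuming the interpreter can obtain that class name (e.g. via R's class()); the function name and the class names other than data.frame are illustrative assumptions.

```shell
#!/bin/sh
# Hypothetical mapping from the R class of the last value to a Zeppelin
# display-system prefix. Class names besides data.frame are assumptions.
display_prefix() {
  case "$1" in
    data.frame) echo "%table" ;;           # tabular result -> native table
    html|shiny.tag) echo "%html" ;;        # HTML object -> rendered by the browser
    *) echo "%text" ;;                     # everything else -> plain text
  esac
}
```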

> On Mar 8, 2016, at 2:52 PM, Jeff Steinmetz <[email protected]> 
> wrote:
> 
> RE 702:
> I wanted to respond to Eric’s discussion from December 30.
> 
> I finally managed to set aside a good chunk of dedicated, uninterrupted 
> time.
> This meant I had a chance to “really” dig into this with my data science R 
> developer hat on.
> I also thought about this from a DevOps point of view (deploying in an EC2 
> cluster, standalone, locally, VM).
> I tested it with a Spark installation outside of the Zeppelin build, as if 
> it were running on a cluster or standalone install.
> 
> I also had a chance to dig under the hood a bit, and explore what the 
> Java/Scala code in PR 702 is doing.
> 
> I like the simplicity of this PR (the source code and approach).  
> 
> It works as expected: all graphics work, and interactive charts work.
> 
> I also see your point about rendering the text result vs. a TABLE plot when the 
> R interpreter result is a data frame.
> To confirm: the approach is to use %sql to display it in a native Zeppelin 
> visualization.
> 
> Your approach makes sense, since this is in line with how this works in other 
> Zeppelin workflows.
> I suppose you could add an R interpreter function, such as 
> z.R.showDFAsTable(fooDF), if we wanted to force the data frame into a %table 
> without having to jump to %sql (perhaps a nice addition in this or a future 
> PR).
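The proposed z.R.showDFAsTable() does not exist yet; the hypothetical shell helper below only illustrates the output such a function would need to produce for Zeppelin's %table display: a "%table" prefix, tab-separated columns, and newline-separated rows. It assumes simple CSV input without quoted commas, e.g. from write.csv(df, quote = FALSE, row.names = FALSE).

```shell
#!/bin/sh
# Hypothetical helper illustrating what a showDFAsTable()-style function would
# emit: Zeppelin's %table format, built from simple CSV on stdin.
show_df_as_table() {
  printf '%%table '   # display-system prefix (printf needs %% for a literal %)
  tr ',' '\t'         # CSV commas -> tab-separated columns; rows keep their newlines
}

# Usage: printf 'name,size\nsun,100\nmoon,10\n' | show_df_as_table
```

A real implementation would have to quote or escape tabs and newlines inside cell values; this sketch ignores that case.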
> 
> It’s GREAT that %r print('%html') works with the Zeppelin display system!  
> (as well as the other display system methods)
> 
> Regarding the rscala jar: you have a profile that allows us to sync up the 
> rscala version, so that makes sense as well.
> This too worked as expected. I specifically installed rscala (as you 
> describe in your docs) in the VM with:
> 
> curl https://cran.r-project.org/src/contrib/Archive/rscala/rscala_1.0.6.tar.gz -o /tmp/rscala_1.0.6.tar.gz
> R CMD INSTALL /tmp/rscala_1.0.6.tar.gz
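The two install commands above can be parameterized so that the R package is pinned to the same version as the jar the build uses. This is a sketch: RSCALA_VERSION and the function names are hypothetical. Note also that the CRAN Archive path may only hold superseded releases, so the current release would likely have to come from src/contrib/ instead.

```shell
#!/bin/sh
# Hypothetical pinning of the rscala R package to one version.
# RSCALA_VERSION is an assumed variable name; it defaults to the version used above.
RSCALA_VERSION=${RSCALA_VERSION:-1.0.6}

# CRAN archive URL for a given rscala version (same pattern as the curl line above).
rscala_archive_url() {
  echo "https://cran.r-project.org/src/contrib/Archive/rscala/rscala_$1.tar.gz"
}

# Download and install exactly that version (needs network access and R).
install_rscala() {
  tarball="/tmp/rscala_$1.tar.gz"
  curl -fsSL "$(rscala_archive_url "$1")" -o "$tarball" && R CMD INSTALL "$tarball"
}

# Usage: install_rscala "$RSCALA_VERSION"
```

Keeping the version in a single variable makes it easier for the VM script and the jar profile to agree on the same rscala release.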
> 
> 
> Installing rscala outside of the Zeppelin dependencies does seem to keep this 
> PR simpler, and it reduces the licensing overhead required to get this PR 
> through (based on comments I see from others).
> 
> I would need to add the two rscala install lines above to PR#751 (I will add 
> them today):
> https://github.com/apache/incubator-zeppelin/pull/751
> 
> 
> Regarding the interpreters: just having %r as our first interpreter 
> keyword makes sense. Loading knitr within the interpreter to enable 
> rendering (versus having a dedicated %knitr interpreter) seems to keep 
> things simple.
> 
> In summary, it looks good: everything in your sample R notebook (as well 
> as a few other tests I tried) worked for me using the VM script in PR#751.
> The documentation also made for a smooth installation and allowed me to 
> create a repeatable script that, when paired with the VM, worked as expected.
> 
> ----
> Jeff Steinmetz
> Principal Architect
> Akili Interactive
> www.akiliinteractive.com <http://www.akiliinteractive.com/>
> 
>> From: Eric Charles <[email protected]>
>> Subject: [DISCUSS] PR #208 - R Interpreter for Zeppelin
>> Date: Wed, 30 Dec 2015 14:04:33 GMT
> 
> 
>> Hi,
>> 
>> I had a look at https://github.com/apache/incubator-zeppelin/pull/208 
>> (and related Github repo https://github.com/elbamos/Zeppelin-With-R [1])
>> 
>> Here are a few topics for discussion based on my experience developing 
>> https://github.com/datalayer/zeppelin-R [2].
>> 
>> 1. rscala jar not in the Maven repository
>> 
>> [1] copies the source (Scala and R) code from the rscala repo and 
>> changes/extends/repackages it a bit. [2] declares the jar as a system-scoped 
>> library. I recently had incompatibility issues between the 1.0.8 R package 
>> (the one you get since 2015-12-10 when you install rscala in your R 
>> environment) and the 1.0.6 jar I use as part of the zeppelin-R build. 
>> To avoid such issues, why not let the user choose the version via a 
>> property at build time, to match the version installed on their host? This 
>> would also let us benefit from future rscala releases, which fix bugs and 
>> bring new features. It also means we don't have to copy the rscala 
>> code into the Zeppelin tree.
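The build-time property suggestion could be wired up by reading the rscala version installed on the host and passing it to Maven. The property name rscala.version is an assumption, not an existing Zeppelin property, and the helper is hypothetical. The version string is validated because it gets spliced into a command line.

```shell
#!/bin/sh
# Hypothetical helper: turn the host's rscala version into a Maven property.
# "rscala.version" is an assumed property name, not a confirmed Zeppelin one.
rscala_maven_args() {
  # $1: version string reported by the host's R installation, e.g. 1.0.8
  case "$1" in
    ''|*[!0-9.]*) echo "invalid rscala version: '$1'" >&2; return 1 ;;
  esac
  printf '%s\n' "-Drscala.version=$1"
}

# Usage (requires R and Maven):
# v=$(Rscript -e 'cat(as.character(packageVersion("rscala")))')
# mvn clean package -DskipTests $(rscala_maven_args "$v")
```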
>> 
>> 2. Interpreters
>> 
>> [1] proposes two interpreters, %sparkr.r and %sparkr.knitr, which are 
>> implemented in their own module apart from the Spark one. To align with 
>> the existing pyspark implementation, why not integrate the R code into 
>> the Spark one? Is there any reason to keep two versions that do basically 
>> the same thing? The single magic keyword would then be %spark.r.
>> 
>> 3. Rendering a TABLE plot when the interpreter result is a data frame
>> 
>> This may be confusing. What if I display a plot and simply want to print 
>> the first 10 rows at the end of my code? To keep the same behavior as 
>> the other interpreters, we could make this feature optional (disabled by 
>> default, enabled via a property).
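The opt-in behaviour proposed above (disabled by default, enabled via a property) might look like the following. This is only a sketch: R_RENDER_TABLE is a hypothetical property name, and the input is assumed to already be tab-separated rows with the header first.

```shell
#!/bin/sh
# Opt-in table rendering (sketch). R_RENDER_TABLE is a hypothetical property;
# it defaults to "false", so output passes through unchanged unless enabled.
render_result() {
  # $1: R class of the result; stdin: tab-separated rows, header first
  if [ "$1" = "data.frame" ] && [ "${R_RENDER_TABLE:-false}" = "true" ]; then
    printf '%%table '   # only prefix for Zeppelin's table display when opted in
  fi
  cat                   # pass the rows through in either case
}
```

With the default left off, a user who prints the first 10 rows after a plot gets plain text, matching the other interpreters.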
>> 
>> 
>> Thx, Eric
>
