@enzo - 208 does not need an R administrator in server or single-user mode. This is because the R package that 208 uses is segregated -- the same approach is used by other server-based R systems like RStudio and Shiny.
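A minimal sketch of what a segregated package library looks like in practice (the directory layout and the package name below are illustrative, not the literal code in 208; only the mechanism matters):

    # Sketch only: a private, interpreter-owned package library.
    zeppelin_lib <- file.path(Sys.getenv("ZEPPELIN_HOME", "."), "R", "lib")
    dir.create(zeppelin_lib, recursive = TRUE, showWarnings = FALSE)

    # Put the private library first on the search path, ahead of the user
    # and system libraries, so the interpreter's own R-side dependencies
    # are always resolved from there.
    .libPaths(c(zeppelin_lib, .libPaths()))

    # Whatever the interpreter needs is installed only into that private
    # library; the user's personal and system libraries are never touched.
    install.packages("knitr", lib = zeppelin_lib,
                     repos = "https://cloud.r-project.org")

Because the interpreter's copies live outside the user and system libraries, a user running install.packages() or update.packages() in their own R session (RStudio, the R GUI, Jupyter) cannot pull the interpreter's dependencies out from under it.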
With 702, the problem is unsolvable except by locking down the system's R installation entirely. Again, nothing is achieved by the "design choice" in 702. I have been asking for a long time, and no one has even tried to point to a benefit. I can't see any reason we're still talking about 702 (in fact, any reason 702 exists at all) other than the games that some people have been playing with 208 for the past six months. It's time to stop this nonsense.

> On Mar 13, 2016, at 5:27 PM, enzo <e...@smartinsightsfromdata.com> wrote:
>
> Just a side consideration on package management, expanding some of Jeff's comments.
>
> As we all know, R is in essence a "single user" tool. I assume that at the beginning Zeppelin will be the same.
>
> On my machine I have two libraries where my R packages are stored: a system library and a user library.
>
> I imagine Zeppelin will "discover" my personal user library and of course the system library. As such, there is no need for anything else. Any package necessary for the R interpreter(s) could be stored in the user's library.
>
> This would work, but in principle a user could still download rscala or similar packages to use, for example, with RStudio or in a parallel instance of Jupyter (using IRkernel). Hence the way Zeppelin manages these will have to be error-proof (maybe by managing a Zeppelin library that is not accessible/updatable by RStudio or the R GUI on the system? I am not sure what the chosen design of PR 702 or PR 208 is).
>
> While maybe premature, I think we also need to discuss what is going to happen in the case of a Zeppelin server serving many users. This stretches the normal operating patterns for R a bit. I don't think in such a case we should expect an R administrator to manage packages outside of Zeppelin (does Zeppelin plan to have an admin interface for that case?).
>
> In the case of a Zeppelin server, different scenarios could apply:
> One possibility would be to have a single library dedicated to the server, where all users share all packages. The issue would be how to maintain it, and what happens if different users require different versions of some package? While it may appear cumbersome, it may be appropriate in environments that plan to rigidly control which version of a package is used, and when.
> Possibly the most functionally complete approach would be to have a library per user (plus probably a "private" library for Zeppelin - or dedicated packages copied into each library, as currently done by RStudio with rstudioapi).
>
> What are the plans / ideas on this for PR 208 / 702?
>
> It would be reassuring to know that, whichever design is chosen, there will be flexibility to implement different approaches in the future...
>
> Enzo
> e...@smartinsightsfromdata.com
>
>> On 9 Mar 2016, at 18:56, Jeff Steinmetz <jeffrey.steinm...@gmail.com> wrote:
>>
>> The package management thoughts I presented could be considered general suggestions for any R interpreter improvement, with implications beyond just rScala.
>>
>> Researching options to lock down package management in the R notebook was a suggestion I raised, which I wouldn't consider a show stopper for getting an R interpreter off the ground as a first step.
>> That said, if we came up with an awesome solution to help lock down R package management in Zeppelin, there is no reason not to give it a try or discuss its utility.
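On the multi-user server scenario above: a per-user library does not require an R administrator either. A minimal sketch of how an interpreter session could set this up at startup (the directory layout, the ZEPPELIN_HOME variable, and keying on the OS user are illustrative assumptions, not what 208 or 702 currently do):

    # Sketch: one package library per notebook user, plus a shared,
    # interpreter-owned library. Keyed on the OS user purely for
    # illustration; a real server would key on the Zeppelin login.
    user       <- Sys.info()[["user"]]
    zeppelin_r <- file.path(Sys.getenv("ZEPPELIN_HOME", "."), "R")
    shared_lib <- file.path(zeppelin_r, "lib")
    user_lib   <- file.path(zeppelin_r, "users", user)
    dir.create(user_lib, recursive = TRUE, showWarnings = FALSE)

    # Per-user library first, shared interpreter library second, then the
    # defaults. Users can install what they like into their own library
    # without touching the interpreter's copies or each other's packages.
    .libPaths(c(user_lib, shared_lib, .libPaths()))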
>> R Interpreter functionality and security can mature over time via small iterative improvements and collaboration.
>>
>> Cheers, Jeff
>>
>>> On 3/9/16, 10:06 AM, "Amos B. Elberg" <amos.elb...@gmail.com> wrote:
>>>
>>> Eric, denying what happened is just silly. There's a whole record of it.
>>>
>>> When you started, your code downloaded the latest rscala. Your code broke entirely when rscala was updated. You scrambled to change it, and at the same time you started bundling the binary so you could control the version of the rscala jar, which you mistakenly thought would prevent it from breaking again. Then you found out (from me) about the R library issue, so you started letting the user pick the rscala version at build time. That still doesn't fix it.
>>>
>>> Now you say that implementing your solution requires an administrator to lock down the machine running Zeppelin, and so on -- but to what benefit? There's no way to take advantage of new rscala features anyway.
>>>
>>> Regarding pyspark, we have to maintain version parity with Spark. Python also doesn't have R's library management system.
>>>
>>> Why are you still defending this? It was a poor design decision. Move on.
>>>
>>>> On Mar 9, 2016, at 1:05 AM, Eric Charles <e...@apache.org> wrote:
>>>>
>>>>> On 09/03/16 06:41, Amos B. Elberg wrote:
>>>>> That's not true, Eric. When rscala was updated to 1.0.8, your interpreter broke entirely. You then rushed to fix it, and that's when you began including the binary in the distribution. This is all in the commit logs.
>>>>
>>>> I always shipped or downloaded the rscala jar...
>>>>
>>>>> I recognize that you now allow the user to select an rscala version at build time, which means they have to compile Zeppelin for a specific rscala version.
>>>>
>>>> The user does not have to choose. The packager and devops will do it for the user, and the user should not have permission to install/update any package.
>>>>
>>>>> What's the point? What you've achieved is to replace 3 short source files, while introducing a proven instability, a maintenance burden on the user, and a support burden on us when 200 people show up with obscure error messages that have to be diagnosed.
>>>>
>>>> Let me take the analogy of the py4j jar, which fulfills a similar role to the rscala jar: the role of binder between Scala and another language.
>>>>
>>>> With Spark 1.6, the pyspark version was updated. If Zeppelin had shipped the source code, the Zeppelin developers would have had the responsibility to update the complete source code of pyspark.
>>>>
>>>> In our case (relying on an external jar), upgrading from 0.8.2.1 to 0.9 was easy:
>>>>
>>>> https://github.com/apache/incubator-zeppelin/pull/463/files#diff-dbda0c4083ad9c59ff05f0273b5e760fR320
>>>>
>>>> That approach also has the enormous advantage of supporting not only different rscala versions but also different Scala profiles (2.10, 2.11...).
>>>>
>>>>>> On Mar 9, 2016, at 12:22 AM, Eric Charles <e...@apache.org> wrote:
>>>>>>
>>>>>>> On 09/03/16 06:05, Amos B. Elberg wrote:
>>>>>>> Jeff, you're correct that when Zeppelin is being professionally administered, the administrator can take care of all of this.
>>>>>>>
>>>>>>> But why create an additional system administration task? And what about users without professional systems admins?
>>>>>>>
>>>>>>> The only "benefit" to doing it that way is that we bundle a binary instead of source, when the source is likely to never need updating. That doesn't seem like a "benefit" at all.
>>>>>>>
>>>>>>> And in exchange for that, the cost is things like having to lock down R or prevent package updates? That doesn't make much sense to me.
>>>>>>>
>>>>>>> The question of which method has more overhead has been answered empirically: this issue already broke 702, and there have been a whole series of revisions to it to address various issues, with no end in sight.
>>>>>>
>>>>>> Amos, you mention a few times that 'the issue broke 702...'. I don't see when and why that particular approach broke anything.
>>>>>>
>>>>>> I certainly experimented and reported that the rscala version alignment is important, like the alignment of any Linux, JDK, Scala, Spark... dependency version.
>>>>>>
>>>>>> What I changed, after Moon's comment, is to download the jar at build time instead of shipping the jar in the source tree. This approach has the advantage that you can define at build time which version you want.
>>>>>>
>>>>>>> Meanwhile, this part of 208 has been stable for six months, without a single user issue.
>>>>>>>
>>>>>>>> On Mar 8, 2016, at 8:34 PM, Jeff Steinmetz <jeffrey.steinm...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> During my tests, I found the rScala installation and its dependencies to be manageable, even though it may not be ideal (i.e., the source is not included).
>>>>>>>> The Zeppelin build already needs to target the correct version of Spark + Hadoop, so for this exercise I treat rScala similarly, as an additional build and install consideration.
>>>>>>>>
>>>>>>>> I'm looking at this through a specific lens: "As a technology decision maker in a company, could I and would I deploy this in our environment? Could I work with our Data Science team to implement its usage? Would the Data Engineering and Data Science teams that commonly use R find it useful?"
>>>>>>>>
>>>>>>>> Inadvertent rScala updates could be minimized with education, letting users know that R package management within a notebook should be avoided. That generally seems like a good idea regardless of how R-Zeppelin is implemented, since it's a shared environment (you don't want to break other users' graphs with an rGraph update, or uninstall ggplot2, etc.).
>>>>>>>>
>>>>>>>> Even better: what if there were a Zeppelin config that disabled `install.packages()`, `remove.packages()`, and `update.packages()`? This would allow package installation to be carried out only by administrators or devops outside of Zeppelin.
>>>>>>>> Although the effort vs. benefit isn't clear, I'm sure somebody crafty could come up with a way around it with a convoluted eval or by running something through the shell in Zeppelin.
>>>>>>>>
>>>>>>>> R, Python, and Scala all leave a pretty wide-open door to parts of the underlying operating system.
>>>>>>>> A 100% bullet-proof way of locking "everything" down is a tough challenge.
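On the config idea above for disabling package management from within a notebook: here is a minimal sketch of the kind of guard rail that could be, assuming the interpreter controls the R session it launches (the function name and message are illustrative; no such Zeppelin config exists today):

    # Sketch: shadow the package-management functions in the R session the
    # interpreter starts, so ordinary notebook code cannot call them.
    # As noted above, this is a guard rail, not a security boundary: a
    # determined user can still call utils::install.packages() directly,
    # eval around the shadowing, or shell out.
    disable_pkg_management <- function() {
      blocked <- function(...) {
        stop("package management is disabled in this Zeppelin session")
      }
      for (fn in c("install.packages", "remove.packages", "update.packages")) {
        assign(fn, blocked, envir = globalenv())
      }
    }
    disable_pkg_management()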
>>>>>>>> >>>>>>>> ---- >>>>>>>> Jeff Steinmetz >>>>>>>> Principal Architect >>>>>>>> Akili Interactive >>>>>>>> www.akiliinteractive.com <http://www.akiliinteractive.com/> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> On 3/8/16, 12:16 PM, "Amos B. Elberg" <amos.elb...@gmail.com> wrote: >>>>>>>>> >>>>>>>>> Jeff - one of the problems with the rscala approach in 702 is it >>>>>>>>> doesn't take into account the R library. If rscala gets updated, the >>>>>>>>> user will likely download and update it automatically when they call >>>>>>>>> update.packages(). The result will be that the version of the rscala >>>>>>>>> R package doesn't match the version of the rscala jar, and the >>>>>>>>> interpreter will fail. Or, if the jar is also updated, it will simply >>>>>>>>> break 702. >>>>>>>>> >>>>>>>>> This has happened to 702 already-702 changed its library management >>>>>>>>> because an update to rscala broke the prior version. Actually though, >>>>>>>>> every rscala update is going to break 702. >>>>>>>>> >>>>>>>>> Regarding return values from the interpreter, the norm in R is that >>>>>>>>> the return value of the last expression is shown in a native format. >>>>>>>>> So, if the result is a data frame, a dataframe should be visualized. >>>>>>>>> If the result is an html object, it should be interpreted as html by >>>>>>>>> the browser. Do you disagree? >>>>>>>>> >>>>>>>>>> On Mar 8, 2016, at 2:52 PM, Jeff Steinmetz >>>>>>>>>> <jeffrey.steinm...@gmail.com> wrote: >>>>>>>>>> >>>>>>>>>> RE 702: >>>>>>>>>> I wanted to respond to Eric’s discussion from December 30. >>>>>>>>>> >>>>>>>>>> I finally had some time to put aside a good chunk of dedicated, >>>>>>>>>> uninterrupted time. >>>>>>>>>> This means I had a chance to “really” dig into this with a Data >>>>>>>>>> Science R developer hat on. >>>>>>>>>> I also thought about this from a DevOps point of view (deploying in >>>>>>>>>> an EC2 cluster, standalone, locally, VM). >>>>>>>>>> I tested it with a spark installation outside of the zeppelin build >>>>>>>>>> - as if it was running on a cluster or standalone install. >>>>>>>>>> >>>>>>>>>> I also had a chance to dig under the hood a bit, and explore what >>>>>>>>>> the Java/Scala code in PR 702 is doing. >>>>>>>>>> >>>>>>>>>> I like the simplicity of this PR (the source code and approach). >>>>>>>>>> >>>>>>>>>> Works as expected, all graphic works, interactive charts works. >>>>>>>>>> >>>>>>>>>> I also see your point about Rendering the text result vs TABLE plot >>>>>>>>>> when the R interpreter result is a data frame. >>>>>>>>>> To confirm - the approach is to use %sql to display it in a native >>>>>>>>>> Zeppelin visualization. >>>>>>>>>> >>>>>>>>>> Your approach makes sense, since this in line with how this works in >>>>>>>>>> other Zeppelin work flows. >>>>>>>>>> I suppose you could add an R interpreter function, such as: >>>>>>>>>> z.R.showDFAsTable(fooDF) if we wanted to force the data frame into a >>>>>>>>>> %table without having to jump to %sql (perhaps a nice addition in >>>>>>>>>> this or a future PR). >>>>>>>>>> >>>>>>>>>> It’s GREAT that %r print('%html') works with the Zeppelin display >>>>>>>>>> system! (as well as the other display system methods) >>>>>>>>>> >>>>>>>>>> Regarding rscala jar. You have a profile that will allow us to sync >>>>>>>>>> up the version rscala, so that makes sense as well. >>>>>>>>>> This too worked as expected. 
>>>>>>>>>> I specifically installed rscala (as you describe in your docs) in the VM with:
>>>>>>>>>>
>>>>>>>>>> curl https://cran.r-project.org/src/contrib/Archive/rscala/rscala_1.0.6.tar.gz -o /tmp/rscala_1.0.6.tar.gz
>>>>>>>>>> R CMD INSTALL /tmp/rscala_1.0.6.tar.gz
>>>>>>>>>>
>>>>>>>>>> Installing rscala outside of the Zeppelin dependencies does seem to keep this PR simpler, and it reduces the licensing overhead required to get this PR through (based on comments I see from others).
>>>>>>>>>>
>>>>>>>>>> I would need to add the two rscala install lines above to PR #751 (I will add this today):
>>>>>>>>>> https://github.com/apache/incubator-zeppelin/pull/751
>>>>>>>>>>
>>>>>>>>>> Regarding the interpreters: just having %r as our first interpreter keyword makes sense. Loading knitr within the interpreter to enable rendering (versus having a %knitr interpreter specifically) seems to keep things simple.
>>>>>>>>>>
>>>>>>>>>> In summary: this looks good, since everything in your sample R notebook (as well as a few other tests I tried) worked for me using the VM script in PR #751.
>>>>>>>>>> The documentation also facilitated a smooth installation and allowed me to create a repeatable script that, when paired with the VM, worked as expected.
>>>>>>>>>>
>>>>>>>>>> ----
>>>>>>>>>> Jeff Steinmetz
>>>>>>>>>> Principal Architect
>>>>>>>>>> Akili Interactive
>>>>>>>>>> www.akiliinteractive.com
>>>>>>>>>>
>>>>>>>>>>> From: Eric Charles <e...@apache.org>
>>>>>>>>>>> Subject: [DISCUSS] PR #208 - R Interpreter for Zeppelin
>>>>>>>>>>> Date: Wed, 30 Dec 2015 14:04:33 GMT
>>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I had a look at https://github.com/apache/incubator-zeppelin/pull/208 (and the related GitHub repo https://github.com/elbamos/Zeppelin-With-R [1]).
>>>>>>>>>>>
>>>>>>>>>>> Here are a few topics for discussion based on my experience developing https://github.com/datalayer/zeppelin-R [2].
>>>>>>>>>>>
>>>>>>>>>>> 1. rscala jar not in Maven Repository
>>>>>>>>>>>
>>>>>>>>>>> [1] copies the source (Scala and R) code from the rscala repo and changes/extends/repackages it a bit. [2] declares the jar as a system-scoped library. I recently had incompatibility issues between the 1.0.8 package (the one you get since 2015-12-10 when you install rscala in your R environment) and the 1.0.6 jar I am using as part of the zeppelin-R build.
>>>>>>>>>>> To avoid such issues, why not let the user choose the version via a property at build time, to fit the version they run on their host? This would also allow us to benefit from the next rscala releases, which fix bugs, bring new features... It also means we don't have to copy the rscala code into the Zeppelin tree.
>>>>>>>>>>>
>>>>>>>>>>> 2. Interpreters
>>>>>>>>>>>
>>>>>>>>>>> [1] proposes 2 interpreters, %sparkr.r and %sparkr.knitr, which are implemented in their own module apart from the Spark one. To be aligned with the existing pyspark implementation, why not integrate the R code into the Spark one?
>>>>>>>>>>> Is there any reason to keep 2 versions that do basically the same thing? The unique magic keyword would then be %spark.r.
>>>>>>>>>>>
>>>>>>>>>>> 3. Rendering a TABLE plot when the interpreter result is a data frame
>>>>>>>>>>>
>>>>>>>>>>> This may be confusing. What if I display a plot and simply want to print the first 10 rows at the end of my code? To keep the same behavior as the other interpreters, we could make this feature optional (disabled by default, enabled via a property).
>>>>>>>>>>>
>>>>>>>>>>> Thx, Eric
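On point 3 and the z.R.showDFAsTable() idea earlier in the thread: the Zeppelin display system already accepts printed output that starts with %table (tab-separated columns, newline-separated rows), so a helper like that could live entirely on the R side. A minimal sketch (the function name is hypothetical; only the %table output convention is Zeppelin's):

    # Hypothetical helper: print an R data frame through Zeppelin's display
    # system so it renders as a native table instead of plain text.
    show_df_as_table <- function(df) {
      header <- paste(colnames(df), collapse = "\t")
      rows   <- apply(df, 1, function(r) paste(r, collapse = "\t"))
      cat("%table ", paste(c(header, rows), collapse = "\n"), "\n", sep = "")
    }

    # Example usage inside a %r paragraph:
    show_df_as_table(head(iris, 10))

Whether something like this belongs in the interpreter itself (as z.R.showDFAsTable) or stays a user-level snippet is exactly the kind of follow-up PR question Jeff raises above.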