Re: R and SparkR Support

Eric Charles Tue, 23 Feb 2016 10:20:47 -0800

It would make no sense to merge both.

From an end-user perspective, I guess both are equivalent, although with the last commit I made, the Zeppelin Display system is supported in 702 (I had no luck when testing this functionality with 208). As I said, feel free to test both and send feature requests.

From a developer perspective, I will reiterate the points I sent in [1], which are addressed in 702 (these points make sense to me but haven't received any response so far; I would like to get feedback on them):

1.- Use the rscala jar instead of forking it -> this allows supporting the platform version (Scala version...) and benefiting from new rscala releases with patches, without having to maintain a fork in the Zeppelin source tree.


2.- Just like Python, develop R in the Spark module

3.- Support the same behavior as the rest (no TABLE when the output is a dataframe; support the HTML, TABLE and IMG display system; support the Dynamic Form system).


I still have the Dynamic Form system operational.

[1]http://mail-archives.apache.org/mod_mbox/incubator-zeppelin-dev/201512.mbox/%3C5683E471.9010001%40apache.org%3E


On 23/02/16 19:09, Jeff Steinmetz wrote:

Thank you Amos Elberg & Eric Charles:
Is the goal of the community to merge both 208 and 702 at some point as two 
“different” R interpreters?

One that is
   %r
And another that is
   %spark.r

Still trying to wrap my head around the difference.




On 2/23/16, 9:34 AM, "Amos B. Elberg" <amos.elb...@gmail.com> wrote:

Jeff - 702 isn't a fork, it's an alternative based on 208 that has a subset of 
208's features.  208 is the superset. 208 is also what the community is now 
attempting to integrate.

R does support serialization of functions.

208 does support passing a Spark table back and forth between R and Scala. 
Passing a data.frame through the Zeppelin context will fail in Spark up to 1.5. 
It may now work for some data frames in 1.6.

There are examples that do all these things in the documentation for 208 on my 
repo at github.com/elbamos/Zeppelin-With-R

On Feb 23, 2016, at 12:03 PM, Jeff Steinmetz <jeffrey.steinm...@gmail.com> 
wrote:

Hello zeppelin dev group,

Regarding the R interpreter pull requests 208 and 702: I am trying to figure 
out whether their functionality overlaps, or whether one supports 
something the other doesn't. Is 702 a superset of 208 (is 702 a fork 
of 208)?

Can you pass a reference to a distributed (parallelized) DataFrame built in %spark 
(Scala) to the R interpreter, similar to z.put("myDF", myDF)?

Similarly, since R doesn’t support serialization of functions (unless you use 
something from the SparkR library), is there an example of collecting the 
parallel DF to a local DF? (I realize this means the dataset needs to fit in 
local memory on the Zeppelin server.)

I can dig into this a bit and help out where appropriate; however, it's 
unclear which PR to focus my efforts on.

Best,
Jeff Steinmetz
Principal Architect
Akili Interactive Labs

On 2/23/16, 8:01 AM, "elbamos" <g...@git.apache.org> wrote:

Github user elbamos commented on the pull request:

   https://github.com/apache/incubator-zeppelin/pull/702#issuecomment-187764059

   @btiernay support for that has been in 208 all along...

On Feb 23, 2016, at 9:27 AM, Bob Tiernay <notificati...@github.com> wrote:

@echarles This is great! Thanks for all your hard work. Very much appreciated!





