A few clarifications on #702:
(snip...)
- 702
* CI fails
I just pushed, and it is failing for reasons unrelated to R...
Most importantly, I have noticed over the past few days that the tests are no
longer executed for the Spark interpreter in any PR build:
[INFO] --- maven-surefire-plugin:2.17:test (default-test) @ zeppelin-spark ---
[INFO] Tests are skipped.
Will have a look at it.
* no tests
There are some tests:
https://github.com/datalayer/zeppelin-datalayer/blob/rscala-z/spark/src/test/java/org/apache/zeppelin/spark/SparkRInterpreterTest.java
* no docs
There is some documentation:
https://github.com/datalayer/zeppelin-datalayer/blob/rscala-z/docs/interpreter/R.md
That being said, my personal opinion is that we should follow plan C; under
that plan, #208 has a better chance of being merged first.
Again, the goal is not to compare both contributions in terms of
features/merit and decide here which is better, but to build a consensus on
how we as a community proceed when there are two contributions of the same
pluggable feature. In this thread, that means having no -1s for at least
one option, through a thoughtful compromise from all sides.
What do you guys think?
I would favor b) but this may take too much time, so to get users the
best choice as soon as possible, c) sounds to me like the way to go.
With my PPMC hat on, I feel that we may need to start a separate thread on a
generalised decision-making process for such situations, irrespective of the
current state of the R interpreter issue. After making a decision
there, we could use the same guiding principle to resolve this issue, as
well as any others in the future.
--
Alex
On Tue, Mar 8, 2016 at 2:45 PM, Jeff Steinmetz <jeffrey.steinm...@gmail.com>
wrote:
I should clarify my preference regarding Plan A (to only merge 1 - at
least initially).
“Which” PR to merge (or merge first) is TBD - at least for myself. I’m
still testing both PR options.
Since the original request was not to debate the fate\merit\features of
any particular contribution in this thread, I’ll post my 702 PR findings
separately.
----
Jeff Steinmetz
Principal Architect
Akili Interactive
www.akiliinteractive.com <http://www.akiliinteractive.com/>
On 3/2/16, 9:12 PM, "Jeff Steinmetz" <jeffrey.steinm...@gmail.com> wrote:
I too prefer plan A - merging two different R interpreters sounds like a
maintenance and documentation headache for end users.
Do you or the community feel there are “specific” additional steps from a
“technical” or “development” perspective that need to happen in order to
merge 208?
If we know what’s holding back the merge technically (all history aside)
we can work as a community to solve it.
Olympic spirit!
Looking forward to helping this through.
----
Jeff Steinmetz
Principal Architect
Akili Interactive
On 3/2/16, 8:14 PM, "Amos Elberg" <amos.elb...@gmail.com> wrote:
Alex -- the gist of my email is that we already have a consensus, and have
had a consensus since November. The consensus was to merge 208. That's
"Plan A."
With all respect, I don't see that anyone other than you believes we don't
have a consensus on Plan A already, or has any issue with Plan A.
In fact, I'm going to call now for "lazy consensus" on Plan A: End the debate
and move rapidly to merge 208, completing whatever work is necessary to do
that (if any).
For the record, yes, I do object to Plan C. Numerous users have complained
that with two different PRs, they don't know which interpreter to use. That's
a strong reason to not merge two. In fact it will confuse people more,
because one interpreter's R environment won't be shared with the other
interpreter, and you can't move variables between them. Moreover, no-one has
presented any benefit to merging the second one.
In addition, while 208 seems to be ready to merge (waiting only on the work
you're doing on CI), the second PR is nowhere close. So, that's another
reason: 208 should not have to wait for the other to be ready.
But in any event, I disagree that there is any issue here.
If you intend to continue this thread, then please address the issues raised
in my e-mail earlier. Please also explain any strong objection to Plan A.
Thanks,
-Amos
On Thursday, March 03, 2016 12:09:33 PM Alexander Bezzubov wrote:
Guys, please let's keep the discussion focused on the subject.
Amos, I do not understand: are you saying that you object to the community
proceeding with plan C? If not, there is no need to answer\post in this
thread right now.
Again, we are not debating fate\merit\features of any particular
contribution here.
Please post in this thread only if you strongly disagree with the suggested
plan.
I'm calling for a lazy consensus and, as soon as there are no objections,
will be ready to proceed with the plan above.
The sooner we reach a consensus on the topic, the sooner we can make further
progress.
--
Alex
On Thu, Mar 3, 2016 at 11:45 AM, Amos Elberg <amos.elb...@gmail.com>
wrote:
Alex - What are we still debating at this point?
I'm starting to feel like Charlie Brown with the football here.
The PR was submitted in August and originally reviewed at the beginning of
September.
In, I think, early December, it was then extensively reviewed and discussed.
I made a few requested changes, and at that time there was a decision to
merge 208 pending Moon working on the CI problem.
In January the PR was reviewed again, by you and others, and I thought you'd
decided to merge pending some changes from me, and you were going to work on
CI.
In February, when people continued to email the list to ask what was up, we
said again that the community was moving to merge 208.
The thread started a few days ago. Nobody argued for changing the plan. The
discussion lapsed until, today, I responded to a technical point.
I'm not sure why this is coming up again. If Eric (or others) feel strongly
about the issues Eric raised with 208, which are things like whether to link
rscala or fork it (or whatever), why can't they just submit PRs with those
changes after 208 is merged? The architectures of the two PRs have been
converging as Eric's been incorporating functionality. No-one claims that
Eric's interpreter provides any additional functionality, or that it's more
stable, or anything like that. So why are we still talking about this?
If the issue is that Eric put in substantial work, that was a choice he made
after he knew the status of 208. He also had the benefit of seeing how I
solved various technical problems, like using rscala, sharing the Spark
Context, etc. In fact, when I first started on this project, I saw that Eric
had done some preliminary work, and wrote him to see if we could collaborate.
He wasn't interested. In November, when I heard that Datalayer had produced
an interpreter (I didn't realize Datalayer is Eric) I wrote them offering to
work together. No reply. And in December also. No reply. Eric didn't even
submit the PR until after there was already a consensus to merge 208. His PR
only started to approach feature parity in the last few weeks, after we
decided *again* to try to merge 208.
Someone commented earlier in this thread that we need to get this resolved so
the community can move on. I agree. I want to move on also.
Is there any substantial reason at this point why we're revisiting the issue
instead of simply trying to merge 208? Is there any reason not to view the
discussion in this email chain as resolved in favor of merging 208 and moving
forward? Is there anything you're waiting on me for that you need so 208 can
get merged? What, at this point, is left to be done so 208 can be merged?
On Wed, Mar 2, 2016 at 7:53 PM, Alexander Bezzubov <b...@apache.org>
wrote:
Thank you guys for actually answering the question!
My personal opinion on making progress here, and in future cases like this,
lies with plan C.
Please correct me if I'm wrong, but what I can see in this thread is a
consensus around going further with plan C: merging each contribution as soon
as it is ready, without the need to block other contributions (provided they
have technical merit, of course), and letting actual users decide.
At this point, I'd really love to hear only from people who disagree with the
above and have strong opinions about it, or who think that the concerns they
have raised before were not addressed properly.
Thanks again,
I really appreciate the time everyone has spent on this issue.
--
Alex
On Tue, Mar 1, 2016 at 1:02 PM, Jeff Steinmetz <jeffrey.steinm...@gmail.com>
wrote:
I too was able to use R via PR 208 with success.
I have it running as expected within the Virtual Machine outlined in this
updated PR
https://github.com/apache/incubator-zeppelin/pull/751/
With the `repl` package (also installed via the VM script), plotting such as
native R histograms worked within the notebook display system as well.
So - this looks good to me.
Not to oversimplify things, it “seems” this PR (or this PR and a future PR
for packaging) just needs:
- the packaging worked out (get the R scripts included in the distribution)
- a few license additions to the rscala files (if they are not generated but
  part of the base requirements)
- a profile addition such as -P r to only build with R binaries if desired.
Unless I am missing something, it could be merged with one final focused
effort.
Somebody could tweak the documentation a bit to match the tone of the other
interpreter docs post merge.
Regards,
Jeff Steinmetz
Principal Architect
Akili Interactive
On 2/29/16, 6:45 AM, "Sourav Mazumder" <sourav.mazumde...@gmail.com> wrote:
My experience is very similar.
I could run PR 208 with minimal effort. So far I have used it very
successfully to create demonstrations covering end-to-end machine learning
use cases in Zeppelin (showcasing how data can easily be shared across Scala,
SparkR, and R, with data preparation and model creation done in SparkR/Scala
and visualization done in R) at various meetups and other forums.
Regards,
Sourav
On Mon, Feb 29, 2016 at 5:04 AM, enzo <e...@smartinsightsfromdata.com> wrote:
As a keen R user I tried both branches, but I couldn’t get Charles' version
to work (maybe my mistake). I found some issues with Amos' version (mainly
about charting), reported on his GitHub page (he has suggested testing more
extensively and reporting after the merge - fair enough).
In conclusion, I do not have sound enough evidence to judge which one is
better. As I’m in favour of competition as a general principle, and taking
into account that they seem to be close to the finishing line, I would
suggest merging each one and letting users decide: I concur with Eran.
It would be useful (just to avoid similar occurrences in the future) to
understand how we arrived here, though. How is it possible that a PR as
fundamental as the R interpreter takes so long to be integrated? I would
humbly suggest giving better treatment to the big-hitting functionalities in
the future. Clearly, the longer a ‘big’ functionality is delayed, the more
attractive it becomes to develop alternative versions (sometimes better
versions, sometimes equally useful).
Another consideration is the ever-present issue of graphics. From an R
standpoint, due to the extreme richness of its graphics offering, I have so
far found no notebook entirely satisfactory: for example, the growing family
of htmlwidgets is badly displayed (or not displayed at all) in many cases. It
would certainly benefit the community to invest time and effort in resolving
these issues, but so be it.
Enzo
e...@smartinsightsfromdata.com
On 29 Feb 2016, at 12:36, Eran Witkon <eranwit...@gmail.com> wrote:
I think we should ask ourselves what the guiding principle is here. For
example, if in the future I want to create yet another JDBC interpreter or
Flink interpreter, should I only extend the one that already exists, or can I
create my own and let the user community decide? Realistically, I don't think
we can control where people invest their time and contributions, and as long
as a contribution has no licensing issues and aligns with other project
guidelines, it should be up to the users to decide.
Eran W
On Mon, Feb 29, 2016 at 2:13 PM DuyHai Doan <doanduy...@gmail.com> wrote:
Hello Alexander
My opinion is that no one, unless they are an R expert, is able to judge the
quality of both contributions apart from the authors themselves. So let's
have them work together to merge the 2 PRs into a good one. Those PRs,
especially #208, have been there for a while and it's high time they got
merged so the community can move on.
That is, unless there are R experts in the Zeppelin community, in which case
they should speak up and give their own opinions.
My 2 cents
Duy Hai DOAN
On Mon, Feb 29, 2016 at 1:04 PM, Alexander Bezzubov <b...@apache.org> wrote:
Hi fellow Zeppelin community members,
as you know, we have 2 contributions for ZEPPELIN-156
<https://issues.apache.org/jira/browse/ZEPPELIN-156> AKA R
<https://github.com/apache/incubator-zeppelin/pull/208> interpreter
<https://github.com/apache/incubator-zeppelin/pull/702>.
Both have merit, so wearing my PPMC hat, I'd like to suggest that we decide
how to move forward with it while avoiding user confusion.
Here is what we can do:
- a. pick only one of those and merge it
- b. ask the authors of both to collaborate and come up with one
- c. merge each as soon as it's ready, and let users\maintainers decide which
  one is best in the end
This is not an official VOTE (which is possible to arrange, but is a rather
bad way to build a consensus).
It is a discussion, aimed at seeing whether we all, as a community, can build
a consensus together cooperatively - meaning, *everyone compromises
something *and* there are no really strong opinions against the final plan*.
I specifically DO NOT ask which one is better, has more features, etc, etc,
etc.
What I ask for are opinions on a community way of reconciling this situation
and moving the project forward together.
What do you think?
--
Kind regards,
Alexander.