"I'd like to see that Flink have access to the 'z' object. " --> You're taking the problem at the wrong side.
You need access to the 'z' object not for the object itself but to be able to call its methods, namely 'z.angular(xxx)', right? If you look at the source code, the AngularObjectRegistry is available from the InterpreterContext object itself with a little bit of code, see here:
https://github.com/apache/incubator-zeppelin/blob/master/spark/src/main/java/org/apache/zeppelin/spark/ZeppelinContext.java#L370-L384

So basically, inside the Flink interpreter, you can call this same piece of code and achieve the same goal. The 'z.angular()' method is merely syntactic sugar to simplify interaction with the AngularObjectRegistry.

"But the Angular binds don't need to be Spark specific (e.g. living in the ZeppelinContext which requires a SparkContext as a constructor)." --> It isn't Spark specific; it can be retrieved from the InterpreterContext itself.

On Sat, Apr 23, 2016 at 12:27 AM, Trevor Grant <[email protected]> wrote:

> First of all, awesome work on what you've done here. Appreciating it more and more, the more I grok.
>
> Second of all, thanks for the Cassandra snippet. I realized we are talking about slightly different things. You are talking about ${var}
>
> I wanted something closer to this:
>
> %flink
> import org.apache.zeppelin.interpreter.InterpreterContext
> val resourcePool = InterpreterContext.get().getResourcePool()
> resourcePool.put("foo", "bar")
>
> import org.apache.zeppelin.interpreter.InterpreterContext
> resourcePool: org.apache.zeppelin.resource.ResourcePool = org.apache.zeppelin.resource.DistributedResourcePool@21d07d88
>
> ----------------------------------
> %spark
> z.get("foo")
>
> res4: Object = bar
>
> ^^ This actually works, so I can move on with my day.
>
> Continuing the discussion:
>
> I'd like to see that Flink have access to the 'z' object. OR, if that is deprecated, I hope to see something calling this out in your PR of documentation, e.g. using resource pools.
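To make the point above concrete: what 'z.angular()' ultimately does is read and write named objects in a registry reachable from the InterpreterContext, so any interpreter can do the same. Here is a minimal, self-contained sketch of that pattern; the class and method names are illustrative stand-ins, not the actual Zeppelin API.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for Zeppelin's AngularObjectRegistry: a map of named objects
// that any interpreter holding the InterpreterContext can reach.
// Names and signatures here are illustrative, NOT the real Zeppelin API.
class AngularRegistrySketch {
    private final Map<String, Object> registry = new HashMap<>();

    // What z.angular(name) / z.angularBind(name, value) boil down to:
    // a registry read or write keyed by name.
    public void add(String name, Object value) {
        registry.put(name, value);
    }

    public Object get(String name) {
        return registry.get(name);
    }

    public static void main(String[] args) {
        // A Flink interpreter could perform the same registry write directly,
        // without going through the Spark-owned ZeppelinContext.
        AngularRegistrySketch registry = new AngularRegistrySketch();
        registry.add("greeting", "hello from flink");
        System.out.println(registry.get("greeting"));
    }
}
```

The real registry also scopes objects per note/paragraph and pushes updates to the front end, but the interpreter-facing contract is essentially this put/get.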
> I'm not a complete idiot, but it took me some time to dig through code to figure this one out (and the comments of this thread). I think variable passing is one of the coolest things about a Zeppelin setup. People should be aware that it's a thing and how to do it.
>
> Re: Zeppelin being Spark centric. I say that because the Zeppelin context is really wrapped up in the Spark interpreter and vice versa. For cripes sake, the SparkContext is required for the constructor of the ZeppelinContext:
> (This isn't related to your pull request / fine work)
>
> Currently it is something like this:
>
> class SparkInterpreter {
>   // basic interpreter stuff
>   // fancy interpreter fixes
>   // special Zeppelin interpreter magic
> }
>
> class ZeppelinContext( SparkContext ) {
>   // all the binding / watching / other cool stuff
> }
>
> class FlinkInterpreter {
>   // basic interpreter stuff
> }
>
> class IgniteInterpreter {
>   // basic interpreter stuff, but not standardized, so patches and fixes
>   // don't always work as expected, and now all interpreters have slightly
>   // different implementations bc they aren't homogenized.
> }
>
> I propose something more like this:
>
> class ZeppelinIntp {
>   // common resource pools
>   // etc
> }
>
> object ZeppelinIntp {
>   // common resource pools
> }
>
> class ScalaIntp {
>   // everything for a well-oiled and highly functioning scala interpreter
> }
>
> object SparkScalaIntp extends ScalaIntp (sparkParams, ZeppelinIntp, ...) {
>   // do spark specific things
> }
>
> object FlinkScalaIntp extends ScalaIntp (flinkParams, ZeppelinIntp, ...) {
>   // do flink specific things
> }
>
> object IgniteScalaIntp extends ScalaIntp (igniteParams, ZeppelinIntp, ...) {
>   // do ignite specific things
> }
>
> Yea, I know this is a major refactor, but the problem is going to get worse as time goes on.
>
> The ZeppelinContext / SparkContext pair may not be worth splitting out; those two are really entangled, and for any conceivable case the most we would want to pass back and forth can be handled by the resource pools. But the Angular binds don't need to be Spark specific (e.g. living in the ZeppelinContext, which requires a SparkContext as a constructor). If anything it would make more sense for those to live inside Flink, bc it is true streaming, as opposed to Spark's mini-batching (which comes to the scala-shell in v1.1).
>
> Also, I really believe the overarching classes that handle language behavior and parsing ought to be off in their own modules.
>
> Possibly a thing for v0.7?
>
> Trevor Grant
> Data Scientist
> https://github.com/rawkintrevo
> http://stackexchange.com/users/3002022/rawkintrevo
> http://trevorgrant.org
>
> *"Fortunate is he, who is able to know the causes of things." -Virgil*
>
> On Fri, Apr 22, 2016 at 4:37 PM, DuyHai Doan <[email protected]> wrote:
>
> > "Back to my original post, I essentially want to add Flink to that list"
> >
> > In that case, inside the Flink interpreter source code, every time the input parser encounters a ${variable} pattern, you have to access the AngularObjectRegistry and replace the template with the actual variable value.
> >
> > It is the responsibility of each interpreter to implement variable interpolation (${var}).
> >
> > I did it for the Cassandra interpreter using my own syntax ( {{var}} ):
> >
> > https://github.com/apache/incubator-zeppelin/blob/master/cassandra/src/main/scala/org/apache/zeppelin/cassandra/InterpreterLogic.scala#L306-L327
> >
> > "I was looking through your resourcePools. I am under the impression I can use those to pass a variable from one paragraph to another, in an awkward sort of fashion (but I may be going about it all wrong).
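The per-interpreter interpolation step described above (scan the paragraph for ${var}, look each name up in a registry, splice in the value) can be sketched in a few lines. This is a simplified model, not the Cassandra interpreter's actual implementation; the plain Map stands in for the AngularObjectRegistry lookup.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of the pre-processing pass an interpreter would run before executing
// a paragraph: find every ${var}, look the name up in a registry (a Map here,
// standing in for the AngularObjectRegistry), and substitute its value.
class VariableInterpolator {
    private static final Pattern VAR = Pattern.compile("\\$\\{(\\w+)\\}");

    static String interpolate(String paragraph, Map<String, Object> registry) {
        Matcher m = VAR.matcher(paragraph);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            Object value = registry.get(m.group(1));
            // Leave unknown variables untouched rather than failing the paragraph.
            String replacement = (value != null) ? value.toString() : m.group(0);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

For example, interpolate("select * from ${table}", registry) with "table" bound to "users" yields "select * from users". The Cassandra interpreter does the equivalent with its own {{var}} syntax at the link above.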
Supposing that can be done (or possibly is already done, but I haven't read the PRs you listed carefully), it would solve what I want to do for the time being."
> >
> > I will create an epic to merge angular objects with resource pools to keep only one abstraction. But it doesn't solve the fundamental problem, which is that IF an interpreter wants to use variables stored in the resource pool, it HAS to implement it.
> >
> > The only way we can mutualise code for variable binding is to let the Zeppelin engine pre-process the input text block of each paragraph, perform the variable lookup against the resource pool and the variable replacement, and after that forward the text block to the interpreter itself.
> >
> > I think it is a good idea, but it would require some refactoring and may break existing behavior if some interpreters have already implemented their own variable template handling.
> >
> > "2) If we want to keep the code base compact and clean, would it be wiser to refactor in a less Spark-centric way?"
> >
> > There is nothing Spark centric here if we're talking about variable sharing; it applies to all interpreters.
> >
> > On Fri, Apr 22, 2016 at 11:24 PM, Trevor Grant <[email protected]> wrote:
> >
> > > If I'm reading https://issues.apache.org/jira/browse/ZEPPELIN-635 correctly, this integrates the spark, markdown, and shell interpreters.
> > >
> > > Back to my original post, I essentially want to add Flink to that list.
> > >
> > > To your point about keeping a small and manageable code base: under the hood it seems like Zeppelin is a front end for Spark and, oh by the way, here are some hacks to make other stuff work too. For instance, there is a lot of code reuse in any Scala-based interpreter. Wouldn't it make more sense to have a generic Scala interpreter and extend it for the special quirks of each interpreter as needed, e.g.
for the variable bindings of the particular interpreter, and loading configurations? Consider the companion object bug: essentially the same code had to be copied and pasted across 4 interpreters, and the Ignite interpreter (as I recall) never even got the fix because of a quirk in the way the tests are written for that interpreter.
> > >
> > > I was looking through your resourcePools. I am under the impression I can use those to pass a variable from one paragraph to another, in an awkward sort of fashion (but I may be going about it all wrong). Supposing that can be done (or possibly is already done, but I haven't read the PRs you listed carefully), it would solve what I want to do for the time being.
> > >
> > > Also consider the Python Flink interpreter I want to add to this; there will once again be a lot of duplication of code from the Spark Python interpreter. A generic Python interpreter also seems like a more reasonable approach here.
> > >
> > > So basically I've broken this conversation into two parts:
> > > 1) I'm trying to pass variables/objects back and forth between Spark/Flink/Angular/etc. Please help. It seems possible, but I'm having a slow time figuring it out.
> > > 2) If we want to keep the code base compact and clean, would it be wiser to refactor in a less Spark-centric way?
> > >
> > > Trevor Grant
> > > Data Scientist
> > > https://github.com/rawkintrevo
> > > http://stackexchange.com/users/3002022/rawkintrevo
> > > http://trevorgrant.org
> > >
> > > *"Fortunate is he, who is able to know the causes of things." -Virgil*
> > >
> > > On Fri, Apr 22, 2016 at 3:41 PM, DuyHai Doan <[email protected]> wrote:
> > >
> > > > In this case, it is already implemented.
> > > > Look at those merged PRs:
> > > >
> > > > - https://github.com/apache/incubator-zeppelin/pull/739
> > > > - https://github.com/apache/incubator-zeppelin/pull/740
> > > > - https://github.com/apache/incubator-zeppelin/pull/741
> > > > - https://github.com/apache/incubator-zeppelin/pull/742
> > > > - https://github.com/apache/incubator-zeppelin/pull/744
> > > > - https://github.com/apache/incubator-zeppelin/pull/745
> > > > - https://github.com/apache/incubator-zeppelin/pull/832
> > > >
> > > > There is one last JIRA pending for documentation; I'll do a PR for this next week: https://issues.apache.org/jira/browse/ZEPPELIN-742
> > > >
> > > > On Fri, Apr 22, 2016 at 9:52 PM, Trevor Grant <[email protected]> wrote:
> > > >
> > > > > I want to be able to put/get/watch variables, specifically so I can interface with AngularJS for visualizations.
> > > > >
> > > > > I've been grokking the codebase trying to find a less invasive way to do this.
> > > > >
> > > > > I get wanting to keep the code base clean, but sharing variables is a really nice feature set and shouldn't be that hard to implement?
> > > > >
> > > > > Thoughts?
> > > > >
> > > > > Trevor Grant
> > > > > Data Scientist
> > > > > https://github.com/rawkintrevo
> > > > > http://stackexchange.com/users/3002022/rawkintrevo
> > > > > http://trevorgrant.org
> > > > >
> > > > > *"Fortunate is he, who is able to know the causes of things." -Virgil*
> > > > >
> > > > > On Fri, Apr 22, 2016 at 1:06 PM, DuyHai Doan <[email protected]> wrote:
> > > > >
> > > > > > I think we should rather leave ZeppelinContext unmodified.
> > > > > >
> > > > > > If we update ZeppelinContext for every kind of interpreter, it would quickly become a behemoth and unmanageable.
> > > > > >
> > > > > > The reason ZeppelinContext has some support for Spark is because it's
Now that the project is going to gain wider audience, > > we > > > > > should > > > > > > focus on keeping the code as cleanest and as modular as possible. > > > > > > > > > > > > Can you explain which feature you want to add to ZeppelinContext > > that > > > > > will > > > > > > be useful for Flink ? > > > > > > > > > > > > > > > > > > > > > > > > On Fri, Apr 22, 2016 at 7:12 PM, Trevor Grant < > > > > [email protected]> > > > > > > wrote: > > > > > > > > > > > > > If one were to extend the Zeppelin context for Flink, I was > > > thinking > > > > it > > > > > > > would make the most sense to update > > > > > > > > > > > > > > > > > ../spark/src/main/java/org/apache/zeppelin/spark/ZeppelinContext.java > > > > > > > > > > > > > > Any thoughts from those who are more familiar with that end of > > the > > > > code > > > > > > > base than I? > > > > > > > > > > > > > > Ideally we'd have a solution that extend the Zeppelin Context > to > > > all > > > > > > > interpreters. I know y'all love Spark but there ARE others out > > > > > there... > > > > > > > > > > > > > > Anyone have any branches / previous attempts I could check out? > > > > > > > > > > > > > > tg > > > > > > > > > > > > > > > > > > > > > Trevor Grant > > > > > > > Data Scientist > > > > > > > https://github.com/rawkintrevo > > > > > > > http://stackexchange.com/users/3002022/rawkintrevo > > > > > > > http://trevorgrant.org > > > > > > > > > > > > > > *"Fortunate is he, who is able to know the causes of things." > > > > -Virgil* > > > > > > > > > > > > > > > > > > > > > > > > > > > >
