One idea

Once we merge AngularObjectRegistry with ResourcePool, it would be a good
idea to expose some utility methods like 'getResource(xxx)',
'putResource(yyy)' and 'removeResource(zzz)' directly on the
InterpreterContext object, so that any interpreter can use them.
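Roughly, the shape I have in mind is something like this (a toy sketch: the method names are placeholders, and the pool is modeled as a plain concurrent map rather than the real DistributedResourcePool):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the merged registry/pool exposed through
// InterpreterContext. The real classes live in zeppelin-interpreter;
// this models the pool as a simple concurrent map for illustration.
class ResourcePool {
    private final Map<String, Object> resources = new ConcurrentHashMap<>();
    Object get(String name)             { return resources.get(name); }
    void put(String name, Object value) { resources.put(name, value); }
    Object remove(String name)          { return resources.remove(name); }
}

class InterpreterContext {
    private final ResourcePool pool = new ResourcePool();

    // Convenience methods any interpreter could call directly,
    // without touching the pool implementation.
    public Object getResource(String name)         { return pool.get(name); }
    public void putResource(String name, Object v) { pool.put(name, v); }
    public Object removeResource(String name)      { return pool.remove(name); }
}

public class Sketch {
    public static void main(String[] args) {
        InterpreterContext ctx = new InterpreterContext();
        ctx.putResource("foo", "bar");
        System.out.println(ctx.getResource("foo")); // prints "bar"
        ctx.removeResource("foo");
        System.out.println(ctx.getResource("foo")); // prints "null"
    }
}
```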



On Sat, Apr 23, 2016 at 9:59 AM, DuyHai Doan <[email protected]> wrote:

> "I'd like to see that Flink have access to the 'z' object. "
>
> --> You're approaching the problem from the wrong side.
>
> You need access to the 'z' object not for the object itself but to be able
> to call its functions, namely 'z.angular(xxx)', right?
>
> If you look at the source code, the AngularObjectRegistry is available
> from the InterpreterContext object itself with a little bit of code; see
> here:
> https://github.com/apache/incubator-zeppelin/blob/master/spark/src/main/java/org/apache/zeppelin/spark/ZeppelinContext.java#L370-L384
>
> So basically, inside the Flink interpreter, you can call the same piece
> of code and achieve the same goal.
>
> The 'z.angular()' method is merely syntactic sugar to simplify
> interaction with the AngularObjectRegistry.
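[A toy model of the relationship described above. The real AngularObjectRegistry and ZeppelinContext key objects per note/paragraph and push updates to the browser; the class shapes here are illustrative only:]

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy model: z.angular(...) is sugar over registry lookups that any
// interpreter can perform itself. The real Zeppelin registry keys
// objects per note/paragraph and notifies the front end.
class AngularObjectRegistry {
    private final Map<String, Object> objects = new ConcurrentHashMap<>();
    Object get(String name)             { return objects.get(name); }
    void add(String name, Object value) { objects.put(name, value); }
}

// What ZeppelinContext effectively does under the hood.
class Z {
    private final AngularObjectRegistry registry;
    Z(AngularObjectRegistry registry) { this.registry = registry; }
    Object angular(String name) { return registry.get(name); } // the "sugar"
}

public class Demo {
    public static void main(String[] args) {
        AngularObjectRegistry registry = new AngularObjectRegistry();
        registry.add("speed", 42);

        // Spark-style access through the sugar...
        Z z = new Z(registry);
        System.out.println(z.angular("speed")); // prints "42"

        // ...and the equivalent direct access a Flink interpreter could do.
        System.out.println(registry.get("speed")); // prints "42"
    }
}
```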
>
> "But the Angular binds don't need to be Spark specific (e.g. living in
> the ZeppelinContext which requires a SparkContext as a constructor)."
>
> --> And it isn't Spark specific; it can be retrieved from the
> InterpreterContext itself.
>
>
> On Sat, Apr 23, 2016 at 12:27 AM, Trevor Grant <[email protected]>
> wrote:
>
>> First of all, awesome work on what you've done here.  Appreciating it more
>> and more, the more I grok.
>>
>> Second of all, thanks for the Cassandra snippet. I realized we are
>> talking about slightly different things.
>> You are talking about ${var}.
>>
>> I wanted something closer to this:
>>
>> %flink
>> import org.apache.zeppelin.interpreter.InterpreterContext
>> val resourcePool = InterpreterContext.get().getResourcePool()
>> resourcePool.put("foo", "bar")
>>
>> import org.apache.zeppelin.interpreter.InterpreterContext
>> resourcePool: org.apache.zeppelin.resource.ResourcePool =
>> org.apache.zeppelin.resource.DistributedResourcePool@21d07d88
>>
>> ----------------------------------
>> %spark z.get("foo")
>>
>> res4: Object = bar
>>
>> ^^ This actually works, so I can move on with my day.
>>
>> Continuing the discussion:
>>
>> I'd like to see Flink have access to the 'z' object. OR, if that is
>> deprecated, I hope to see something calling this out in your documentation
>> PR, e.g. using resource pools. I'm not a complete idiot, but it took me
>> some time to dig through the code (and the comments of this thread) to
>> figure this one out. I think variable passing is one of the coolest things
>> about a Zeppelin setup. People should be aware that it's a thing and how
>> to do it.
>>
>> Re: Zeppelin being Spark-centric. I say that because the ZeppelinContext
>> is really wrapped up in the Spark interpreter and vice versa. For cripes'
>> sake, the SparkContext is required for the constructor of the
>> ZeppelinContext:
>> (This isn't related to your pull request / fine work)
>>
>> Currently it is something like this:
>>
>> class SparkInterpreter {
>>    // basic interpreter stuff
>>    // fancy interpreter fixes
>>    // special Zeppelin interpreter magic
>> }
>>
>> class ZeppelinContext( SparkContext ) {
>>   // all the binding / watching / other cool stuff
>> }
>>
>> class FlinkInterpreter {
>>    // basic interpreter stuff
>> }
>>
>> class IgniteInterpreter {
>>    // basic interpreter stuff, but not standardized, so patches and fixes
>>    // don't always work as expected, and now all interpreters have
>>    // slightly different implementations bc they aren't homogenized.
>> }
>>
>>
>> I propose something more like this:
>> class ZeppelinIntp {
>>    // common resource pools
>>    // etc
>> }
>> object ZeppelinIntp {
>>     // common resource pools
>> }
>>
>> class ScalaIntp {
>>   // everything for a well oiled and highly functioning scala interpreter
>> }
>>
>> object SparkScalaIntp extends ScalaIntp (sparkParams, ZeppelinIntp, ...){
>>     // do spark specific things
>> }
>>
>> object FlinkScalaIntp extends ScalaIntp (flinkParams, ZeppelinIntp, ...){
>>     // do flink specific things
>> }
>>
>> object IgniteScalaIntp extends ScalaIntp (igniteParams, ZeppelinIntp,
>> ...){
>>     // do ignite specific things
>> }
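[The proposed layering could be sketched as a runnable toy like this. All class names are hypothetical, engine specifics are reduced to stubs, and the shared pool is again modeled as a plain map rather than Zeppelin's actual resource pool:]

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the proposed split: common Zeppelin plumbing
// (here, a shared resource pool) in one base class, engine specifics in
// thin subclasses. Not Zeppelin's actual class hierarchy.
abstract class ZeppelinIntp {
    // One pool shared by every interpreter instance (static on purpose).
    private static final Map<String, Object> resourcePool = new ConcurrentHashMap<>();
    public void putResource(String name, Object v) { resourcePool.put(name, v); }
    public Object getResource(String name)         { return resourcePool.get(name); }
    abstract String interpret(String code);
}

class SparkScalaIntp extends ZeppelinIntp {
    @Override String interpret(String code) { return "spark: " + code; }
}

class FlinkScalaIntp extends ZeppelinIntp {
    @Override String interpret(String code) { return "flink: " + code; }
}

public class Hierarchy {
    public static void main(String[] args) {
        ZeppelinIntp spark = new SparkScalaIntp();
        ZeppelinIntp flink = new FlinkScalaIntp();

        // A value put by one interpreter is visible to the other,
        // because the pool lives in the common base.
        spark.putResource("foo", "bar");
        System.out.println(flink.getResource("foo")); // prints "bar"

        System.out.println(spark.interpret("df.count()")); // prints "spark: df.count()"
    }
}
```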
>>
>> Yea, I know this is a major refactor, but the problem is going to get
>> worse
>> as time goes on.
>>
>> The ZeppelinContext-SparkContext pairing may not be worth splitting out;
>> those two are really entangled, and for any conceivable case the most we
>> would want to pass back and forth can be handled by the resource pools.
>> But the Angular binds don't need to be Spark specific (e.g. living in the
>> ZeppelinContext, which requires a SparkContext as a constructor). If
>> anything it would make more sense for those to live inside Flink bc it is
>> true streaming as opposed to Spark mini-batching (which comes to the
>> scala-shell in v1.1).
>>
>> Also, I really believe the overarching classes that handle language
>> behavior and parsing ought to be off in their own modules.
>>
>> Possibly a thing for v 0.7?
>>
>>
>>
>>
>>
>> Trevor Grant
>> Data Scientist
>> https://github.com/rawkintrevo
>> http://stackexchange.com/users/3002022/rawkintrevo
>> http://trevorgrant.org
>>
>> *"Fortunate is he, who is able to know the causes of things."  -Virgil*
>>
>>
>> On Fri, Apr 22, 2016 at 4:37 PM, DuyHai Doan <[email protected]>
>> wrote:
>>
>> > "Back to my original post, I essentially want to add Flink to that list"
>> >
>> > In that case, inside the Flink interpreter source code, every time the
>> > input parser encounters a ${variable} pattern, you have to access the
>> > AngularObjectRegistry and replace the template with the actual variable
>> > value.
>> >
>> > It is the responsibility of each interpreter to implement variable
>> > interpolation (${var}).
>> >
>> > I did it for the Cassandra interpreter using my own syntax ({{var}}):
>> >
>> >
>> https://github.com/apache/incubator-zeppelin/blob/master/cassandra/src/main/scala/org/apache/zeppelin/cassandra/InterpreterLogic.scala#L306-L327
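[A minimal sketch of that per-interpreter interpolation step, assuming the pool behaves like a map. The regex and method names are illustrative, not Zeppelin's actual parser:]

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch: replace ${var} templates in a paragraph's text
// with values looked up from a resource pool (modeled here as a Map).
public class Interpolator {
    private static final Pattern VAR = Pattern.compile("\\$\\{(\\w+)\\}");

    static String interpolate(String text, Map<String, Object> pool) {
        Matcher m = VAR.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            Object value = pool.get(m.group(1));
            // Leave the template untouched if the variable is unknown.
            String replacement = value == null ? m.group(0) : value.toString();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> pool = Map.of("table", "events");
        System.out.println(interpolate("SELECT * FROM ${table}", pool));
        // prints: SELECT * FROM events
    }
}
```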
>> >
>> >
>> > "I was looking through your resourcePools. I am under the impression I
>> > can use those to pass a variable from one paragraph to another, in an
>> > awkward sort of fashion (but I may be going about it all wrong).
>> > Supposing that can be done (or possibly is already done, but I haven't
>> > read the PRs you listed carefully), it would solve what I want to do
>> > for the time being."
>> >
>> > I will create an epic to merge angular objects with resource pools so we
>> > keep only one abstraction. But it doesn't solve the fundamental problem,
>> > which is that IF an interpreter wants to use variables stored in the
>> > resource pool, it HAS to implement it.
>> >
>> > The only way we can share code for variable binding is to let the
>> > Zeppelin engine pre-process the input text block of each paragraph,
>> > perform variable lookup from the Resource Pool and then variable
>> > replacement, and after that forward the text block to the interpreter
>> > itself.
>> >
>> > I think it is a good idea, but it would require some refactoring and may
>> > break existing behavior if some interpreters have already implemented
>> > their own variable-template handling.
>> >
>> >
>> >
>> > "2) If we want to keep the code base compact and clean, would it be
>> > wiser to refactor in a less Spark-centric way?"
>> >
>> > There is nothing Spark-centric here if we're talking about variable
>> > sharing; it applies to all interpreters.
>> >
>> >
>> > On Fri, Apr 22, 2016 at 11:24 PM, Trevor Grant <
>> [email protected]>
>> > wrote:
>> >
>> > > If I'm reading https://issues.apache.org/jira/browse/ZEPPELIN-635
>> > > correctly- this integrates the spark, markdown, and shell
>> interpreters.
>> > >
>> > > Back to my original post, I essentially want to add Flink to that
>> list.
>> > >
>> > > To your point about keeping a small and manageable code base: under
>> > > the hood it seems like Zeppelin is a front end for Spark and, oh btw,
>> > > here are some hacks to make other stuff work too. For instance, there
>> > > is a lot of code reuse in any Scala-based interpreter. Wouldn't it
>> > > make more sense to have a generic Scala interpreter and extend it for
>> > > the special quirks of each interpreter as needed, e.g. for the
>> > > variable bindings of the particular interpreter and loading
>> > > configurations? Consider the companion object bug: essentially the
>> > > same code had to be copied and pasted across four interpreters, and
>> > > the Ignite interpreter (as I recall) never even got the fix because
>> > > of a quirk in the way the tests are written for that interpreter.
>> > >
>> > > I was looking through your resourcePools. I am under the impression I
>> > > can use those to pass a variable from one paragraph to another, in an
>> > > awkward sort of fashion (but I may be going about it all wrong).
>> > > Supposing that can be done (or possibly is already done, but I haven't
>> > > read the PRs you listed carefully), it would solve what I want to do
>> > > for the time being.
>> > >
>> > > Also, consider the Python Flink interpreter I want to add to this:
>> > > there will once again be a lot of code duplicated from the Spark
>> > > Python interpreter. A generic Python interpreter also seems like a
>> > > more reasonable approach here.
>> > >
>> > > So basically I've broken this conversation into two parts:
>> > > 1) I'm trying to pass variables/objects back and forth between
>> > > Spark/Flink/Angular/etc. Please help. It seems possible, but I'm
>> > > having a slow time figuring it out.
>> > > 2) If we want to keep the code base compact and clean, would it be
>> > > wiser to refactor in a less Spark-centric way?
>> > >
>> > >
>> > >
>> > >
>> > > Trevor Grant
>> > > Data Scientist
>> > > https://github.com/rawkintrevo
>> > > http://stackexchange.com/users/3002022/rawkintrevo
>> > > http://trevorgrant.org
>> > >
>> > > *"Fortunate is he, who is able to know the causes of things."
>> -Virgil*
>> > >
>> > >
>> > > On Fri, Apr 22, 2016 at 3:41 PM, DuyHai Doan <[email protected]>
>> > wrote:
>> > >
>> > > > In this case, it is already implemented.
>> > > >
>> > > > Look at those merged PR:
>> > > >
>> > > > - https://github.com/apache/incubator-zeppelin/pull/739
>> > > > - https://github.com/apache/incubator-zeppelin/pull/740
>> > > > - https://github.com/apache/incubator-zeppelin/pull/741
>> > > > - https://github.com/apache/incubator-zeppelin/pull/742
>> > > > - https://github.com/apache/incubator-zeppelin/pull/744
>> > > > - https://github.com/apache/incubator-zeppelin/pull/745
>> > > > - https://github.com/apache/incubator-zeppelin/pull/832
>> > > >
>> > > > There is one last JIRA pending for documentation, I'll do a PR for
>> this
>> > > > next week: https://issues.apache.org/jira/browse/ZEPPELIN-742
>> > > >
>> > > > On Fri, Apr 22, 2016 at 9:52 PM, Trevor Grant <
>> > [email protected]>
>> > > > wrote:
>> > > >
>> > > > > I want to be able to put/get/watch variables, specifically so I
>> > > > > can interface with AngularJS for visualizations.
>> > > > >
>> > > > > I've been grokking the codebase trying to find a less invasive
>> > > > > way to do this.
>> > > > >
>> > > > > I get wanting to keep the code base clean, but sharing variables
>> > > > > is a really nice feature set and shouldn't be that hard to
>> > > > > implement?
>> > > > >
>> > > > > Thoughts?
>> > > > >
>> > > > > Trevor Grant
>> > > > > Data Scientist
>> > > > > https://github.com/rawkintrevo
>> > > > > http://stackexchange.com/users/3002022/rawkintrevo
>> > > > > http://trevorgrant.org
>> > > > >
>> > > > > *"Fortunate is he, who is able to know the causes of things."
>> > -Virgil*
>> > > > >
>> > > > >
>> > > > > On Fri, Apr 22, 2016 at 1:06 PM, DuyHai Doan <
>> [email protected]>
>> > > > wrote:
>> > > > >
>> > > > > > I think we should rather leave ZeppelinContext unmodified.
>> > > > > >
>> > > > > > If we update ZeppelinContext for every kind of interpreter, it
>> > > > > > would quickly become an unmanageable behemoth.
>> > > > > >
>> > > > > > The reason ZeppelinContext has some support for Spark is
>> > > > > > historical. Now that the project is going to gain a wider
>> > > > > > audience, we should focus on keeping the code as clean and as
>> > > > > > modular as possible.
>> > > > > >
>> > > > > > Can you explain which feature you want to add to
>> > > > > > ZeppelinContext that will be useful for Flink?
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > > On Fri, Apr 22, 2016 at 7:12 PM, Trevor Grant <
>> > > > [email protected]>
>> > > > > > wrote:
>> > > > > >
>> > > > > > > If one were to extend the Zeppelin context for Flink, I was
>> > > thinking
>> > > > it
>> > > > > > > would make the most sense to update
>> > > > > > >
>> > > > > > >
>> > > ../spark/src/main/java/org/apache/zeppelin/spark/ZeppelinContext.java
>> > > > > > >
>> > > > > > > Any thoughts from those who are more familiar with that end of
>> > the
>> > > > code
>> > > > > > > base than I?
>> > > > > > >
>> > > > > > > Ideally we'd have a solution that extends the ZeppelinContext
>> > > > > > > to all interpreters. I know y'all love Spark, but there ARE
>> > > > > > > others out there...
>> > > > > > >
>> > > > > > > Anyone have any branches / previous attempts I could check
>> out?
>> > > > > > >
>> > > > > > > tg
>> > > > > > >
>> > > > > > >
>> > > > > > > Trevor Grant
>> > > > > > > Data Scientist
>> > > > > > > https://github.com/rawkintrevo
>> > > > > > > http://stackexchange.com/users/3002022/rawkintrevo
>> > > > > > > http://trevorgrant.org
>> > > > > > >
>> > > > > > > *"Fortunate is he, who is able to know the causes of things."
>> > > > -Virgil*
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
>
