Re: [Proposal] New operator graph for MXNet

Junru Shao Wed, 15 May 2019 13:29:47 -0700

Hi Zach,

Thank you for raising these points! I am happy to offer more reading
materials about this topic.


*SSA vs ANF.* ANF and SSA are essentially the same thing [1].
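
To make the correspondence concrete, here is a minimal hand-written sketch of
one tiny program (computing |a - b|) in both styles. It only illustrates the
idea in [1]; the names `run_ssa` and `run_anf` are hypothetical and are not
taken from MXNet, NNVM, or Relay.

```python
# Sketch only: illustrates the SSA/ANF correspondence from Appel's note [1].
# All names here are made up for the example.

def run_ssa(a, b):
    """Interpret a tiny SSA program computing |a - b|.

    entry: c  = a > b                      ; branch c ? then : else
    then:  t1 = a - b                      ; jump join
    else:  t2 = b - a                      ; jump join
    join:  r  = phi(then: t1, else: t2)    ; return r
    """
    env = {"a": a, "b": b}           # every name is assigned exactly once
    env["c"] = env["a"] > env["b"]
    if env["c"]:
        env["t1"] = env["a"] - env["b"]
        pred = "then"
    else:
        env["t2"] = env["b"] - env["a"]
        pred = "else"
    # The phi node selects the value that flowed in from the predecessor block.
    phi = {"then": "t1", "else": "t2"}
    env["r"] = env[phi[pred]]
    return env["r"]


def run_anf(a, b):
    """The same program in ANF style: every intermediate value is bound to a
    name, and the join block becomes a local function whose parameter plays
    the role of the phi node."""
    def join(r):                     # phi(t1, t2) becomes the parameter r
        return r

    c = a > b                        # let c = a > b
    if c:
        t1 = a - b                   # let t1 = a - b
        return join(t1)              # jump to join == tail call
    else:
        t2 = b - a                   # let t2 = b - a
        return join(t2)
```

Both functions bind each intermediate value exactly once; the only difference
is whether the merge point is written as a phi node or as a function
parameter.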

*AD in Relay.* Relay is able to do AD through not only control flow, but
also various data structures and higher-order functions [2].
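
As a rough illustration of what "AD through control flow and higher-order
functions" means, below is a minimal forward-mode sketch using dual numbers in
plain Python. This is not Relay's implementation or API; `Dual`, `derivative`,
`power_loop`, `piecewise`, and `twice` are hypothetical names made up for this
example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A value paired with its derivative with respect to the input."""
    val: float
    dot: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.val + other.val, self.dot + other.dot)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (u * v)' = u' * v + u * v'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

    __rmul__ = __mul__


def _lift(x):
    """Treat plain numbers as constants (derivative 0)."""
    return x if isinstance(x, Dual) else Dual(float(x))


def derivative(f):
    """Higher-order function: maps f to a function computing f'(x)."""
    def df(x):
        return f(Dual(float(x), 1.0)).dot
    return df


def power_loop(x, n):
    """x ** n written with a while loop (control flow)."""
    acc = 1.0
    i = 0
    while i < n:
        acc = acc * x
        i += 1
    return acc


def piecewise(x):
    """A data-dependent branch: x^2 for positive x, 3x otherwise."""
    return x * x if x.val > 0 else 3 * x


def twice(f):
    """Higher-order function: returns f composed with itself."""
    return lambda x: f(f(x))
```

For instance, `derivative(lambda x: power_loop(x, 3))(2.0)` differentiates
through the loop, and `derivative(twice(lambda x: x * x))` differentiates a
function that was itself built by a higher-order function.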

[1] Appel, Andrew W. "SSA is functional programming." *ACM SIGPLAN
Notices* 33.4 (1998): 17-20.
[2] Roesch, Jared, et al. "Relay: a new IR for machine learning
frameworks." *Proceedings of the 2nd ACM SIGPLAN International Workshop on
Machine Learning and Programming Languages*. ACM, 2018.


On Wed, May 15, 2019 at 12:01 PM Zach Kimberg <zachary.kimb...@gmail.com>
wrote:

> I would like to raise another option to get back on the topic of changing
> the Operator graph structure. On the page discussing Relay IR [1], it
> discusses mainly the difference between a data flow graph like we use now
> and A-normal [2] which is used in some functional compilers. Is there a
> reason we do not want to use a structure based on Static Single Assignment
> form (Wikipedia explanation [3], lecture note explanation [4])? It is used
> almost universally in the compiler community including in LLVM (clang),
> GCC, Oracle JVM, PyPy, Go, Webkit, and Swift [5]. The major reason behind
> its pervasiveness is that it has proven very effective for analysis and
> transformations when dealing with control flow.
>
> One possible concern is that it might make automatic differentiation more
> difficult [6]. While it certainly is more complicated than a pure
> functional approach, the functional approach requires users to use
> functional programming. Especially with the languages we support now, that
> doesn't seem like a reasonable assumption. Given that the users are already
> introducing the complexity inherent in imperative programming, we have to
> deal with the increased complexity regardless. I think it might be easier
> to have the tools to deal with that rather than attempting to coerce users
> into a different programming paradigm or convert code between paradigms.
> Furthermore, this may become more important if users are increasingly
> making use of control flow like Junru said.
>
> Zach
>
>
> [1] - https://docs.tvm.ai/dev/relay_intro.html
> [2] - https://en.wikipedia.org/wiki/A-normal_form
> [3] - https://en.wikipedia.org/wiki/Static_single_assignment_form
> [4] - https://www.cs.cmu.edu/~rjsimmon/15411-f15/lec/10-ssa.pdf
> [5] - https://en.wikipedia.org/wiki/Static_single_assignment_form#Compilers_using_SSA_form
> [6] - https://discuss.tvm.ai/t/choice-about-ir-ssa-or-anf/1757/2
>
> On Wed, May 15, 2019 at 11:51 AM Naveen Swamy <mnnav...@gmail.com> wrote:
>
> > Being dismissive and condescending has been exactly what is plaguing this
> > project.
> >
> > I agree the last paragraph sounds very condescending and very dismissive,
> > and it breaks many of the codes of conduct listed.
> >
> > On Wed, May 15, 2019 at 11:31 AM Anirudh Subramanian <
> > anirudh2...@gmail.com>
> > wrote:
> >
> > > Hi Junru,
> > >
> > > Overall, I appreciate the points you made about the proposal.
> > >
> > > Having said that, I would like to remind everyone of the Apache Code of Conduct:
> > > https://www.apache.org/foundation/policies/conduct.
> > > "Be empathetic, welcoming, friendly and patient".
> > >
> > > I find your tone condescending. Clearly you understand what he meant from
> > > the context, whether you prefer to call it IR in compilers or data-flow in
> > > distributed systems. You could very well say "let's use this terminology to
> > > have a common understanding" instead of saying "go learn the basic concepts".
> > > Before building a cool brand, it's important to build a healthy community.
> > >
> > > Anirudh
> > >
> > >
> > > On Wed, May 15, 2019 at 12:03 AM Junru Shao <junrushao1...@gmail.com>
> > > wrote:
> > >
> > > > Hi Pedro,
> > > >
> > > > I really appreciate that a diligent and talented engineer eagerly wants to
> > > > improve our system, and am very thankful that you have done so much for our
> > > > community. However, I do want to mention some points that I believe I
> > > > should mention.
> > > >
> > > > While I agree with Tianqi that every design has its pros and cons, I would
> > > > love to emphasize that a *good taste* of system design is to optimize the
> > > > bottleneck and enhance expressiveness (and usability), i.e. to do what needs
> > > > doing, rather than *trivial nits* that are irrelevant to either performance
> > > > or expressiveness. Generally speaking, typed or untyped, shared_ptr or
> > > > unique_ptr, won't affect the overall performance when it comes to deep
> > > > learning workloads, especially when we have an async scheduler that does good
> > > > latency hiding in MXNet - to me, these are not major issues that are worth
> > > > re-designing our entire system.
> > > >
> > > > To benefit users - real-world ML practitioners - the main thing I would
> > > > love to mention is that a dataflow graph-based representation is
> > > > increasingly incapable of expressing modern neural networks, because of
> > > > increasingly common structures like arbitrary control flow (w/ continue,
> > > > break, etc), recursion, type conjunction and disjunction, etc. Addressing
> > > > these issues will be our priority, and Relay addresses all of these pain
> > > > points.
> > > >
> > > > Another minor thing I would love to humbly mention is that, for the sake of
> > > > our brand, it is our responsibility to be professional about terminology
> > > > when writing an official proposal on Confluence. As one of the numerous
> > > > examples, the title of the proposal really shocked me for a while,
> > > > something like "operators graph" blah blah, so weird. Educate me if I am
> > > > wrong, but the compiler community would prefer the term "intermediate
> > > > representation", and the distributed systems community would prefer
> > > > "dataflow graph". If you don't have knowledge in these fields, a better way
> > > > to communicate efficiently is to first familiarize yourself with the most
> > > > basic concepts and then have the discussion. This is a way to save your own
> > > > valuable time as well.
> > > >
> > > > Again, thank you so much for your hard work, and hope that we could work
> > > > together to win customers in the future :-)
> > > >
> > > > Thanks,
> > > > Junru
> > > >
> > > >
> > > > On Tue, May 14, 2019 at 8:03 PM Tianqi Chen <
> tqc...@cs.washington.edu>
> > > > wrote:
> > > >
> > > > > The core part of the proposal is to move the graph to be a much more
> > > > > strongly typed template class.
> > > > > I think this is mainly a point of engineering taste, and both sides have
> > > > > pros and cons. Let me list them before I share my thoughts on this issue:
> > > > >
> > > > > - Typed fields certainly enjoy more compile-time type checking. On the
> > > > >   other hand, it is hard to expose templates with an explosive number of
> > > > >   possibilities to frontend languages.
> > > > > - More type-erased fields provide runtime flexibility to store polymorphic
> > > > >   types as well as extensible attributes for graph optimization.
> > > > >   - It is hard to use a virtual class to expose every possible attribute
> > > > >     that an operator might have, such as inlining, storage pattern,
> > > > >     gradient, etc.
> > > > >   - Supporting a growing set of operator attributes by nature requires a
> > > > >     type-erased attrs field.
> > > > > - In contrast to your argument (typing is a blocker to features),
> > > > >   type-erased and typed code can both get to the same features, except that
> > > > >   typed code gets more compile-time errors while type-erased code gets some
> > > > >   of them at runtime.
> > > > > - Templatized data structures will likely introduce additional mental
> > > > >   burden for developers and are not really suitable as a core data
> > > > >   structure,
> > > > >   - because they imply an explosive number of possible data structures,
> > > > >     while the core data structure should be a single one.
> > > > >
> > > > > Now my view (as an MXNet PMC member) on typed vs type-erased style: if
> > > > > MXNet were a pure C++ project, I might take more of the typed approach.
> > > > > However, MXNet itself is a project that supports python/scala/clojure and
> > > > > other frontend languages.
> > > > > The introduction of more typing may not align with the original goal, given
> > > > > the tradeoffs I listed above.
> > > > >
> > > > > This proposal is really a drastic change to what NNVM does, as well as to
> > > > > the optimization passes, and given the scope it is, in your analogy, "a new
> > > > > vehicle to solve all the problems" rather than a minor patch. It will take
> > > > > a lot of engineering effort to bring in new features and adapt the
> > > > > existing ones.
> > > > > Because of that, it does merit a discussion about how we should think about
> > > > > the future MXNet2.0.
> > > > >
> > > > > Technically Relay is a serious candidate. Of course Relay, as well as its
> > > > > core, is in C++ but maintains the multi-language first principle; that is
> > > > > why the example code was in python.
> > > > > See more related discussion comparing NNVMv1 and Relay:
> > > > > https://discuss.tvm.ai/t/any-materials-of-relay-for-beginners/2392/5
> > > > >
> > > > > I think the ideal graph data structure candidate for MXNet2.0 should have
> > > > > natural support for:
> > > > > - Functions, modules, and recursion
> > > > > - Control flow
> > > > > - Interoperation with multi-language frontends, e.g. being able to
> > > > >   prototype graph optimizations in python/scala/clojure if needed.
> > > > >
> > > > > Adding this support needs significant engineering effort, and I do hope we
> > > > > only have to do it once. While I don't want to force any conclusion here,
> > > > > I do think Relay is one such candidate.
> > > > >
> > > > > Tianqi
> > > > >
> > > > >
> > > > > On Tue, May 14, 2019 at 5:58 PM Pedro Larroy <
> > > > pedro.larroy.li...@gmail.com
> > > > > >
> > > > > wrote:
> > > > >
> > > > > > Hi Tianqi
> > > > > >
> > > > > > Thanks for the quick response.
> > > > > >
> > > > > > Could you point to examples where graph.h is being exposed which would
> > > > > > not be possible with what I propose? I don't think my proposal has any
> > > > > > impact on language bindings, and the way I describe it doesn't affect
> > > > > > having or not having higher-level language bindings. Please elaborate
> > > > > > so I can understand your concern. Maybe code examples where the graph
> > > > > > attributes are being changed from Python? I don't think we have this in
> > > > > > MXNet. This is such a core foundation for MXNet that I don't think we
> > > > > > should compromise on it because other projects not directly related to
> > > > > > MXNet might want to expose some untyped graph and Node attributes. The
> > > > > > current status makes maintaining the code very painful and is also
> > > > > > preventing desired features such as higher order gradients from being
> > > > > > developed. I have heard from you many times how speed is critical for
> > > > > > us to innovate in this quickly changing field.
> > > > > >
> > > > > > My proposal is limited to the graph and wouldn't change the way
> > > > > > operators are registered and arguments are processed for operators, for
> > > > > > example.
> > > > > >
> > > > > >
> > > > > > Regarding the second point, the documentation about Relay on the web
> > > > > > which I found, for example:
> > > > > >
> > > > > > https://docs.tvm.ai/dev/relay_add_op.html#
> > > > > >
> > > > > > Is somebody working on making Imperative::Backward use this API? This
> > > > > > would be a big change which I'm not aware of. And using an IR is of a
> > > > > > much bigger scope than the change I'm proposing here, for example.
> > > > > >
> > > > > > I think I'm having difficulty understanding what the arguments are
> > > > > > here. I'm saying I need to change one piece of my car and what you are
> > > > > > selling me is a new vehicle? Or is your suggestion that we use
> > > > > > Relay for the graph passes in MXNet?
> > > > > >
> > > > > > I would like to see C++ code examples, Python examples are not
> > > > > > sufficient when we talk about the core MXNet.
> > > > > >
> > > > > > Pedro.
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Tue, May 14, 2019 at 5:39 PM Tianqi Chen <
> > > tqc...@cs.washington.edu>
> > > > > > wrote:
> > > > > > >
> > > > > > > Thanks for the proposal. Let me share some of my thoughts:
> > > > > > >
> > > > > > > Specific comments on the proposal
> > > > > > > -----------------------------------------------
> > > > > > > The heavy use of generics in the Graph type is a huge departure from
> > > > > > > the type-erased data structure presented in the previous design.
> > > > > > > While we understand the advantages of typed languages (more compile-time
> > > > > > > checking) and type-erased types (more dynamism), the heavy use of
> > > > > > > templates will actually make the project solely C++ focused, making it
> > > > > > > hard to expose intermediate (templatized) data structures to
> > > > > > > other languages like python/scala/clojure.
> > > > > > >
> > > > > > > While I fully understand some of the lessons taught in programming
> > > > > > > C++ (reduce shared_ptr, more typing etc.),
> > > > > > > we need to think about the context of the MXNet project and **the need to
> > > > > > > support multi-language as a first-class**.
> > > > > > > Some of the type-erased types are design trade-offs made to support these
> > > > > > > features, and we need to think more
> > > > > > > carefully instead of just applying "rules for C++" which may bring problems.
> > > > > > >
> > > > > > > Future of NNVM
> > > > > > > ----------------------
> > > > > > > Given that this thread touched upon what we should do for better
> > > > > > > computational graph handling, I would also recommend taking a look at
> > > > > > > NNVMv2 -- relay.
> > > > > > >
> > > > > > > Relay already addresses many of the wish-list items in the proposal, such as
> > > > > > > operator fusion, high order gradient, offload to hardware, isolated
> > > > > > > compilation, deployment on edge and accelerators etc.
> > > > > > > Relay also addresses problems not yet mentioned in the proposal,
> > > > > > > including control flow and dynamic runtime, automatic layout optimization
> > > > > > > etc.
> > > > > > >
> > > > > > > Tianqi
> > > > > > >
> > > > > > > On Tue, May 14, 2019 at 5:06 PM Sheng Zha <zhash...@apache.org
> >
> > > > wrote:
> > > > > > >
> > > > > > > > Hi Pedro,
> > > > > > > >
> > > > > > > > Thanks for taking the initiative. Skimming through the design doc, I
> > > > > > > > didn't see a comparison with existing solutions such as relay in tvm,
> > > > > > > > which is already a dependency of mxnet. Could you elaborate on the
> > > > > > > > comparison with existing solutions in the design doc too?
> > > > > > > >
> > > > > > > > -sz
> > > > > > > >
> > > > > > > > On 2019/05/14 23:49:30, Pedro Larroy <
> > > pedro.larroy.li...@gmail.com
> > > > >
> > > > > > > > wrote:
> > > > > > > > > Hi dev@
> > > > > > > > >
> > > > > > > > > As a result of my deep dives on the graph machinery I have created a
> > > > > > > > > new proposal to improve the operator graph in MXNet.
> > > > > > > > >
> > > > > > > > > This would mean superseding the use of NNVM Graph in MXNet and having
> > > > > > > > > a new implementation that we can use to simplify a lot of code and do
> > > > > > > > > powerful graph manipulation and passes such as operator fusion and
> > > > > > > > > other optimizations.
> > > > > > > > >
> > > > > > > > > As it would be a change with big impact and ramifications, your
> > > > > > > > > thoughts and feedback on the document would be highly appreciated so
> > > > > > > > > we can take potential future interesting use cases into account:
> > > > > > > > >
> > > > > > > > > https://cwiki.apache.org/confluence/display/MXNET/MXVM%3A+Operator+graph+2.0
> > > > > > > > >
> > > > > > > > > Pedro.
> > > > > > > > >
> > > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
