Hi Junru,

Overall, I appreciate the points you made about the proposal.
Having said that, I would like to remind you of the Apache Code of Conduct: https://www.apache.org/foundation/policies/conduct. "Be empathetic, welcoming, friendly and patient". I find your tone condescending. You clearly understood what he meant from the context, whether you prefer the compiler term "IR" or the distributed-systems term "dataflow graph". You could very well have said "let's use this terminology to have a common understanding" instead of telling him to go learn the basic concepts. Before building a cool brand, it's important to build a healthy community.

Anirudh

On Wed, May 15, 2019 at 12:03 AM Junru Shao <junrushao1...@gmail.com> wrote:

> Hi Pedro,
>
> I really appreciate that a diligent and talented engineer eagerly wants to improve our system, and I am very thankful that you have done so much for our community. However, I do want to mention some points that I believe are worth raising.
>
> While I agree with Tianqi that every design has its pros and cons, I would emphasize that *good taste* in system design means optimizing the bottleneck and enhancing expressiveness (and usability), i.e. doing what needs doing, rather than fixing *trivial nits* that are irrelevant to either performance or expressiveness. Generally speaking, typed or untyped, shared_ptr or unique_ptr won't affect overall performance for deep learning workloads, especially when we have an async scheduler in MXNet that does good latency hiding. To me, these are not major issues worth re-designing our entire system over.
>
> To benefit users, i.e. real-world ML practitioners, the main point I would make is that a dataflow-graph-based representation is increasingly incapable of expressing modern neural networks, because of increasingly common structures such as arbitrary control flow (with continue, break, etc.), recursion, and type conjunction and disjunction. These pain points should be our priority to address, and Relay addresses all of them.
> Another minor point I would humbly mention is that, for the sake of our brand, it is our responsibility to be professional about terminology when writing an official proposal on Confluence. As one of numerous examples, the title of the proposal, something like "operators graph", struck me as odd. Correct me if I am wrong, but the compiler community would prefer the term "intermediate representation", and the distributed-systems community would prefer "dataflow graph". If you don't have a background in these fields, a more efficient way to communicate is to first familiarize yourself with the most basic concepts and then discuss; this also saves your own valuable time.
>
> Again, thank you so much for your hard work, and I hope that we can work together to win customers in the future :-)
>
> Thanks,
> Junru
>
> On Tue, May 14, 2019 at 8:03 PM Tianqi Chen <tqc...@cs.washington.edu> wrote:
>
> > The core part of the proposal is to move the graph to a much more strongly typed template class. I think this is mainly a matter of engineering taste, and both sides have pros and cons; let me list them before I share my thoughts on this issue:
> >
> > - Typed fields certainly enjoy more compile-time type checking; on the other hand, it is hard to expose templates with an explosive number of instantiations to frontend languages.
> > - Type-erased fields provide runtime flexibility to store polymorphic types as well as extensible attributes for graph optimization. It is hard to use a virtual class to expose every possible attribute that an operator might have, such as inlining, storage pattern, gradient, etc.; the nature of supporting a growing set of operator attributes requires a type-erased attrs field.
> > - In contrast to your argument (that typing is a blocker to features), type-erased and typed code can both reach the same features, except that typed code gets more errors at compile time while type-erased code gets some of them at runtime.
> > - Templatized data structures will likely introduce additional mental burden for developers and are not really suitable as a core data structure, because they imply an explosive number of possible data structures, while the core data structure should be a single one.
> >
> > Now my view (as an MXNet PMC member) on typed vs type-erased style: if MXNet were a pure C++ project, I might take more of the typed approach. However, MXNet itself is a project that takes Python/Scala/Clojure and other frontend languages. The introduction of more typing may not align with that goal, given the trade-offs I listed above.
> >
> > This proposal is really a drastic change to what NNVM does, as well as to the optimization passes; given the scope, it is, in your analogy, "a new vehicle to solve all the problems" rather than a minor patch. It will take a lot of engineering effort to bring in new features and adapt the existing ones. Because of that, it does merit a discussion about how we should think about the future MXNet 2.0.
> >
> > Technically, Relay is a serious candidate. Relay, as well as its core, is in C++ but maintains the multi-language-first principle; that is why the example code was in Python. See more related discussion comparing NNVM v1 and Relay:
> > https://discuss.tvm.ai/t/any-materials-of-relay-for-beginners/2392/5
> >
> > I think the ideal graph data structure candidate for MXNet 2.0 should have natural support for:
> > - Functions, modules, and recursion
> > - Control flow
> > - Interoperation with multi-language frontends, e.g.
being able to prototype graph optimizations in Python/Scala/Clojure if needed.
> >
> > Adding this support needs significant engineering effort, and I do hope we only have to do it once. While I don't want to force any conclusion here, I do think Relay is one such candidate.
> >
> > Tianqi
> >
> > On Tue, May 14, 2019 at 5:58 PM Pedro Larroy <pedro.larroy.li...@gmail.com> wrote:
> >
> > > Hi Tianqi,
> > >
> > > Thanks for the quick response.
> > >
> > > Could you point to examples where graph.h is being exposed in ways that would not be possible with what I propose? I don't think my proposal has any impact on language bindings, and as I describe it, it doesn't affect having or not having higher-level language bindings. Please elaborate so I can understand your concern, perhaps with code examples where graph attributes are changed from Python; I don't think we have that in MXNet. This is such a core foundation for MXNet that I don't think we should compromise on it because another project, not directly related to MXNet, might want to expose some untyped graph and node attributes. The current state makes maintaining the code very painful and is also preventing desired features, such as higher-order gradients, from being developed. I have heard from you many times how speed is critical for us to innovate in this quickly changing field.
> > >
> > > My proposal is limited to the graph and wouldn't change, for example, the way operators are registered or how their arguments are processed.
> > >
> > > Regarding the second point, this is the documentation about Relay on the web that I found, for example:
> > >
> > > https://docs.tvm.ai/dev/relay_add_op.html#
> > >
> > > Is somebody working on making Imperative::Backward use this API? That would be a big change which I'm not aware of.
And using an IR is of a much bigger scope than the change I'm proposing here.
> > >
> > > I think I'm having difficulty understanding the arguments here. I'm saying I need to change one piece of my car, and you are selling me a new vehicle? Or is your suggestion that we use Relay for the graph passes in MXNet?
> > >
> > > I would like to see C++ code examples; Python examples are not sufficient when we talk about the MXNet core.
> > >
> > > Pedro.
> > >
> > > On Tue, May 14, 2019 at 5:39 PM Tianqi Chen <tqc...@cs.washington.edu> wrote:
> > > >
> > > > Thanks for the proposal. Let me share some of my thoughts:
> > > >
> > > > Specific comments on the proposal
> > > > -----------------------------------------------
> > > > The heavy use of generics in the Graph type is a huge departure from the type-erased data structure presented in the previous design. While we understand the advantages of typed languages (more compile-time checking) and type-erased types (more dynamism), the heavy use of templates will actually make the project solely C++-focused, making it hard to expose intermediate (templatized) data structures to other languages like Python/Scala/Clojure.
> > > >
> > > > While I fully understand some of the lessons taught in programming C++ (reduce shared_ptr, more typing, etc.), we need to think about the context of the MXNet project and **the need to support multiple languages as first-class citizens**. Some of the type-erased types are design trade-offs made to support these features, and we need to think more carefully instead of just applying "rules for C++", which may bring problems.
> > > >
> > > > Future of NNVM
> > > > ----------------------
> > > > Given that this thread has touched on what we should do for better computational graph handling,
I would recommend also taking a look at NNVM v2, i.e. Relay.
> > > >
> > > > Relay already addresses many of the wish-list items in the proposal, such as operator fusion, higher-order gradients, offload to hardware, isolated compilation, and deployment on edge devices and accelerators. Relay also addresses problems not yet mentioned in the proposal, including control flow, a dynamic runtime, and automatic layout optimization.
> > > >
> > > > Tianqi
> > > >
> > > > On Tue, May 14, 2019 at 5:06 PM Sheng Zha <zhash...@apache.org> wrote:
> > > >
> > > > > Hi Pedro,
> > > > >
> > > > > Thanks for taking the initiative. Skimming through the design doc, I didn't see a comparison with existing solutions such as Relay in TVM, which is already a dependency of MXNet. Could you elaborate on the comparison with existing solutions in the design doc too?
> > > > >
> > > > > -sz
> > > > >
> > > > > On 2019/05/14 23:49:30, Pedro Larroy <pedro.larroy.li...@gmail.com> wrote:
> > > > > > Hi dev@,
> > > > > >
> > > > > > As a result of my deep dives into the graph machinery, I have created a new proposal to improve the operator graph in MXNet.
> > > > > >
> > > > > > This would mean superseding the use of the NNVM Graph in MXNet and having a new implementation that we can use to simplify a lot of code and do powerful graph manipulation and passes, such as operator fusion and other optimizations.
> > > > > >
> > > > > > As it would be a change with big impact and ramifications, your thoughts and feedback on the document would be highly appreciated, so we can account for potential future use cases:
> > > > > >
> > > > > > https://cwiki.apache.org/confluence/display/MXNET/MXVM%3A+Operator+graph+2.0
> > > > > >
> > > > > > Pedro.