Hi Junru,

Overall, I appreciate the points you made about the proposal.
Having said that, I would like to remind you of the Apache Code of Conduct: https://www.apache.org/foundation/policies/conduct. "Be empathetic, welcoming, friendly and patient". I find your tone condescending. You clearly understood what he meant from the context, whether you prefer the compiler term "IR" or the distributed-systems term "dataflow graph". You could very well have said "let's use this terminology to have a common understanding" instead of telling him to go learn the basic concepts. Before building a cool brand, it's important to build a healthy community.

Anirudh

On Wed, May 15, 2019 at 12:03 AM Junru Shao <junrushao1...@gmail.com> wrote:

> Hi Pedro,
>
> I really appreciate that a diligent and talented engineer eagerly wants to improve our system, and I am very thankful that you have done so much for our community. However, I do want to mention some points that I believe are worth raising.
>
> While I agree with Tianqi that every design has its pros and cons, I would emphasize that *good taste* in system design means optimizing the bottleneck and enhancing expressiveness (and usability), i.e. doing what needs doing, rather than fixing *trivial nits* that are irrelevant to either performance or expressiveness. Generally speaking, typed or untyped, shared_ptr or unique_ptr won't affect overall performance for deep learning workloads, especially when we have an async scheduler in MXNet that does good latency hiding. To me, these are not major issues worth re-designing our entire system over.
>
> To benefit users, i.e. real-world ML practitioners, the main point I would make is that a dataflow-graph-based representation is increasingly incapable of expressing modern neural networks, because of increasingly common structures such as arbitrary control flow (with continue, break, etc.), recursion, and type conjunction and disjunction. These pain points should be our priority to address, and Relay addresses all of them.
> Another minor point I would humbly mention is that, for the sake of our brand, it is our responsibility to be professional about terminology when writing an official proposal on Confluence. As one of numerous examples, the title of the proposal, something like "operators graph", struck me as odd. Correct me if I am wrong, but the compiler community would prefer the term "intermediate representation", and the distributed-systems community would prefer "dataflow graph". If you don't have a background in these fields, a more efficient way to communicate is to first familiarize yourself with the most basic concepts and then discuss; this also saves your own valuable time.
>
> Again, thank you so much for your hard work, and I hope that we can work together to win customers in the future :-)
>
> Thanks,
> Junru
>
> On Tue, May 14, 2019 at 8:03 PM Tianqi Chen <tqc...@cs.washington.edu> wrote:
>
> > The core part of the proposal is to move the graph to a much more strongly typed template class. I think this is mainly a matter of engineering taste, and both sides have pros and cons; let me list them before I share my thoughts on this issue:
> >
> > - Typed fields certainly enjoy more compile-time type checking; on the other hand, it is hard to expose templates with an explosive number of instantiations to frontend languages.
> > - Type-erased fields provide runtime flexibility to store polymorphic types as well as extensible attributes for graph optimization. It is hard to use a virtual class to expose every possible attribute that an operator might have, such as inlining, storage pattern, gradient, etc.; the nature of supporting a growing set of operator attributes requires a type-erased attrs field.
> > - In contrast to your argument (that typing is a blocker to features), type-erased and typed code can both reach the same features, except that typed code gets more errors at compile time while type-erased code gets some of them at runtime.
> > - Templatized data structures will likely introduce additional mental burden for developers and are not really suitable as a core data structure, because they imply an explosive number of possible data structures, while the core data structure should be a single one.
> >
> > Now my view (as an MXNet PMC member) on typed vs type-erased style: if MXNet were a pure C++ project, I might take more of the typed approach. However, MXNet itself is a project that takes Python/Scala/Clojure and other frontend languages. The introduction of more typing may not align with that goal, given the trade-offs I listed above.
> >
> > This proposal is really a drastic change to what NNVM does, as well as to the optimization passes; given the scope, it is, in your analogy, "a new vehicle to solve all the problems" rather than a minor patch. It will take a lot of engineering effort to bring in new features and adapt the existing ones. Because of that, it does merit a discussion about how we should think about the future MXNet 2.0.
> >
> > Technically, Relay is a serious candidate. Relay, as well as its core, is in C++ but maintains the multi-language-first principle; that is why the example code was in Python. See more related discussion comparing NNVM v1 and Relay:
> > https://discuss.tvm.ai/t/any-materials-of-relay-for-beginners/2392/5
> >
> > I think the ideal graph data structure candidate for MXNet 2.0 should have natural support for:
> > - Functions, modules, and recursion
> > - Control flow
> > - Interoperation with multi-language frontends, e.g.
being able to prototype graph optimizations in Python/Scala/Clojure if needed.
> >
> > Adding this support needs significant engineering effort, and I do hope we only have to do it once. While I don't want to force any conclusion here, I do think Relay is one such candidate.
> >
> > Tianqi
> >
> > On Tue, May 14, 2019 at 5:58 PM Pedro Larroy <pedro.larroy.li...@gmail.com> wrote:
> >
> > > Hi Tianqi,
> > >
> > > Thanks for the quick response.
> > >
> > > Could you point to examples where graph.h is being exposed in ways that would not be possible with what I propose? I don't think my proposal has any impact on language bindings, and as I describe it, it doesn't affect having or not having higher-level language bindings. Please elaborate so I can understand your concern, perhaps with code examples where graph attributes are changed from Python; I don't think we have that in MXNet. This is such a core foundation for MXNet that I don't think we should compromise on it because another project, not directly related to MXNet, might want to expose some untyped graph and node attributes. The current state makes maintaining the code very painful and is also preventing desired features, such as higher-order gradients, from being developed. I have heard from you many times how speed is critical for us to innovate in this quickly changing field.
> > >
> > > My proposal is limited to the graph and wouldn't change, for example, the way operators are registered or how their arguments are processed.
> > >
> > > Regarding the second point, this is the documentation about Relay on the web that I found, for example:
> > >
> > > https://docs.tvm.ai/dev/relay_add_op.html#
> > >
> > > Is somebody working on making Imperative::Backward use this API? That would be a big change which I'm not aware of.
And using an IR is of a much bigger scope than the change I'm proposing here.
> > >
> > > I think I'm having difficulty understanding the arguments here. I'm saying I need to change one piece of my car, and you are selling me a new vehicle? Or is your suggestion that we use Relay for the graph passes in MXNet?
> > >
> > > I would like to see C++ code examples; Python examples are not sufficient when we talk about the MXNet core.
> > >
> > > Pedro.
> > >
> > > On Tue, May 14, 2019 at 5:39 PM Tianqi Chen <tqc...@cs.washington.edu> wrote:
> > > >
> > > > Thanks for the proposal. Let me share some of my thoughts:
> > > >
> > > > Specific comments on the proposal
> > > > -----------------------------------------------
> > > > The heavy use of generics in the Graph type is a huge departure from the type-erased data structure presented in the previous design. While we understand the advantages of typed languages (more compile-time checking) and type-erased types (more dynamism), the heavy use of templates will actually make the project solely C++-focused, making it hard to expose intermediate (templatized) data structures to other languages like Python/Scala/Clojure.
> > > >
> > > > While I fully understand some of the lessons taught in programming C++ (reduce shared_ptr, more typing, etc.), we need to think about the context of the MXNet project and **the need to support multiple languages as first-class citizens**. Some of the type-erased types are design trade-offs made to support these features, and we need to think more carefully instead of just applying "rules for C++", which may bring problems.
> > > >
> > > > Future of NNVM
> > > > ----------------------
> > > > Given that this thread has touched on what we should do for better computational graph handling,
I would recommend also taking a look at NNVM v2, i.e. Relay.
> > > >
> > > > Relay already addresses many of the wish-list items in the proposal, such as operator fusion, higher-order gradients, offload to hardware, isolated compilation, and deployment on edge devices and accelerators. Relay also addresses problems not yet mentioned in the proposal, including control flow, a dynamic runtime, and automatic layout optimization.
> > > >
> > > > Tianqi
> > > >
> > > > On Tue, May 14, 2019 at 5:06 PM Sheng Zha <zhash...@apache.org> wrote:
> > > >
> > > > > Hi Pedro,
> > > > >
> > > > > Thanks for taking the initiative. Skimming through the design doc, I didn't see a comparison with existing solutions such as Relay in TVM, which is already a dependency of MXNet. Could you elaborate on the comparison with existing solutions in the design doc too?
> > > > >
> > > > > -sz
> > > > >
> > > > > On 2019/05/14 23:49:30, Pedro Larroy <pedro.larroy.li...@gmail.com> wrote:
> > > > > > Hi dev@,
> > > > > >
> > > > > > As a result of my deep dives into the graph machinery, I have created a new proposal to improve the operator graph in MXNet.
> > > > > >
> > > > > > This would mean superseding the use of the NNVM Graph in MXNet and having a new implementation that we can use to simplify a lot of code and do powerful graph manipulation and passes, such as operator fusion and other optimizations.
> > > > > >
> > > > > > As it would be a change with big impact and ramifications, your thoughts and feedback on the document would be highly appreciated, so we can account for potential future use cases:
> > > > > >
> > > > > > https://cwiki.apache.org/confluence/display/MXNET/MXVM%3A+Operator+graph+2.0
> > > > > >
> > > > > > Pedro.