Hi Zach, Thank you for raising these points! I am happy to offer more reading materials about this topic.
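To make the SSA vs ANF point in this thread concrete, here is a minimal sketch in plain Python. It is not Relay or NNVM code, and the names `f_ssa_style`, `f_anf_style`, and `join` are made up for illustration. The idea it tries to show is the usual correspondence: a basic block with a phi node in SSA can be read as a local function in ANF whose parameter plays the role of the phi.

```python
# Source program (imperative):
#     if x > 0: y = x * 2
#     else:     y = -x
#     return y + 1
#
# SSA-style pseudo-IR (illustrative notation only):
#     entry: c  = x > 0 ; br c, then, else
#     then:  y1 = x * 2 ; jmp join
#     else:  y2 = -x    ; jmp join
#     join:  y3 = phi(y1, y2) ; r = y3 + 1 ; ret r


def f_ssa_style(x):
    """Mimics the SSA form: every value gets its own name, and the phi
    node is spelled out as 'y3 takes y1 or y2 depending on the branch'."""
    c = x > 0
    if c:
        y1 = x * 2
        y3 = y1  # phi input arriving from the 'then' block
    else:
        y2 = -x
        y3 = y2  # phi input arriving from the 'else' block
    r = y3 + 1
    return r


def f_anf_style(x):
    """Mimics ANF: every intermediate is bound to a name, and the join
    block becomes a local function whose parameter stands in for the phi."""

    def join(y):  # parameter y corresponds to phi(y1, y2)
        r = y + 1
        return r

    c = x > 0
    if c:
        y1 = x * 2
        return join(y1)  # 'jmp join' becomes a tail call
    else:
        y2 = -x
        return join(y2)


if __name__ == "__main__":
    for x in (-4, 0, 3):
        print(x, f_ssa_style(x), f_anf_style(x))
```

Both functions compute the same result for every input; only the bookkeeping for the merge point differs, which is the sense in which the two forms are usually said to be interchangeable.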
*SSA vs ANF.* ANF and SSA are essentially the same thing [1].

*AD in Relay.* Relay is able to do AD through not only control flow, but also various data structures and higher-order functions [2].

[1] Appel, Andrew W. "SSA is functional programming." *ACM SIGPLAN Notices* 33.4 (1998): 17-20.
[2] Roesch, Jared, et al. "Relay: a new IR for machine learning frameworks." *Proceedings of the 2nd ACM SIGPLAN International Workshop on Machine Learning and Programming Languages*. ACM, 2018.

On Wed, May 15, 2019 at 12:01 PM Zach Kimberg <zachary.kimb...@gmail.com> wrote:

> I would like to raise another option to get back on the topic of changing the Operator graph structure. On the page discussing Relay IR [1], it discusses mainly the difference between a data flow graph like we use now and A-normal form [2] which is used in some functional compilers. Is there a reason we do not want to use a structure based on Static Single Assignment Form (wikipedia explanation [3], lecture note explanation [4])? It is used almost universally in the compiler community including in LLVM (clang), GCC, Oracle JVM, PyPy, Go, Webkit, and Swift [5]. The major reason behind its pervasiveness is that it has proven very effective for analysis and transformations when dealing with control flow.
>
> One possible concern is that it might make automatic differentiation more difficult [6]. While it certainly is more complicated than a pure functional approach, the functional approach requires users to use functional programming. Especially with the languages we support now, that doesn't seem like a reasonable assumption. Given that the users are already introducing the complexity inherent in imperative programming, we have to deal with the increased complexity regardless. I think it might be easier to have the tools to deal with that rather than attempting to coerce users into a different programming paradigm or convert code between paradigms.
> Furthermore, this may become more important if users are increasingly making use of control flow like Junru said.
>
> Zach
>
> [1] - https://docs.tvm.ai/dev/relay_intro.html
> [2] - https://en.wikipedia.org/wiki/A-normal_form
> [3] - https://en.wikipedia.org/wiki/Static_single_assignment_form
> [4] - https://www.cs.cmu.edu/~rjsimmon/15411-f15/lec/10-ssa.pdf
> [5] - https://en.wikipedia.org/wiki/Static_single_assignment_form#Compilers_using_SSA_form
> [6] - https://discuss.tvm.ai/t/choice-about-ir-ssa-or-anf/1757/2
>
> On Wed, May 15, 2019 at 11:51 AM Naveen Swamy <mnnav...@gmail.com> wrote:
>
> > Being dismissive and condescending has been exactly what is plaguing this project.
> >
> > I agree the last paragraph sounds very condescending and very dismissive and it breaks many of the codes of conduct listed.
> >
> > On Wed, May 15, 2019 at 11:31 AM Anirudh Subramanian <anirudh2...@gmail.com> wrote:
> >
> > > Hi Junru,
> > >
> > > Overall, I appreciate the points you made about the proposal.
> > >
> > > Having said that, I would like to remind the Apache Code of Conduct :
> > > https://www.apache.org/foundation/policies/conduct.
> > > "Be empathetic, welcoming, friendly and patient".
> > >
> > > I find your tone condescending. Clearly you understand what he meant from the context whether you prefer to call IR in compilers or data-flow in distributed systems. You could very well say let's use this terminology to have a common understanding instead of saying go learn the basic concepts. Before building a cool brand, it's important to build a healthy community.
> > >
> > > Anirudh
> > >
> > > On Wed, May 15, 2019 at 12:03 AM Junru Shao <junrushao1...@gmail.com> wrote:
> > >
> > > > Hi Pedro,
> > > >
> > > > I really appreciate that a diligent and talented engineer eagerly wants to improve our system, and am very thankful that you have done so much for our community.
> > > > However, I do want to mention some points that I believe I should mention.
> > > >
> > > > While I agree with Tianqi that every design has its pros and cons, I would love to emphasize that a *good taste* of system design is to optimize the bottleneck, enhance expressiveness (and usability), i.e. to do what needs doing, rather than *trivial nits* that are irrelevant to either performance or expressiveness. Generally speaking, typed or untyped, shared_ptr or unique_ptr, won't affect the overall performance when it comes to deep learning workloads, especially when we have an async scheduler that does good latency hiding in MXNet - to me, these are not major issues that are worth re-designing our entire system.
> > > >
> > > > To benefit users - real-world ML practitioners, the main thing I would love to mention is that dataflow graph-based representation is increasingly incapable of expressing modern neural networks, because of increasingly common structures like arbitrary control flow (w/ continue, break, etc), recursion, type conjunction and disjunction, etc. Addressing these issues will be our priority, and Relay addresses all these pain points.
> > > >
> > > > Another minor thing I would love to humbly mention is that, for the sake of our brand, it is our responsibility to be professional about terminologies when writing an official proposal on Confluence. As one of the numerous examples, the title of the proposal really shocked me for a while, something like "operators graph" blah blah so weird. Educate me if I am wrong, but the compiler community would prefer the term "intermediate representation", and the distributed systems community would prefer "dataflow graph".
> > > > If you don't have knowledge in these fields, a better way for efficient communication is to first familiarize yourself with the most basic concepts and then do discussion. This is a way to save your own valuable time as well.
> > > >
> > > > Again, thank you so much for your hard work, and hope that we could work together to win customers in the future :-)
> > > >
> > > > Thanks,
> > > > Junru
> > > >
> > > > On Tue, May 14, 2019 at 8:03 PM Tianqi Chen <tqc...@cs.washington.edu> wrote:
> > > >
> > > > > The core part of the proposal is to move the graph to be a much more strongly typed template class.
> > > > > I think this is mainly a point of engineering taste, and both sides have pros and cons, let me list them before I share my thoughts on this issue:
> > > > >
> > > > > - Typed fields certainly enjoy more compile-time type checking, on the other hand, it is hard to expose template of explosive possibilities to frontend languages.
> > > > > - More type-erased fields provide runtime flexibility to store polymorphic types as well as extensible attributes for graph optimization
> > > > >   - It is hard to use a virtual class to expose every possible attribute that an operator might have, such as inlining, storage pattern, gradient etc.
> > > > >   - The nature of supporting a growing set of operator attributes requires a type-erased attrs field.
> > > > > - In contrast to your argument (typing is a blocker to features), type-erased or typed code can both get to the same feature, except that typed code gets more compile-time errors while type-erased code gets some of them at runtime.
> > > > > - Templatized data structures will likely introduce additional mental burdens to developers and are not really suitable as a core data structure
> > > > >   - Because they imply an explosive number of possible data structures, while the core data structure should be a single one.
> > > > >
> > > > > Now my view (as an MXNet PMC member) on typed vs type-erased style: If MXNet is a pure C++ project, I might take more of the typed approach.
> > > > > However, MXNet itself is a project that takes python/scala/clojure and other frontend languages.
> > > > > The introduction of more typing may not align with the original goal, given the tradeoffs I listed above.
> > > > >
> > > > > This proposal is really a drastic change of what NNVM does, as well as the optimization passes, and given the scope, in your analogy, "a new vehicle to solve all the problems" rather than a minor patch. It will take a lot of engineering effort to bring in new features and adapt the existing ones.
> > > > > Because of that, it does merit a discussion about how we shall think about the future MXNet2.0.
> > > > >
> > > > > Technically Relay is a serious candidate. Of course relay, as well as its core, is in C++ but maintains the multi-language first principle, that is why the example code was in python.
> > > > > See more related discussion comparing NNVMv1 and relay:
> > > > > https://discuss.tvm.ai/t/any-materials-of-relay-for-beginners/2392/5
> > > > >
> > > > > I think the ideal graph data structure candidate for MXNet2.0 should have natural support for:
> > > > > - Native support of function, module, and recursions
> > > > > - Control flows
> > > > > - The ability of interoperation with multi-language frontend, e.g.
> > > > >   being able to prototype graph optimizations in python/scala/clojure if needed.
> > > > >
> > > > > Adding this support needs significant engineering effort, and I do hope we only have to do it once. While I don't want to force any conclusion here, I do think Relay is one such candidate.
> > > > >
> > > > > Tianqi
> > > > >
> > > > > On Tue, May 14, 2019 at 5:58 PM Pedro Larroy <pedro.larroy.li...@gmail.com> wrote:
> > > > >
> > > > > > Hi Tianqi
> > > > > >
> > > > > > Thanks for the quick response.
> > > > > >
> > > > > > Could you point to examples where graph.h is being exposed which would not be possible with what I propose? I don't think my proposal is having any impact in language bindings, and the way I describe it doesn't affect having or not having higher language bindings. Please elaborate so I can understand your concern. Maybe code examples where the graph attributes are being changed from Python? I don't think we have this on MXNet. This is such a core foundation for MXNet, that I don't think we should compromise on it because other projects not directly related to MXNet might want to expose some untyped graph and Node attributes. The current status makes maintaining the code very painful and also is preventing desired features such as higher order gradients from being developed. I have heard from you many times how speed is critical for us to innovate in this quickly changing field.
> > > > > >
> > > > > > My proposal is limited to the graph and wouldn't change the way operators are registered and arguments are processed for operators for example.
> > > > > >
> > > > > > Regarding the second point, the documentation about Relay on the web which I found for example:
> > > > > >
> > > > > > https://docs.tvm.ai/dev/relay_add_op.html#
> > > > > >
> > > > > > Is somebody working on making Imperative::Backward use this API? This would be a big change which I'm not aware of. And using an IR is of a much bigger scope than the change I'm proposing here for example.
> > > > > >
> > > > > > I think I'm having difficulty understanding what the arguments are here. I'm saying I need to change one piece of my car and what you are selling me is a new vehicle here? Or is your suggestion that we use Relay for the graph passes in MXNet?
> > > > > >
> > > > > > I would like to see C++ code examples, Python examples are not sufficient when we talk about the core MXNet.
> > > > > >
> > > > > > Pedro.
> > > > > >
> > > > > > On Tue, May 14, 2019 at 5:39 PM Tianqi Chen <tqc...@cs.washington.edu> wrote:
> > > > > >
> > > > > > > Thanks for the proposal. Let me share some of my thoughts:
> > > > > > >
> > > > > > > Specific comments on the proposal
> > > > > > > -----------------------------------------------
> > > > > > > The heavy use of generics in the Graph type was a huge departure from the type-erased data structure which was presented in the previous design.
> > > > > > > While we understand the advantage of typed language (more compile-time checking) and type-erased types (more dynamism), the heavy use of the template will actually make the project solely C++ focused, making it hard to expose intermediate (templatized) data structure to other languages like python/scala/clojure.
> > > > > > >
> > > > > > > While I fully understand some of the lessons taught in programming C++ (reduce shared_ptr, more typing etc.), we need to think about the context of MXNet project and **the need to support multi-language as a first-class**.
> > > > > > > Some of the type-erased types are design trade-offs made to support these features, and we need to think more carefully instead of just applying "rules for C++" which may bring problems.
> > > > > > >
> > > > > > > Future of NNVM
> > > > > > > ----------------------
> > > > > > > Given that this thread touched upon what we should do for better computational graph handling, I would recommend also taking a look at NNVMv2 -- relay.
> > > > > > >
> > > > > > > Relay addresses many of the wish-lists in the proposal already, such as operator fusion, high order gradient, offload to hardware, isolated compilation, deployment on edge and accelerators etc.
> > > > > > > Relay also addresses problems not yet mentioned in the proposal, including control flow and dynamic runtime, automatic layout optimization etc.
> > > > > > >
> > > > > > > Tianqi
> > > > > > >
> > > > > > > On Tue, May 14, 2019 at 5:06 PM Sheng Zha <zhash...@apache.org> wrote:
> > > > > > >
> > > > > > > > Hi Pedro,
> > > > > > > >
> > > > > > > > Thanks for taking the initiative. Skimming through the design doc, I didn't see comparison with existing solutions such as relay in tvm, which is already a dependency of mxnet. Could you elaborate on comparison with existing solutions in the design doc too?
> > > > > > > >
> > > > > > > > -sz
> > > > > > > >
> > > > > > > > On 2019/05/14 23:49:30, Pedro Larroy <pedro.larroy.li...@gmail.com> wrote:
> > > > > > > > > Hi dev@
> > > > > > > > >
> > > > > > > > > As a result of my deep dives on the graph machinery I have created a new proposal to improve the operator graph in MXNet.
> > > > > > > > >
> > > > > > > > > This would mean superseding the use of NNVM Graph in MXNet and having a new implementation that we can use to simplify a lot of code and do powerful graph manipulation and passes such as operator fusion and other optimizations.
> > > > > > > > >
> > > > > > > > > As it would be a change with big impact and ramifications, your thoughts and feedback on the document would be highly appreciated so we can take into account potential future interesting use cases:
> > > > > > > > >
> > > > > > > > > https://cwiki.apache.org/confluence/display/MXNET/MXVM%3A+Operator+graph+2.0
> > > > > > > > >
> > > > > > > > > Pedro.