Hi,

Thanks for all the materials and key points raised. The discussion has
many ramifications; I will think about them and research them carefully
before replying further. Please also don't quickly dismiss the points I
have raised or reduce them to a typed-vs-untyped debate or to pedantic
C++ comments: we have been debugging missing nodes and pointers in the
graph while computing second-order gradients for weeks, with no success,
due to the design of the graph. The sketch below illustrates the failure
mode I mean.
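To make that concrete, here is a minimal sketch -- hypothetical names,
loosely modeled on an any-based attribute map like NNVM's, not the
actual NNVM code -- of why type-erased graph attributes only fail at
runtime:

    // C++17. A stand-in for a type-erased graph: passes communicate
    // through string keys and std::any, so a wrong key or a wrong type
    // compiles cleanly and only surfaces during execution.
    #include <any>
    #include <iostream>
    #include <string>
    #include <unordered_map>

    struct Graph {
      std::unordered_map<std::string, std::any> attrs;
    };

    int main() {
      Graph g;
      g.attrs["gradient"] = std::string("placeholder");  // pass A writes a string

      // Pass B asks for an int: compiles fine, throws at runtime.
      try {
        (void)std::any_cast<int>(g.attrs.at("gradient"));
      } catch (const std::bad_any_cast&) {
        std::cerr << "type mismatch caught only at runtime\n";
      }

      // Pass C typos the key: compiles fine, silently finds nothing --
      // the same shape as the missing nodes and pointers we are chasing.
      std::cout << g.attrs.count("gradeint") << " entries found\n";  // 0
      return 0;
    }

Neither mistake is diagnosable at compile time; both surface only while
the graph is being executed, which is exactly where our second-order
gradient debugging keeps landing.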
There are 60 years of software engineering lessons and practice behind
some of these concepts, and compiler theory that deep learning
frameworks can also take advantage of instead of rediscovering
everything until we end up at a typed, pure functional IR. In some of
the materials linked you also point out limitations of the current
architecture. I think it's good that we raise this topic; it shows that
we need a deeper, structured conversation on how we evolve the dataflow
graph in MXNet. Maybe you can help cross-pollinate this conversation
between the TVM and MXNet projects. If there's an intention to change
from NNVM to NNVM2, I think this should have been communicated or
discussed with the community beforehand. Until then.

Pedro.

On Tue, May 14, 2019 at 8:03 PM Tianqi Chen <tqc...@cs.washington.edu> wrote:
>
> The core part of the proposal is to move the graph to a much more
> strongly typed template class. I think this is mainly a point of
> engineering taste, and both sides have pros and cons; let me list them
> before I share my thoughts on this issue:
>
> - Typed fields certainly enjoy more compile-time type checking; on the
> other hand, it is hard to expose templates with an explosion of
> possible instantiations to frontend languages.
> - More type-erased fields provide the runtime flexibility to store
> polymorphic types as well as extensible attributes for graph
> optimization.
> - It is hard to use a virtual class to expose every possible attribute
> that an operator might have, such as inlining, storage pattern,
> gradient, etc.
> - The nature of supporting a growing set of operator attributes
> requires a type-erased attrs field.
> - In contrast to your argument (that typing is a blocker to features),
> type-erased and typed code can both reach the same features, except
> that typed code gets more of the errors at compile time while
> type-erased code gets some of them at runtime.
> - Templatized data structures will likely introduce an additional
> mental burden for developers and are not really suitable as a core
> data structure, because they imply an explosive number of possible
> data structures, while the core data structure should be a single one.
>
> Now my view (as an MXNet PMC member) on typed vs type-erased style: if
> MXNet were a pure C++ project, I might take more of the typed
> approach. However, MXNet is a project that supports
> python/scala/clojure and other frontend languages. The introduction of
> more typing may not align with that original goal, given the
> trade-offs I listed above.
>
> This proposal is really a drastic change of what NNVM does, as well as
> of the optimization passes; given the scope it is, in your analogy, "a
> new vehicle to solve all the problems" rather than a minor patch. It
> will take a lot of engineering effort to bring in new features and to
> adapt the existing ones. Because of that, it does merit a discussion
> about how we should think about the future MXNet 2.0.
>
> Technically, Relay is a serious candidate. Of course Relay, as well as
> its core, is in C++, but it maintains the multi-language-first
> principle; that is why the example code was in python. See more
> related discussion comparing NNVM v1 and Relay:
> https://discuss.tvm.ai/t/any-materials-of-relay-for-beginners/2392/5
>
> I think the ideal graph data structure candidate for MXNet 2.0 should
> have natural support for:
> - functions, modules, and recursion
> - control flow
> - interoperation with multi-language frontends, e.g. being able to
> prototype graph optimizations in python/scala/clojure if needed.
>
> Adding this support requires significant engineering effort, and I do
> hope we only have to do it once. While I don't want to force any
> conclusion here, I do think Relay is one such candidate.
>
> Tianqi
>
>
> On Tue, May 14, 2019 at 5:58 PM Pedro Larroy
> <pedro.larroy.li...@gmail.com> wrote:
>
> > Hi Tianqi
> >
> > Thanks for the quick response.
> >
> > Could you point to examples where graph.h is being exposed that
> > would not be possible with what I propose? I don't think my proposal
> > has any impact on language bindings, and the way I describe it
> > doesn't affect having or not having higher-level language bindings.
> > Please elaborate so I can understand your concern -- maybe with code
> > examples where the graph attributes are being changed from Python? I
> > don't think we have this in MXNet. This is such a core foundation
> > for MXNet that I don't think we should compromise on it because
> > another project, not directly related to MXNet, might want to expose
> > some untyped graph and Node attributes. The current state makes
> > maintaining the code very painful and is also preventing desired
> > features, such as higher-order gradients, from being developed. I
> > have heard from you many times how speed is critical for us to
> > innovate in this quickly changing field.
> >
> > My proposal is limited to the graph and wouldn't change, for
> > example, the way operators are registered or the way operator
> > arguments are processed.
> >
> > Regarding the second point, this is the documentation about Relay on
> > the web which I found, for example:
> >
> > https://docs.tvm.ai/dev/relay_add_op.html#
> >
> > Is somebody working on making Imperative::Backward use this API?
> > That would be a big change which I'm not aware of, and adopting an
> > IR is of a much bigger scope than the change I'm proposing here.
> >
> > I think I'm having difficulty understanding what the arguments are
> > here. I'm saying I need to change one piece of my car, and what you
> > are selling me is a new vehicle? Or is your suggestion that we use
> > Relay for the graph passes in MXNet?
> >
> > I would like to see C++ code examples; Python examples are not
> > sufficient when we talk about the core of MXNet.
> >
> > Pedro.
> >
> >
> > On Tue, May 14, 2019 at 5:39 PM Tianqi Chen
> > <tqc...@cs.washington.edu> wrote:
> > >
> > > Thanks for the proposal. Let me share some of my thoughts:
> > >
> > > Specific comments on the proposal
> > > -----------------------------------------------
> > > The heavy use of generics in the Graph type is a huge departure
> > > from the type-erased data structure presented in the previous
> > > design. While we understand the advantages of typed languages
> > > (more compile-time checking) and of type-erased types (more
> > > dynamism), the heavy use of templates will actually make the
> > > project solely C++-focused, making it hard to expose intermediate
> > > (templatized) data structures to other languages like
> > > python/scala/clojure.
> > >
> > > While I fully understand some of the lessons taught in programming
> > > C++ (reduce shared_ptr, more typing, etc.), we need to think about
> > > the context of the MXNet project and **the need to support
> > > multi-language as a first-class concern**. Some of the type-erased
> > > types are design trade-offs made to support these features, and we
> > > need to think more carefully instead of just applying "rules for
> > > C++", which may bring problems.
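[Interjecting inline on the typed-vs-type-erased point above: a minimal
sketch, with hypothetical names and not the wiki proposal verbatim, of
the typed alternative I have in mind. The two runtime failures from my
sketch at the top of this mail become compile errors here:

    // C++17. Attributes are declared fields with real types, so a
    // wrong type or a misspelled attribute fails to compile instead of
    // failing deep inside a backward pass.
    #include <string>
    #include <vector>

    struct NodeEntry { int node_id; int index; };

    struct TypedGraph {
      std::vector<NodeEntry> outputs;
      std::string gradient;        // declared once, with its real type
    };

    int main() {
      TypedGraph g;
      g.gradient = "placeholder";
      // int grad = g.gradient;    // would not compile: no int conversion
      // g.gradeint = "x";         // would not compile: no such member
      return 0;
    }

The cost, as Tianqi notes above, is that a fixed set of typed fields is
harder to extend at runtime and harder to reach from
python/scala/clojure than a string-keyed attrs map.]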
> > >
> > > Future of NNVM
> > > ----------------------
> > > Given that this thread touches upon what we should do for better
> > > computational graph handling, I would also recommend taking a look
> > > at NNVM v2 -- Relay.
> > >
> > > Relay already addresses many of the wish-list items in the
> > > proposal, such as operator fusion, higher-order gradients, offload
> > > to hardware, isolated compilation, deployment on edge devices and
> > > accelerators, etc. Relay also addresses problems not yet mentioned
> > > in the proposal, including control flow, a dynamic runtime,
> > > automatic layout optimization, etc.
> > >
> > > Tianqi
> > >
> > > On Tue, May 14, 2019 at 5:06 PM Sheng Zha <zhash...@apache.org> wrote:
> > >
> > > > Hi Pedro,
> > > >
> > > > Thanks for taking the initiative. Skimming through the design
> > > > doc, I didn't see a comparison with existing solutions such as
> > > > Relay in TVM, which is already a dependency of MXNet. Could you
> > > > elaborate on the comparison with existing solutions in the
> > > > design doc too?
> > > >
> > > > -sz
> > > >
> > > > On 2019/05/14 23:49:30, Pedro Larroy
> > > > <pedro.larroy.li...@gmail.com> wrote:
> > > > > Hi dev@
> > > > >
> > > > > As a result of my deep dives into the graph machinery I have
> > > > > created a new proposal to improve the operator graph in MXNet.
> > > > >
> > > > > This would mean superseding the use of NNVM Graph in MXNet and
> > > > > having a new implementation that we can use to simplify a lot
> > > > > of code and do powerful graph manipulations and passes, such
> > > > > as operator fusion and other optimizations.
> > > > >
> > > > > As it would be a change with big impact and ramifications,
> > > > > your thoughts and feedback on the document would be highly
> > > > > appreciated, so that we can take potential future use cases
> > > > > into account:
> > > > >
> > > > > https://cwiki.apache.org/confluence/display/MXNET/MXVM%3A+Operator+graph+2.0
> > > > >
> > > > > Pedro.
> > > > >