I'm speaking under my "MXNet contributor" hat. It would be sad if our new model format and compiler were not supported by our own contributors. It puts us in a bad position when we reach outside to ask for support.
If you really want to do it the onnx <-> mxnet way, I suggest putting the code under https://github.com/aws.

Best
Mu

On Thu, Oct 19, 2017 at 9:51 AM, Lupesko, Hagay <lupe...@gmail.com> wrote:
> Since there seems to be a difficulty to reach a consensus here, and this is a new area, maybe a good compromise would be to contribute this under /contrib as experimental, in whatever way Roshani thinks makes sense. Once there is code in place, and MXNet users and contributors are able to check it out, we can consider future steps.
>
> Does this proposal make sense to folks?
>
> On 10/18/17, 23:01, "Tianqi Chen" <workc...@gmail.com on behalf of tqc...@cs.washington.edu> wrote:
>
> I want to offer one last thing in terms of technical details. I mentioned two trends in deep learning systems. There is one thing that was omitted: how should we build a good deployment end for deep learning models?
>
> There is always a paradox to this problem:
>
> - On one hand, the deployment end needs to be lightweight and portable.
> - On the other hand, we want a lot of optimizations (memory layout, compute) and feature support, and this makes the project big.
>
> All the existing systems suffer from this problem. The solution is simple: separate the optimization part from the actual runtime and compile things down to a bare-metal module. This is the solution the nnvm/top compiler pipeline offers, which I believe will become standard practice for deployment and where all systems are headed.
>
> Tianqi
>
> On Wed, Oct 18, 2017 at 10:03 PM, Tianqi Chen <tqc...@cs.washington.edu> wrote:
>
> > OK, there is some miscommunication here, I guess. We only need to do a "canonicalization" step in the Python API that does a symbol-to-symbol translation. It can be done purely in Python, and there is no need to go "down" into C++ to do this.
> >
> > For example, the current nnvm.from_mxnet API takes a Module or Gluon module and gets you back an nnvm/top graph in Python.
> >
> > All we are asking for is decomposing it into:
> >
> >     def mxnet_to_onnx(module):
> >         nnvm_graph, params = nnvm_from_mxnet(module)
> >         onnx = nnvm_to_onnx(nnvm_graph, params)
> >         return onnx
> >
> > This allows nnvm_from_mxnet to be reused for other purposes, like compiling to deployable modules.
> >
> > Tianqi
> >
> > On Wed, Oct 18, 2017 at 9:55 PM, Lupesko, Hagay <lupe...@gmail.com> wrote:
> >
> >> Tianqi:
> >> Thanks for detailing the trends. I fully agree that ONNX is just a graph serialization format – nothing more, nothing less. I also think we all agree that this simple mechanism holds lots of value for DL users, since it allows them to move between frameworks easily (e.g. train with MXNet, deploy on a mobile device with Caffe2, or the other way around).
> >>
> >> As you said, an in-memory IR is different from serialization formats such as ONNX. In-memory IRs are designed to make runtime execution as efficient as possible, leveraging software and hardware optimizations. They are indeed complex, and where the "meat" is. (BTW, ONNX regards itself as an "IR" format, but not in the same sense as NNVM.)
> >>
> >> At the end of the day, Roshani is aiming to deliver simple functionality to MXNet users: (1) take an ONNX file and load it into MXNet so you get a graph+weights you can work with; (2) given a trained model, save it as an ONNX file.
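Both goals reduce to the decomposition Tianqi sketches above. A minimal sketch of the nnvm route, assuming the nnvm compiler APIs nnvm.frontend.from_mxnet (the helper Tianqi refers to as nnvm.from_mxnet) and nnvm.compiler.build; nnvm_to_onnx is the hypothetical exporter under discussion and is stubbed out, since it does not exist yet:

    import nnvm.compiler
    import nnvm.frontend

    def nnvm_to_onnx(nnvm_graph, params):
        # Hypothetical: the nnvm/top -> ONNX serializer this thread is discussing.
        raise NotImplementedError("nnvm/top -> ONNX export is the piece still to be built")

    def mxnet_to_onnx(module):
        # Reuse the existing MXNet -> nnvm/top translation ...
        nnvm_graph, params = nnvm.frontend.from_mxnet(module)
        return nnvm_to_onnx(nnvm_graph, params)

    def mxnet_to_deployable(module, data_shape, target="llvm"):
        # ... and feed the same graph into the nnvm/tvm compiler to get a
        # deployable module (graph JSON, compiled library, params).
        nnvm_graph, params = nnvm.frontend.from_mxnet(module)
        return nnvm.compiler.build(nnvm_graph, target,
                                   shape={"data": data_shape}, params=params)

The point of the decomposition is that the from_mxnet step is shared: the ONNX exporter and the compilation path consume the same nnvm/top graph.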
> >> Since MXNet users do not interact with NNVM directly, but rather interact with the MXNet API (MXNet Module), isn't the simplest thing to do just to construct the Module "on the fly" using the MXNet API? Taking the other approach, we would go from the top-level MXNet "load" API, go "down" to NNVM to construct the graph, and then go back up to MXNet to expose it as a Module. This seems too complex and does not add any benefit. In whatever way we construct the MXNet Module object, NNVM will always be the underlying in-memory IR that is being executed, so why not take the simpler route?
> >>
> >> Hagay
> >>
> >> On 10/18/17, 19:42, "Tianqi Chen" <workc...@gmail.com on behalf of tqc...@cs.washington.edu> wrote:
> >>
> >> Hi Chris:
> >>
> >> There is no intention to move things away from mxnet. The reduction in lines of code comes from having a better design in general; usually you write less redundant code by benefiting from a better design. As I may quote: "the best design is achieved not when there is nothing to add, but when there is nothing to be taken away."
> >>
> >> MXNet has always benefited from this philosophy and improves with new designs and proper modularization. For example, we saw such reduction and convenience when migrating from MXNet's legacy op to NNVM's mechanism. The new mechanism now enables things like sparse-aware support and other features which would be much harder to support otherwise.
> >>
> >> The nnvm/tvm stack brings the same benefit (if not more) and will only add more features to MXNet itself: offering more hardware backends and optimization, and allowing us to write less code and spend less time optimizing for each backend by going through TVM.
> >>
> >> Tianqi
> >>
> >> On Wed, Oct 18, 2017 at 7:15 PM, Chris Olivier <cjolivie...@gmail.com> wrote:
> >>
> >> > Reduce the code base of mxnet? By increasing the scope of the dmlc modules? Is the intent to make mxnet a thin language wrapper around a group of dmlc modules?
> >> >
> >> > On Wed, Oct 18, 2017 at 6:58 PM Tianqi Chen <tqc...@cs.washington.edu> wrote:
> >> >
> >> > > To better answer Hagay's question, I would like to dive a bit deeper into the relation between MXNet, NNVM and model exchange formats like ONNX.
> >> > >
> >> > > There are two major trends in deep learning systems now:
> >> > >
> >> > > - Common serializable formats, like ONNX and CoreML, that define the model exchange format.
> >> > > - The in-memory graph IR for quick optimization and JIT. NNVM and Tensorflow's XLA fall into this category.
> >> > >
> >> > > The exchange formats are great: they only pose a layer of conversion, which is good for exchange. The real meat still comes from the compilation and JIT pipeline you have to offer. For that, we need an in-memory IR, because the cost of constructing and serializing could be high for exchange formats like protobuf. And usually the exchange formats are designed in a minimalistic fashion, making it less easy to extend them with more information to support in-depth optimization like automatic quantization or accelerator support.
> >> > > The current MXNet relies on NNVM for in-memory IR manipulation but does not contain a compilation component that compiles to the hardware backends. Exporting to an exchange format and then going back into NNVM to run the compilation poses too much of a burden for a JIT compiler to pay. Using the same in-memory graph IR as the compilation stack gives much more advantage in this regard.
> >> > >
> >> > > The newly introduced nnvm/top and compiler offer in-memory graph optimization and compilation, and offer more hardware backends directly via TVM. We already see promising results in edge deployments with much lower runtime overhead. We will further benefit quickly from more graph optimizations that it has to offer.
> >> > >
> >> > > Building support around this new paradigm offers us the advantage of being future compatible and takes full benefit of the points I mentioned above.
> >> > >
> >> > > Tianqi
> >> > >
> >> > > On Wed, Oct 18, 2017 at 4:57 PM, Lupesko, Hagay <lupe...@gmail.com> wrote:
> >> > >
> >> > > > Roshani – this is an exciting initiative, ONNX support on MXNet will enable more users to ramp up on MXNet, which is great.
> >> > > >
> >> > > > Tianqi – a few questions and thoughts about your note:
> >> > > > - "More hardware backends to mxnet" – MXNet users get the same benefit of HW support by implementing ONNX import on top of MXNet symbolic, right?
> >> > > > - "NNVM Compiler now received contributions from AWS, UW and many other folks in MXNet community." – agreed it is ramping up, but when you look at the data, it is clear that it is very early on for NNVM. Looking at the repo, it has 223 commits overall and 0 releases. Compare that to MXNet with 6136 commits and 32 releases. It seems to be still early on for NNVM, and for a more reliable initial implementation, building the import on top of MXNet is easier, faster and safer. MXNet has lots of users already using the Symbolic API, which hopefully means it is a mature API that is not likely to have breaking changes or major issues.
> >> > > >
> >> > > > I'm supportive of option 1 proposed by Roshani (building serde on top of MXNet symbolic), but to do it as an encapsulated implementation detail, so the implementation can be migrated to NNVM or another implementation in the future, if at that point it seems like the right thing to do.
> >> > > >
> >> > > > Interested in hearing other opinions though…
> >> > > >
> >> > > > Hagay
> >> > > >
> >> > > > On 10/18/17, 14:13, "Tianqi Chen" <workc...@gmail.com on behalf of tqc...@cs.washington.edu> wrote:
> >> > > >
> >> > > > I am strongly recommending going through nnvm/top. One major reason here is that support for the nnvm/top layer does NOT ONLY mean compatibility of the model format with onnx. These are the major benefits:
> >> > > >
> >> > > > - More hardware backends for mxnet, including OpenCL, Metal, Raspberry Pi and the web browser.
> >> > > > These things are automatically enabled by going through this layer. In general, we designed the nnvm/tvm stack to resolve the challenge of current mxnet's weakness in deploying to more hardware backends.
> >> > > >
> >> > > > - More frontend capabilities: nnvm's gluon-style IR now ingests from CoreML and ONNX, and in the future Keras. Supporting those will reduce the amount of engineering effort needed.
> >> > > >
> >> > > > - Future compatibility. We all agree that the future is being migrated to gluon's API. NNVM/top tries to look ahead by directly adopting the symbolic API to be gluon.
> >> > > >
> >> > > > I would also like to correct some of the mentioned facts with regard to the nnvm/tvm stack:
> >> > > >
> >> > > > 1. Nascent project with few contributors
> >> > > >
> >> > > > The NNVM compiler has now received contributions from AWS, UW and many other folks in the MXNet community. NNVM itself is already being used by MXNet. MXNet's internal IR is migrating toward gluon, and its final form is nnvm/top.
> >> > > >
> >> > > > 3. Does not support all operators that exist in MXNet Symbolic API
> >> > > >
> >> > > > Neither NNVM/top nor onnx supports all operators that exist in the mxnet symbolic API. The end goal here is mainly to make nnvm/top onnx compatible, which is a more reasonable goal.
> >> > > >
> >> > > > 4. No CI pipeline and test cases
> >> > > >
> >> > > > NNVM already contains a compiler with unit tests and CI-tested integration (https://github.com/dmlc/nnvm), with a CI pipeline that is well tested on CPU and GPU cases for front-ends.
> >> > > >
> >> > > > Tianqi
> >> > > >
> >> > > > On Wed, Oct 18, 2017 at 1:41 PM, Roshani Nagmote <roshaninagmo...@gmail.com> wrote:
> >> > > >
> >> > > > > Hi guys,
> >> > > > >
> >> > > > > I am working on supporting ONNX <https://github.com/onnx/onnx> pre-trained models in Apache MXNet and would like to seek your opinion on the choice of implementation. I have also created a GitHub issue <https://github.com/apache/incubator-mxnet/issues/8319>. Supporting ONNX in MXNet will enable users to move between frameworks with their models; it will also enable the MXNet project to be a part of the ONNX open standard and steer the direction of ONNX.
> >> > > > >
> >> > > > > For those who don't know ONNX: ONNX is an open source format for AI models which enables models to be transferred between frameworks. Refer to https://github.com/onnx/onnx for more details.
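For readers who have not looked inside an ONNX file before, it is just a serialized protobuf graph. A minimal sketch of inspecting one with the onnx Python package as it exists today (assumes the package is installed and "model.onnx" is any exported model):

    import onnx

    model = onnx.load("model.onnx")   # an ONNX model file is a protobuf message
    print(model.ir_version)           # the ONNX IR version the file was written with
    for node in model.graph.node:     # the graph is a flat list of operator nodes
        print(node.op_type, list(node.input), list(node.output))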
> >> > > > > To implement the import/export functionality in MXNet, I propose to expose an MXNet Python module "serde" (name taken from the Apache Hive project) with the following methods supporting different formats:
> >> > > > >
> >> > > > >     sym, params = mxnet.serde.import(other_format_file, other_format='onnx')
> >> > > > >
> >> > > > >     other_format_file = mxnet.serde.export(mxnet_sym, mxnet_params, 'onnx')
> >> > > > >
> >> > > > > The implementation under the hood can be done in two ways:
> >> > > > >
> >> > > > > 1) Implement at the MXNet layer by parsing the ONNX model (in protobuf format), turning it into MXNet Symbolic operators and building the MXNet model directly. Similarly, I can convert the MXNet model to ONNX format at this layer.
> >> > > > >
> >> > > > > 2) The DMLC community has released the nnvm/tvm compiler and an intermediate representation of the models, refer: http://www.tvmlang.org/2017/10/06/nnvm-compiler-announcement.html
> >> > > > >
> >> > > > > Based on the conversation on the GitHub issue <https://github.com/apache/incubator-mxnet/issues/8319> I opened, Mu mentioned that MXNet would use nnvm/tvm as the backend in the future.
> >> > > > >
> >> > > > > We could hook into this layer to implement the import/export functionality. nnvm/tvm has ONNX 0.1 version import implemented.
> >> > > > >
> >> > > > > For import:
> >> > > > > 1. I will need to enhance nnvm/tvm's importer to support ONNX 0.2.
> >> > > > > 2. Implement nnvm/tvm->mxnet symbolic operators.
> >> > > > >
> >> > > > > For export:
> >> > > > > 1. mxnet->nnvm/tvm (nnvm/tvm provides this implementation already).
> >> > > > > 2. I will need to implement nnvm/tvm->onnx.
> >> > > > >
> >> > > > > These are the pros and cons I see in the above approaches:
> >> > > > >
> >> > > > > 1. Import/export at the mxnet layer
> >> > > > >
> >> > > > > Pros:
> >> > > > > 1. Stable APIs currently used by users.
> >> > > > > 2. Larger Apache MXNet community of contributors.
> >> > > > > 3. CI pipeline to catch bugs.
> >> > > > > 4. Comparatively less time to implement and put it in the hands of the users.
> >> > > > >
> >> > > > > Cons:
> >> > > > > 1. In the future we may have to reimplement at the nnvm/tvm layer, in case MXNet moves to the nnvm/tvm backend (assuming it will move).
> >> > > > >
> >> > > > > 2. Import/export at the nnvm/tvm layer
> >> > > > >
> >> > > > > Pros:
> >> > > > > 1. Less engineering work in case mxnet moves to nnvm/tvm.
> >> > > > > 2. nnvm/tvm would become a hub to convert to different formats.
> >> > > > > 3. nnvm operators are more in parity with mxnet's gluon APIs; this could be useful in case Gluon becomes the only standard that MXNet will support.
> >> > > > >
> >> > > > > Cons:
> >> > > > > 1. Nascent project with few contributors.
> >> > > > > 2. Does not support all operators that exist in MXNet Symbolic API.
> >> > > > > 3. No CI pipeline.
> >> > > > > 4. The current Apache MXNet project does not use the nnvm/tvm backend.
> >> > > > > 5. The mxnet->nnvm/tvm backend needs more testing and user feedback.
> >> > > > >
> >> > > > > Any suggestions on either of these approaches? From the user's perspective, this will be an implementation detail that is not exposed.
> >> > > > >
> >> > > > > Thanks,
> >> > > > > Roshani
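For concreteness, here is a minimal sketch of what option 1 above (import at the MXNet layer) could look like: parse the ONNX protobuf and rebuild the graph with MXNet Symbolic operators. The helper name and the two op mappings are hypothetical and deliberately incomplete; it uses the onnx Python package and mxnet as they exist today, and is only an illustration of the approach, not the proposed importer.

    import onnx
    import mxnet as mx

    def import_onnx_sketch(onnx_file):
        # An ONNX model file is a protobuf; onnx.load gives us the graph.
        model = onnx.load(onnx_file)
        symbols = {}  # ONNX tensor name -> MXNet symbol

        # Walk the nodes in topological order, creating variables for graph
        # inputs/parameters on first use and mapping each ONNX op to an
        # MXNet Symbolic operator.
        for node in model.graph.node:
            args = []
            for name in node.input:
                if name not in symbols:
                    symbols[name] = mx.sym.Variable(name)
                args.append(symbols[name])
            if node.op_type == "Relu":
                out = mx.sym.Activation(args[0], act_type="relu")
            elif node.op_type == "Add":
                out = args[0] + args[1]
            else:
                raise NotImplementedError(
                    "op %s is not handled in this sketch" % node.op_type)
            symbols[node.output[0]] = out

        # The graph's declared outputs form the final symbol.
        outputs = [symbols[out.name] for out in model.graph.output]
        return outputs[0] if len(outputs) == 1 else mx.sym.Group(outputs)

Export in this option would be the mirror image: walk the MXNet symbol's graph and emit ONNX nodes (e.g. via onnx.helper.make_node), which is the part the proposal leaves to be implemented.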