I'm speaking under my "MXNet contributor" hat. It would be sad if our new model format and compiler were not supported by our own contributors. It puts us in a bad position when we reach outside to ask for support.
If you really want to do it the onnx <-> mxnet way, I suggest putting the code under https://github.com/aws.

Best
Mu

On Thu, Oct 19, 2017 at 9:51 AM, Lupesko, Hagay <lupe...@gmail.com> wrote:
> Since there seems to be a difficulty to reach a consensus here, and this is a new area, maybe a good compromise would be to contribute this under /contrib as experimental, in whatever way Roshani thinks makes sense. Once there is code in place, and MXNet users and contributors are able to check it out, we can consider future steps.
>
> Does this proposal make sense to folks?
>
> On 10/18/17, 23:01, "Tianqi Chen" <workc...@gmail.com on behalf of tqc...@cs.washington.edu> wrote:
>
> I want to offer one last thing in terms of technical details. I mentioned two trends in deep learning systems. There is one thing that was omitted: how should we build a good deployment end for deep learning models?
>
> There is always a paradox to this problem:
>
> - On one hand, the deployment end needs to be lightweight and portable.
> - On the other hand, we want a lot of optimizations (memory layout, compute) and feature support, and this makes the project big.
>
> All the existing systems suffer from this problem. The solution is simple: separate the optimization part from the actual runtime and compile things down to a bare-metal module. This is the solution the nnvm/top compiler pipeline offers, which I believe will become standard practice for deployment and where all systems are headed.
>
> Tianqi
>
> On Wed, Oct 18, 2017 at 10:03 PM, Tianqi Chen <tqc...@cs.washington.edu> wrote:
>
> > OK, there is some miscommunication here, I guess. We only need to do a "canonicalization" step in the Python API that does a symbol-to-symbol translation. It can be done purely in Python, and there is no need to go "down" into C++ to do this.
> >
> > For example, the current nnvm.from_mxnet API takes a Module or Gluon module and gets you back an nnvm/top graph in Python.
> >
> > All we are asking for is decomposing it into:
> >
> >     def mxnet_to_onnx(module):
> >         nnvm_graph, params = nnvm_from_mxnet(module)
> >         onnx = nnvm_to_onnx(nnvm_graph, params)
> >         return onnx
> >
> > This allows nnvm_from_mxnet to be reused for other purposes, like compiling to deployable modules.
> >
> > Tianqi
> >
> > On Wed, Oct 18, 2017 at 9:55 PM, Lupesko, Hagay <lupe...@gmail.com> wrote:
> >
> >> Tianqi:
> >> Thanks for detailing the trends. I fully agree that ONNX is just a graph serialization format – nothing more, nothing less. I also think we all agree that this simple mechanism holds lots of value for DL users, since it allows them to move between frameworks easily (e.g. train with MXNet, deploy on a mobile device with Caffe2, or the other way around).
> >>
> >> As you said, an in-memory IR is different from serialization formats such as ONNX. In-memory IRs are designed to make runtime execution as efficient as possible, leveraging software and hardware optimizations. They are indeed complex, and where the "meat" is. (BTW, ONNX regards itself as an "IR" format, but not in the same sense as NNVM.)
> >>
> >> At the end of the day, Roshani is aiming to deliver simple functionality to MXNet users: (1) take an ONNX file and load it into MXNet so you get a graph+weights you can work with; (2) given a trained model, save it as an ONNX file.
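Both goals reduce to the decomposition Tianqi sketches above. A minimal sketch of the nnvm route, assuming the nnvm compiler APIs nnvm.frontend.from_mxnet (the helper Tianqi refers to as nnvm.from_mxnet) and nnvm.compiler.build; nnvm_to_onnx is the hypothetical exporter under discussion and is stubbed out, since it does not exist yet:

    import nnvm.compiler
    import nnvm.frontend

    def nnvm_to_onnx(nnvm_graph, params):
        # Hypothetical: the nnvm/top -> ONNX serializer this thread is discussing.
        raise NotImplementedError("nnvm/top -> ONNX export is the piece still to be built")

    def mxnet_to_onnx(module):
        # Reuse the existing MXNet -> nnvm/top translation ...
        nnvm_graph, params = nnvm.frontend.from_mxnet(module)
        return nnvm_to_onnx(nnvm_graph, params)

    def mxnet_to_deployable(module, data_shape, target="llvm"):
        # ... and feed the same graph into the nnvm/tvm compiler to get a
        # deployable module (graph JSON, compiled library, params).
        nnvm_graph, params = nnvm.frontend.from_mxnet(module)
        return nnvm.compiler.build(nnvm_graph, target,
                                   shape={"data": data_shape}, params=params)

The point of the decomposition is that the from_mxnet step is shared: the ONNX exporter and the compilation path consume the same nnvm/top graph.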
> >> Since MXNet users do not interact with NNVM directly, but rather interact with the MXNet API (MXNet Module), isn't the simplest thing to do just to construct the Module "on the fly" using the MXNet API? Taking the other approach, we would go from the top-level MXNet "load" API, go "down" to NNVM to construct the graph, and then go back up to MXNet to expose it as a Module. This seems too complex and does not add any benefit. In whatever way we construct the MXNet Module object, NNVM will always be the underlying in-memory IR that is being executed, so why not take the simpler route?
> >>
> >> Hagay
> >>
> >> On 10/18/17, 19:42, "Tianqi Chen" <workc...@gmail.com on behalf of tqc...@cs.washington.edu> wrote:
> >>
> >> Hi Chris:
> >>
> >> There is no intention to move things away from mxnet. The reduction in lines of code comes from having a better design in general; usually you write less redundant code by benefiting from a better design. As I may quote: "the best design is achieved not when there is nothing to add, but when there is nothing to be taken away."
> >>
> >> MXNet has always benefited from this philosophy and improves with new designs and proper modularization. For example, we saw such reduction and convenience when migrating from MXNet's legacy op to NNVM's mechanism. The new mechanism now enables things like sparse-aware support and other features which would be much harder to support otherwise.
> >>
> >> The nnvm/tvm stack brings the same benefit (if not more) and will only add more features to MXNet itself: offering more hardware backends and optimization, and allowing us to write less code and spend less time optimizing for each backend by going through TVM.
> >>
> >> Tianqi
> >>
> >> On Wed, Oct 18, 2017 at 7:15 PM, Chris Olivier <cjolivie...@gmail.com> wrote:
> >>
> >> > Reduce the code base of mxnet? By increasing the scope of the dmlc modules? Is the intent to make mxnet a thin language wrapper around a group of dmlc modules?
> >> >
> >> > On Wed, Oct 18, 2017 at 6:58 PM Tianqi Chen <tqc...@cs.washington.edu> wrote:
> >> >
> >> > > To better answer Hagay's question, I would like to dive a bit deeper into the relation between MXNet, NNVM and model exchange formats like ONNX.
> >> > >
> >> > > There are two major trends in deep learning systems now:
> >> > >
> >> > > - Common serializable formats, like ONNX and CoreML, that define the model exchange format.
> >> > > - The in-memory graph IR for quick optimization and JIT. NNVM and Tensorflow's XLA fall into this category.
> >> > >
> >> > > The exchange formats are great: they only pose a layer of conversion, which is good for exchange. The real meat still comes from the compilation and JIT pipeline you have to offer. For that, we need an in-memory IR, because the cost of constructing and serializing could be high for exchange formats like protobuf. And usually the exchange formats are designed in a minimalistic fashion, making it less easy to extend them with more information to support in-depth optimization like automatic quantization or accelerator support.
> >> > > The current MXNet relies on NNVM for in-memory IR manipulation but does not contain a compilation component that compiles to the hardware backends. Exporting to an exchange format and then going back into NNVM to run the compilation poses too much of a burden for a JIT compiler to pay. Using the same in-memory graph IR as the compilation stack gives much more advantage in this regard.
> >> > >
> >> > > The newly introduced nnvm/top and compiler offer in-memory graph optimization and compilation, and offer more hardware backends directly via TVM. We already see promising results in edge deployments with much lower runtime overhead. We will further benefit quickly from more graph optimizations that it has to offer.
> >> > >
> >> > > Building support around this new paradigm offers us the advantage of being future compatible and takes full benefit of the points I mentioned above.
> >> > >
> >> > > Tianqi
> >> > >
> >> > > On Wed, Oct 18, 2017 at 4:57 PM, Lupesko, Hagay <lupe...@gmail.com> wrote:
> >> > >
> >> > > > Roshani – this is an exciting initiative, ONNX support on MXNet will enable more users to ramp up on MXNet, which is great.
> >> > > >
> >> > > > Tianqi – a few questions and thoughts about your note:
> >> > > > - "More hardware backends to mxnet" – MXNet users get the same benefit of HW support by implementing ONNX import on top of MXNet symbolic, right?
> >> > > > - "NNVM Compiler now received contributions from AWS, UW and many other folks in MXNet community." – agreed it is ramping up, but when you look at the data, it is clear that it is very early on for NNVM. Looking at the repo, it has 223 commits overall and 0 releases. Compare that to MXNet with 6136 commits and 32 releases. It seems to be still early on for NNVM, and for a more reliable initial implementation, building the import on top of MXNet is easier, faster and safer. MXNet has lots of users already using the Symbolic API, which hopefully means it is a mature API that is not likely to have breaking changes or major issues.
> >> > > >
> >> > > > I'm supportive of option 1 proposed by Roshani (building serde on top of MXNet symbolic), but to do it as an encapsulated implementation detail, so the implementation can be migrated to NNVM or another implementation in the future, if at that point it seems like the right thing to do.
> >> > > >
> >> > > > Interested in hearing other opinions though…
> >> > > >
> >> > > > Hagay
> >> > > >
> >> > > > On 10/18/17, 14:13, "Tianqi Chen" <workc...@gmail.com on behalf of tqc...@cs.washington.edu> wrote:
> >> > > >
> >> > > > I am strongly recommending going through nnvm/top. One major reason here is that support for the nnvm/top layer does NOT ONLY mean compatibility of the model format with onnx. These are the major benefits:
> >> > > >
> >> > > > - More hardware backends for mxnet, including OpenCL, Metal, Raspberry Pi and the web browser.
> >> > > > These things are automatically enabled by going through this layer. In general, we designed the nnvm/tvm stack to resolve the challenge of current mxnet's weakness in deploying to more hardware backends.
> >> > > >
> >> > > > - More frontend capabilities: nnvm's gluon-style IR now ingests from CoreML and ONNX, and in the future Keras. Supporting those will reduce the amount of engineering effort needed.
> >> > > >
> >> > > > - Future compatibility. We all agree that the future is being migrated to gluon's API. NNVM/top tries to look ahead by directly adopting the symbolic API to be gluon.
> >> > > >
> >> > > > I would also like to correct some of the mentioned facts with regard to the nnvm/tvm stack:
> >> > > >
> >> > > > 1. Nascent project with few contributors
> >> > > >
> >> > > > The NNVM compiler has now received contributions from AWS, UW and many other folks in the MXNet community. NNVM itself is already being used by MXNet. MXNet's internal IR is migrating toward gluon, and its final form is nnvm/top.
> >> > > >
> >> > > > 3. Does not support all operators that exist in MXNet Symbolic API
> >> > > >
> >> > > > Neither NNVM/top nor onnx supports all operators that exist in the mxnet symbolic API. The end goal here is mainly to make nnvm/top onnx compatible, which is a more reasonable goal.
> >> > > >
> >> > > > 4. No CI pipeline and test cases
> >> > > >
> >> > > > NNVM already contains a compiler with unit tests and CI-tested integration (https://github.com/dmlc/nnvm), with a CI pipeline that is well tested on CPU and GPU cases for front-ends.
> >> > > >
> >> > > > Tianqi
> >> > > >
> >> > > > On Wed, Oct 18, 2017 at 1:41 PM, Roshani Nagmote <roshaninagmo...@gmail.com> wrote:
> >> > > >
> >> > > > > Hi guys,
> >> > > > >
> >> > > > > I am working on supporting ONNX <https://github.com/onnx/onnx> pre-trained models in Apache MXNet and would like to seek your opinion on the choice of implementation. I have also created a GitHub issue <https://github.com/apache/incubator-mxnet/issues/8319>. Supporting ONNX in MXNet will enable users to move between frameworks with their models; it will also enable the MXNet project to be a part of the ONNX open standard and steer the direction of ONNX.
> >> > > > >
> >> > > > > For those who don't know ONNX: ONNX is an open source format for AI models which enables models to be transferred between frameworks. Refer to https://github.com/onnx/onnx for more details.
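For readers who have not looked inside an ONNX file before, it is just a serialized protobuf graph. A minimal sketch of inspecting one with the onnx Python package as it exists today (assumes the package is installed and "model.onnx" is any exported model):

    import onnx

    model = onnx.load("model.onnx")   # an ONNX model file is a protobuf message
    print(model.ir_version)           # the ONNX IR version the file was written with
    for node in model.graph.node:     # the graph is a flat list of operator nodes
        print(node.op_type, list(node.input), list(node.output))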
> >> > > > > To implement the import/export functionality in MXNet, I propose to expose an MXNet Python module "serde" (name taken from the Apache Hive project) with the following methods supporting different formats:
> >> > > > >
> >> > > > >     sym, params = mxnet.serde.import(other_format_file, other_format='onnx')
> >> > > > >
> >> > > > >     other_format_file = mxnet.serde.export(mxnet_sym, mxnet_params, 'onnx')
> >> > > > >
> >> > > > > The implementation under the hood can be done in two ways:
> >> > > > >
> >> > > > > 1) Implement at the MXNet layer by parsing the ONNX model (in protobuf format), turning it into MXNet Symbolic operators and building the MXNet model directly. Similarly, I can convert the MXNet model to ONNX format at this layer.
> >> > > > >
> >> > > > > 2) The DMLC community has released the nnvm/tvm compiler and an intermediate representation of the models, refer: http://www.tvmlang.org/2017/10/06/nnvm-compiler-announcement.html
> >> > > > >
> >> > > > > Based on the conversation on the GitHub issue <https://github.com/apache/incubator-mxnet/issues/8319> I opened, Mu mentioned that MXNet would use nnvm/tvm as the backend in the future.
> >> > > > >
> >> > > > > We could hook into this layer to implement the import/export functionality. nnvm/tvm has ONNX 0.1 version import implemented.
> >> > > > >
> >> > > > > For import:
> >> > > > > 1. I will need to enhance nnvm/tvm's importer to support ONNX 0.2.
> >> > > > > 2. Implement nnvm/tvm->mxnet symbolic operators.
> >> > > > >
> >> > > > > For export:
> >> > > > > 1. mxnet->nnvm/tvm (nnvm/tvm provides this implementation already).
> >> > > > > 2. I will need to implement nnvm/tvm->onnx.
> >> > > > >
> >> > > > > These are the pros and cons I see in the above approaches:
> >> > > > >
> >> > > > > 1. Import/export at the mxnet layer
> >> > > > >
> >> > > > > Pros:
> >> > > > > 1. Stable APIs currently used by users.
> >> > > > > 2. Larger Apache MXNet community of contributors.
> >> > > > > 3. CI pipeline to catch bugs.
> >> > > > > 4. Comparatively less time to implement and put it in the hands of the users.
> >> > > > >
> >> > > > > Cons:
> >> > > > > 1. In the future we may have to reimplement at the nnvm/tvm layer, in case MXNet moves to the nnvm/tvm backend (assuming it will move).
> >> > > > >
> >> > > > > 2. Import/export at the nnvm/tvm layer
> >> > > > >
> >> > > > > Pros:
> >> > > > > 1. Less engineering work in case mxnet moves to nnvm/tvm.
> >> > > > > 2. nnvm/tvm would become a hub to convert to different formats.
> >> > > > > 3. nnvm operators are more in parity with mxnet's gluon APIs; this could be useful in case Gluon becomes the only standard that MXNet will support.
> >> > > > >
> >> > > > > Cons:
> >> > > > > 1. Nascent project with few contributors.
> >> > > > > 2. Does not support all operators that exist in MXNet Symbolic API.
> >> > > > > 3. No CI pipeline.
> >> > > > > 4. The current Apache MXNet project does not use the nnvm/tvm backend.
> >> > > > > 5. The mxnet->nnvm/tvm backend needs more testing and user feedback.
> >> > > > >
> >> > > > > Any suggestions on either of these approaches? From the user's perspective, this will be an implementation detail that is not exposed.
> >> > > > >
> >> > > > > Thanks,
> >> > > > > Roshani
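For concreteness, here is a minimal sketch of what option 1 above (import at the MXNet layer) could look like: parse the ONNX protobuf and rebuild the graph with MXNet Symbolic operators. The helper name and the two op mappings are hypothetical and deliberately incomplete; it uses the onnx Python package and mxnet as they exist today, and is only an illustration of the approach, not the proposed importer.

    import onnx
    import mxnet as mx

    def import_onnx_sketch(onnx_file):
        # An ONNX model file is a protobuf; onnx.load gives us the graph.
        model = onnx.load(onnx_file)
        symbols = {}  # ONNX tensor name -> MXNet symbol

        # Walk the nodes in topological order, creating variables for graph
        # inputs/parameters on first use and mapping each ONNX op to an
        # MXNet Symbolic operator.
        for node in model.graph.node:
            args = []
            for name in node.input:
                if name not in symbols:
                    symbols[name] = mx.sym.Variable(name)
                args.append(symbols[name])
            if node.op_type == "Relu":
                out = mx.sym.Activation(args[0], act_type="relu")
            elif node.op_type == "Add":
                out = args[0] + args[1]
            else:
                raise NotImplementedError(
                    "op %s is not handled in this sketch" % node.op_type)
            symbols[node.output[0]] = out

        # The graph's declared outputs form the final symbol.
        outputs = [symbols[out.name] for out in model.graph.output]
        return outputs[0] if len(outputs) == 1 else mx.sym.Group(outputs)

Export in this option would be the mirror image: walk the MXNet symbol's graph and emit ONNX nodes (e.g. via onnx.helper.make_node), which is the part the proposal leaves to be implemented.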