> This seems to be a big change to the existing operator mode (imperative and 
> symbolic).

Essentially, the motivation for deferred compute is to extend imperative mode so
that users can "construct a symbol" without using the symbolic API. This
addresses the confusion around having two APIs and prevents divergence between
the imperative and symbolic APIs. There's no need to drop the existing
imperative / symbolic APIs because of deferred compute.

> Could you please provide more information.

Please ask a question and I'll answer ;)

> AFAIK, symbolic API already does deferred init, imperative API is provided to 
> improve user experience. Based on this RFC, what's the advantage of this new 
> deferred_compute mode? As a user, when should I use it or not.

Based on deferred compute we can simplify the `gluon.HybridBlock` API so that
it matches the `gluon.Block` API. For example, consider reimplementing
`Dense(HybridBlock)` on top of the extended, deferred-compute-based
`HybridBlock` API:

``` python
from mxnet import gluon, npx
from mxnet.gluon import HybridBlock


class Dense(HybridBlock):
    def __init__(self, units, use_bias=True, flatten=True, dtype='float32',
                 weight_initializer=None, bias_initializer='zeros',
                 in_units=0):
        super().__init__()
        self._flatten = flatten
        self._units = units
        self.weight = gluon.Parameter(shape=(units, in_units),
                                      init=weight_initializer, dtype=dtype,
                                      allow_deferred_init=True)
        if use_bias:
            self.bias = gluon.Parameter(shape=(units,),
                                        init=bias_initializer, dtype=dtype,
                                        allow_deferred_init=True)
        else:
            self.bias = None

    def forward(self, x):  # Users may override forward() directly.
        ctx = x.context
        bias = self.bias.data(ctx) if self.bias is not None else None
        return npx.fully_connected(x, self.weight.data(ctx), bias,
                                   no_bias=bias is None,
                                   num_hidden=self._units,
                                   flatten=self._flatten, name='fwd')
```
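
For illustration, usage would then look the same as for any other `Block` (a
sketch under the proposal above, with `in_units` given so all shapes are
statically known):

``` python
import mxnet as mx

layer = Dense(units=16, in_units=8)
layer.initialize()                # shapes are fully known, nothing is deferred
out = layer(mx.np.ones((4, 8)))   # under this proposal, the first call traces
                                  # forward() via deferred compute
print(out.shape)                  # (4, 16)
```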

`HybridBlock` can wrap the execution of `forward` in a deferred compute
session, obtain a symbolic representation of the computation, and pass it to
`CachedOp`.
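
A minimal sketch of what this wrapping could look like (the `dc.context()` and
`dc.get_symbol()` helpers below are hypothetical placeholders for the deferred
compute API proposed in this RFC; parameter handling and input flattening are
omitted for brevity):

``` python
import mxnet as mx
from mxnet.gluon import Block
# `dc` stands for a hypothetical module exposing the deferred compute session.

class HybridBlock(Block):
    def __call__(self, x):
        if self._cached_op is None:
            with dc.context():        # record operations instead of
                out = self.forward(x)  # executing them eagerly
            sym = dc.get_symbol(out)  # extract the traced symbolic graph
            # Hand the graph to CachedOp for optimized execution.
            self._cached_op = mx.ndarray.CachedOp(sym)
        return self._cached_op(x)     # real code would also pass parameter arrays
```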

With this in place, there would be no reason for users to use the deferred
compute API explicitly.

> Another question. We all know deferred init cause bad user experience when it 
> comes to debugging. Would this RFC address the debuggability issue?

This RFC is orthogonal to deferred init. When updating the `gluon.HybridBlock`
API based on deferred compute, one option is to require statically known weight
shapes at construction time **if** users implement `def forward`. For backwards
compatibility, we likely want to keep deferred init around for existing code
that relies on `mx.sym` and implements `def hybrid_forward`.

However, the other option is to allow deferred initialization of weights and 
require users to implement `infer_shape`:

https://github.com/apache/incubator-mxnet/blob/910c608f682a47fc2c43375b5f5a426b563e5821/python/mxnet/gluon/block.py#L1073-L1075

This works around the failures of symbolic shape inference for deferred init in
the case of dynamic shape ops, while still allowing users to decide the shape
of the weight at the first forward pass.

In the example above, it could look like:

``` python
class Dense(HybridBlock):
    def __init__(self, units, use_bias=True, flatten=True, dtype='float32',
                 weight_initializer=None, bias_initializer='zeros',
                 in_units=0):
        [...]

    def infer_shape(self, x):
        self.weight.shape = (self.weight.shape[0], x.shape[1])

    def forward(self, x):
        [...]
```
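
With this option, deferred init would resolve shapes at the first forward pass.
For illustration (a sketch assuming the behavior described above):

``` python
import mxnet as mx

layer = Dense(units=16)           # in_units defaults to 0, i.e. unknown
layer.initialize()                # initialization is deferred: shape unknown
out = layer(mx.np.ones((4, 20)))  # infer_shape() runs first, setting the
                                  # weight shape before parameters are
                                  # initialized and forward() is traced
print(layer.weight.shape)         # (16, 20)
```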

> If it's about performance optimization, could we have some initial data of 
> using this new deferred mode vs. existing imperative mode?

There is the option to improve the performance of imperative mode by deferring
the computation and optimizing the computational graph before executing it. But
this is not the main motivation, and I haven't optimized for this use-case
(yet). In the `gluon.HybridBlock` case, we run with deferred compute only once
to construct the symbolic graph and then hand over to `CachedOp` for optimized
execution.
