> This seems to be a big change to the existing operator mode (imperative and
> symbolic).
Essentially the motivation for deferred compute is to extend imperative mode so that users can construct a symbol without using the symbolic API. This addresses the confusion around having two APIs and prevents divergence between the imperative and symbolic APIs. There's no need to drop the existing imperative / symbolic APIs due to deferred compute.

> Could you please provide more information.

Please ask a question and I'll answer ;)

> AFAIK, symbolic API already does deferred init, imperative API is provided to
> improve user experience. Based on this RFC, what's the advantage of this new
> deferred_compute mode? As a user, when should I use it or not.

Based on deferred compute we can simplify the `gluon.HybridBlock` API so that it matches the `gluon.Block` API. For example, consider reimplementing `Dense(HybridBlock)` on top of a `HybridBlock` API extended with deferred compute:

``` python
from mxnet import gluon, npx
from mxnet.gluon import HybridBlock


class Dense(HybridBlock):
    def __init__(self, units, use_bias=True, flatten=True, dtype='float32',
                 weight_initializer=None, bias_initializer='zeros', in_units=0):
        super().__init__()
        self._flatten = flatten
        self._units = units
        self.weight = gluon.Parameter(shape=(units, in_units), init=weight_initializer,
                                      dtype=dtype, allow_deferred_init=True)
        if use_bias:
            self.bias = gluon.Parameter(shape=(units,), init=bias_initializer,
                                        dtype=dtype, allow_deferred_init=True)
        else:
            self.bias = None

    def forward(self, x):
        # We allow users to overwrite forward() directly.
        ctx = x.context
        return npx.FullyConnected(x, self.weight.data(ctx),
                                  self.bias.data(ctx) if self.bias is not None else None,
                                  no_bias=self.bias is None,
                                  num_hidden=self._units, flatten=self._flatten,
                                  name='fwd')
```

`HybridBlock` can wrap the execution of `forward` in a deferred compute session, obtain a symbolic representation of the computation and pass it to `CachedOp`. There would be no reason for users to use the deferred compute API explicitly.

> Another question. We all know deferred init causes a bad user experience when
> it comes to debugging. Would this RFC address the debuggability issue?

This RFC is orthogonal to deferred init. When updating the `gluon.HybridBlock` API based on deferred compute, one option is to require statically known shapes of weights at construction time **if** users implement `def forward`. For backwards compatibility we likely want to keep deferred init around for existing code relying on `mx.sym` and implementing `def hybrid_forward`.

However, the other option is to allow deferred initialization of weights and require users to implement `infer_shape`:

https://github.com/apache/incubator-mxnet/blob/910c608f682a47fc2c43375b5f5a426b563e5821/python/mxnet/gluon/block.py#L1073-L1075

This works around the failures of symbolic shape inference for deferred init in the case of dynamic shape ops, while still allowing users to decide the shape of the weight at the first forward. In the example above, it could look like:

``` python
class Dense(HybridBlock):
    def __init__(self, units, use_bias=True, flatten=True, dtype='float32',
                 weight_initializer=None, bias_initializer='zeros', in_units=0):
        [...]

    def infer_shape(self, x):
        # Called once before the first forward(); fixes the deferred weight
        # shape based on the shape of the first input batch.
        self.weight.shape = (self.weight.shape[0], x.shape[1])

    def forward(self, x):
        [...]
```

> If it's about performance optimization, could we have some initial data of
> using this new deferred mode vs. existing imperative mode?

There is the option to improve the performance of imperative mode by deferring the computation and optimizing the computational graph before performing the computation. But this is not the main motivation and I haven't optimized for this use-case (yet).
In the `gluon.HybridBlock` case, we only run with deferred compute once to construct the symbolic graph and then pass it over to `CachedOp` for optimized execution.
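To make this concrete, below is a minimal sketch of how such a one-time trace could work. The `_deferred_compute` module name and the `dc.set_variable` / `dc.context` / `dc.get_symbol` helpers are illustrative assumptions based on this RFC, not a finalized MXNet API:

``` python
import mxnet as mx
from mxnet import np, npx
# Assumption: this module name and the helper signatures below are
# illustrative only; they may differ from what is eventually merged.
from mxnet import _deferred_compute as dc


def hybridize(fn, *example_inputs):
    """Trace `fn` once under deferred compute and return a CachedOp."""
    # Associate the eager input arrays with named symbolic variables so the
    # recorded graph has well-defined inputs (hypothetical helper).
    variables = [mx.sym.var('data{}'.format(i)) for i in range(len(example_inputs))]
    dc.set_variable(list(example_inputs), variables)
    with dc.context():  # operators record the graph instead of computing
        outputs = fn(*example_inputs)
    sym = dc.get_symbol([outputs])  # symbolic representation of the trace
    return mx.ndarray.CachedOp(sym)  # graph-optimized executor


# Usage sketch: trace an imperative function once, then reuse the graph.
x = np.ones((2, 3))
op = hybridize(lambda a: npx.relu(2 * a + 1), x)
y = op(x)  # subsequent calls run the optimized recorded graph
```

This is the pattern `HybridBlock` would apply internally on the first call to a hybridized block, so end users would never need to touch the deferred compute primitives themselves.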