Hey!
I'm using the MNIST CNN example from the [tutorial](https://mxnet.apache.org/versions/1.7/api/python/docs/tutorials/packages/gluon/image/mnist.html), that is:

```python
import mxnet as mx
from mxnet import gluon

net = gluon.nn.HybridSequential()
with net.name_scope():
    net.add(gluon.nn.Conv2D(20, kernel_size=(5, 5), activation='tanh'))
    net.add(gluon.nn.MaxPool2D(pool_size=(2, 2), strides=(2, 2)))
    net.add(gluon.nn.Conv2D(50, kernel_size=(5, 5), activation='tanh'))
    net.add(gluon.nn.MaxPool2D(pool_size=(2, 2), strides=(2, 2)))
    net.add(gluon.nn.Dense(500, activation='tanh'))
    net.add(gluon.nn.Dense(10, activation='tanh'))
net.initialize(mx.init.Xavier(magnitude=2.3))
```

I don't understand why the number of parameters after a fully connected layer is not the same when calling `summary` and `print_summary`.

`mx.viz.print_summary(net(mx.sym.var('data')), shape={'data': (1, 1, 32, 32)})` gives:

```
Layer (type)                                   Output Shape   Param #   Previous Layer
=======================================================================================
data(null)                                     1x32x32        0
_______________________________________________________________________________________
hybridsequential0_conv0_fwd(Convolution)       20x28x28       520       data
_______________________________________________________________________________________
hybridsequential0_conv0_tanh_fwd(Activation)   20x28x28       0         hybridsequential0_conv0_fwd
_______________________________________________________________________________________
hybridsequential0_pool0_fwd(Pooling)           20x14x14       0         hybridsequential0_conv0_tanh_fwd
_______________________________________________________________________________________
hybridsequential0_conv1_fwd(Convolution)       50x10x10       25050     hybridsequential0_pool0_fwd
_______________________________________________________________________________________
hybridsequential0_conv1_tanh_fwd(Activation)   50x10x10       0         hybridsequential0_conv1_fwd
_______________________________________________________________________________________
hybridsequential0_pool1_fwd(Pooling)           50x5x5         0         hybridsequential0_conv1_tanh_fwd
_______________________________________________________________________________________
hybridsequential0_dense0_fwd(FullyConnected)   500            25500     hybridsequential0_pool1_fwd
_______________________________________________________________________________________
hybridsequential0_dense0_tanh_fwd(Activation)  500            0         hybridsequential0_dense0_fwd
_______________________________________________________________________________________
hybridsequential0_dense1_fwd(FullyConnected)   10             5010      hybridsequential0_dense0_tanh_fwd
_______________________________________________________________________________________
hybridsequential0_dense1_tanh_fwd(Activation)  10             0         hybridsequential0_dense1_fwd
=======================================================================================
Total params: 56080
```

and `net.summary(mx.nd.zeros((1, 1, 32, 32)))` gives:
```
Layer (type)                                      Output Shape     Param #
================================================================================
Input                                             (1, 1, 32, 32)   0
Activation-1   <Symbol hybridsequential0_conv0_tanh_fwd>           0
Activation-2                                      (1, 20, 28, 28)  0
Conv2D-3                                          (1, 20, 28, 28)  520
MaxPool2D-4                                       (1, 20, 14, 14)  0
Activation-5   <Symbol hybridsequential0_conv1_tanh_fwd>           0
Activation-6                                      (1, 50, 10, 10)  0
Conv2D-7                                          (1, 50, 10, 10)  25050
MaxPool2D-8                                       (1, 50, 5, 5)    0
Activation-9   <Symbol hybridsequential0_dense0_tanh_fwd>          0
Activation-10                                     (1, 500)         0
Dense-11                                          (1, 500)         625500
Activation-12  <Symbol hybridsequential0_dense1_tanh_fwd>          0
Activation-13                                     (1, 10)          0
Dense-14                                          (1, 10)          5010
================================================================================
Parameters in forward computation graph, duplicate included
   Total params: 656080
   Trainable params: 656080
   Non-trainable params: 0
Shared params in forward computation graph: 0
Unique parameters in model: 656080
```

The number of parameters differs for the first fully connected layer (the one with 500 units): 25500 in the first case versus 625500 in the second. Could someone explain why these two functions don't give the same result, and tell me which one is right?

Thanks!
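In case it helps, here is how I tried to make sense of the two numbers by hand. This is only a sketch of my reasoning, assuming Gluon's `Dense` keeps its default `flatten=True` and therefore sees the flattened 50 × 5 × 5 = 1250 features coming out of the last pooling layer:

```python
# Input to the first Dense layer is (1, 50, 5, 5); with Gluon's default
# flatten=True this becomes 1250 features per example.
flattened_in = 50 * 5 * 5                      # 1250

# Parameter count I would expect for Dense(500): weights + biases.
expected_dense0 = flattened_in * 500 + 500     # 625500, matches net.summary

# print_summary's figure instead matches a count that uses only the
# channel dimension (50) as the input size:
print_summary_dense0 = 50 * 500 + 500          # 25500

# Cross-check against the parameter arrays Gluon actually allocates
# (shapes are only known after a forward pass, due to deferred init):
net(mx.nd.zeros((1, 1, 32, 32)))
total = sum(p.data().size for p in net.collect_params().values())
print(total)  # should print 656080, matching net.summary's total
```

So unless I'm misreading something, 625500 is consistent with the weight array that Gluon allocates, while 25500 only makes sense if the 5 × 5 spatial dimensions are ignored.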
