QueensGambit commented on issue #8832: I didn't get a reasonable speed-up when applying depthwise convolution to VGG16
URL: https://github.com/apache/incubator-mxnet/issues/8832#issuecomment-513513890

Hi @lawrencewxj @edmBernard,

For a model architecture that relies heavily on depthwise separable convolutions, I measured a 1.4x speed-up on GPU (MXNet-cu10 1.4.1, CUDA 10.0, cuDNN 7.5.1.10), but a 3x speed-up on CPU (MXNet-mkl 1.4.1).

The main reason is that grouped convolutions cause memory fragmentation and are naturally not well suited for GPUs. The following paper conducted an experiment on this (see Table 2):

**ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design** (Ma et al., 2018)
* https://arxiv.org/pdf/1807.11164.pdf

Recent cuDNN versions are improving the performance of grouped convolutions:
* https://docs.nvidia.com/deeplearning/sdk/pdf/cuDNN-Release-Notes.pdf

It could be good to verify that MXNet makes use of all these recent optimizations.

Best,
~QueensGambit
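As a rough way to reproduce this kind of comparison yourself, here is a minimal benchmarking sketch (not from the original comment) using the Gluon API: it times a standard 3x3 convolution against a depthwise separable block (depthwise 3x3 with `groups == channels`, followed by a pointwise 1x1). The input shape, channel count, and iteration count are arbitrary assumptions chosen for illustration; switch `ctx` between `mx.cpu()` and `mx.gpu(0)` to see how the relative speed-up differs per device.

```python
import time
import mxnet as mx
from mxnet.gluon import nn

ctx = mx.cpu()  # swap to mx.gpu(0) to compare the GPU behaviour
x = mx.nd.random.uniform(shape=(1, 128, 56, 56), ctx=ctx)  # assumed example shape

# Standard 3x3 convolution
std_conv = nn.Conv2D(channels=128, kernel_size=3, padding=1)
std_conv.initialize(ctx=ctx)

# Depthwise separable block: depthwise 3x3 (groups == in_channels) + pointwise 1x1
dw_sep = nn.HybridSequential()
dw_sep.add(nn.Conv2D(channels=128, kernel_size=3, padding=1, groups=128),
           nn.Conv2D(channels=128, kernel_size=1))
dw_sep.initialize(ctx=ctx)
dw_sep.hybridize()

def bench(block, n=100):
    block(x)            # warm-up pass
    mx.nd.waitall()     # make sure async execution has finished
    start = time.time()
    for _ in range(n):
        block(x)
    mx.nd.waitall()
    return (time.time() - start) / n

print("standard conv:        %.5f s/iter" % bench(std_conv))
print("depthwise separable:  %.5f s/iter" % bench(dw_sep))
```

The `mx.nd.waitall()` calls matter because MXNet executes operators asynchronously; without them the loop would only measure how fast operations are enqueued, not how fast they actually run.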