Hello all, I have noticed a significant performance regression in SpMV after this PR (https://github.com/apache/incubator-mxnet/pull/12380). The problem seems to occur when training on multiple GPUs (e.g., 8 GPUs): in that setting, SpMV on the CPU uses only two or three threads for computation. This looks like a serious bug to me. Can it be fixed for the 1.4 release?
Thanks, Da